Image Analysis Showdown: BDM Compares ChatGPT, Gemini, Claude, and More!

October 22, 2025

BDM recently conducted an evaluation of the image analysis capabilities of major AI tools. Which ones demonstrated superior performance? Here’s a detailed look.

Just two years ago, generative AI tools were distinguished by unique features. Now, their capabilities appear more uniform. But do they truly deliver the same level of performance? To find out, BDM assessed the features of the main tools on the market.

Image analysis was one of the first multimodal capabilities integrated into mainstream generalist AIs such as ChatGPT, Gemini, and others. This feature allows users to upload an image, graphic, or photo and request a detailed description, thorough analysis, or practical advice from the model.

For this test, BDM submitted three images, each accompanied by a prompt, to various AI models:

  • A screenshot of the Discord interface with the prompt: “How do I access the microphone settings from this interface?”
  • A graph from an Ahrefs study with the prompt: “Describe, analyze, and explain this graph in detail.”
  • A photo of a Pentax ME Super with the prompt: “What model is this and how do I open the film compartment?”

Three criteria were evaluated: the accuracy of the image analysis, judged on the correctness and depth of the description or interpretation; the relevance of the response, meaning the AI’s ability to stick to the specific prompt; and the clarity and richness of the response relative to the provided image and query.
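For readers who want to reproduce a comparison like this one, the protocol above can be laid out as a simple test grid. The sketch below is only an illustration: the class and function names are hypothetical, the criteria labels are shorthand for the three criteria described above, and the idea of a numeric score per criterion is an assumption, since BDM reports its findings qualitatively.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TestCase:
    """One image paired with the prompt submitted alongside it."""
    image: str
    prompt: str


# The three image/prompt pairs described in the article.
TEST_CASES = [
    TestCase("Discord interface screenshot",
             "How do I access the microphone settings from this interface?"),
    TestCase("Ahrefs study graph",
             "Describe, analyze, and explain this graph in detail."),
    TestCase("Pentax ME Super photo",
             "What model is this and how do I open the film compartment?"),
]

# The tools covered by the review.
TOOLS = ["ChatGPT", "Gemini", "Claude", "Perplexity",
         "Copilot", "DeepSeek", "Le Chat"]

# Shorthand labels for the three evaluation criteria.
CRITERIA = ["accuracy", "relevance", "clarity"]


def empty_score_sheet():
    """Build one blank entry per (tool, image) pair, with a slot per criterion.

    Scores start as None; the scale is left to the reviewer (an assumption,
    the article itself gives no numeric scores).
    """
    return {
        (tool, case.image): {criterion: None for criterion in CRITERIA}
        for tool, case in product(TOOLS, TEST_CASES)
    }


def average(scores):
    """Average the criteria that have been filled in, or None if none have."""
    filled = [value for value in scores.values() if value is not None]
    return sum(filled) / len(filled) if filled else None


sheet = empty_score_sheet()
print(len(sheet), "tool/image combinations to review")
```

Keeping the grid explicit makes it harder to skip a combination, and separating the criteria keeps a case like a well-structured answer built on misread figures from being averaged away silently.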

Image Analysis in ChatGPT: Our Review

Regardless of the request (help with the interface, graph analysis, or camera advice), ChatGPT (specifically GPT-5) produced responses very quickly. OpenAI’s tool also sticks closely to the prompts: it describes the graph, explains how to reach the microphone settings in the Discord interface, proposes a camera model, and explains how to open the film compartment.

In terms of interpretation, there are few errors. ChatGPT correctly recognizes the screenshot as the Discord Mac app and guides users to the microphone settings, using the correct names for menus and submenus. It accurately reads the figures from the Ahrefs graph and provides relevant analysis. The chatbot also explains perfectly how to open the film compartment, although it misidentifies the camera model.

Finally, regarding the formulation of the response, ChatGPT is direct and to the point. It favors bullet points and a subheading structure, is concise without being superficial, and provides clear advice and guidance.

Image Analysis in Gemini: Our Review

Gemini, tested here in its 2.5 Flash version, takes a brief moment to think before generating its responses, but this delay never exceeds a few seconds. Google’s model understands the context well, whether it’s the Discord interface, the graph, or the Pentax photo. It responds directly to the associated requests, though not always with the same level of precision.

For instance, with Discord, it is slightly less precise than ChatGPT in its instructions, but still clear enough to be understood and followed. The same applies to the camera instructions. It does not identify the Discord interface as the macOS version. On the camera, however, it is less assertive than ChatGPT and correctly lists the actual model among several hypotheses.

The real difference lies in the analysis of the graph, which is particularly clear and in-depth, with a well-structured response that does not go beyond what was asked. Additionally, Gemini wisely closes its response about the camera with some precautionary advice.

Image Analysis in Claude: Our Review

When analyzing images, Claude, using its Sonnet 4.5 model, gets straight to the point, perhaps a bit too quickly. While the accuracy of its analysis is generally good, with few errors or inaccuracies, the tool from Anthropic tends to be somewhat superficial. There are no significant issues in analyzing the Discord interface. For the camera, Claude misidentifies the model and does so with unshakeable confidence, even though the mistake is minor.

It also understands the Ahrefs graph accurately and describes it in a clear, structured way, but its analysis lacks depth: with little contextualization or explanation, the response is limited to mere observation. However, it correctly guides users on Discord and even offers an interesting alternative solution, just as it adequately explains how to open the camera’s film compartment. A bit too concise, Claude would benefit from enriching its responses and being less certain of its conclusions.

Image Analysis in Perplexity: Our Review

Perplexity might not be the first choice for image analysis. Designed as an AI search engine, the tool can nonetheless perform the same tasks as its competitors. Just as well? It depends on the case. In our tests, Perplexity proved capable of following our instructions, responding to our specific requests. It had no trouble detecting the Discord interface and providing the correct path. It understands and describes the Ahrefs graph accurately, with a structure that perfectly matches the prompt: description, analysis, explanation.

The responses are concise yet fairly accurate. However, it’s regrettable that the analysis of the graph is somewhat biased, with a certain emphasis on… Perplexity. Finally, on the camera, the tool is completely wrong: it misidentifies the model and also gives incorrect instructions for opening the film compartment.

Image Analysis in Copilot: Our Review

Microsoft’s AI tools often face criticism, yet this image analysis experience was quite a pleasant surprise. Copilot accurately detected the Discord interface, understood the Ahrefs graph, and was the only model tested to correctly identify the camera model. In terms of response relevance, there is little to fault in its instructions: they make it easy to find the microphone settings on Discord and to open the film compartment.

But there’s a catch. A big catch. In its analysis of the Ahrefs graph, although it offers a very clear response structure, Copilot makes a critical error: it misreads the numerical data. This mistake completely skews its response, undermining user confidence. This is unfortunate, as Copilot performed very well in the other two tests, but this error is perhaps among the worst possible… Analyzing graphs or data tables seems to be its Achilles’ heel.

Image Analysis in DeepSeek: Our Review

DeepSeek is not equipped for image analysis. In its basic consumer version, the Chinese AI notifies the user during file upload: “Extract only text from images and files.” Analyzing an interface or a photo is therefore difficult. For instance, DeepSeek fails to recognize Discord despite visible app-specific features (like Nitro). But can it even read? Apparently not. The camera, whose name is prominently displayed in the photo as “PENTAX,” is identified as “PENTA” by DeepSeek, which also misreads the figures from the Ahrefs study.

Its advice is consequently poor. It offers vague suggestions for accessing Discord’s settings—which it does not recognize—makes a poor analysis of the graph due to incorrect data, and provides very bad guidance for opening the film compartment of what it presents as a “disposable camera”… Best forgotten.

Image Analysis in Le Chat: Our Review

Le Chat, from the French leader Mistral, mostly operates with brevity. It has no trouble understanding the user’s request or identifying the type of visual presented. It recognized the Discord interface, understood the Ahrefs graph, and realized we were discussing a Pentax film camera. However, upon closer examination of its responses, some errors or inaccuracies become evident.

For example, it does not mention the correct names of the sub-menus in the Discord settings. While its explanation is adequate for basic navigation, it lacks precision. The same issue arises with the camera: Le Chat confidently asserts it is a Pentax Spotmatic, whereas it is actually the ME Super model. Worse, it misreads a number in the Ahrefs graph.

Thus, while Le Chat seems to do the job at first glance, responding succinctly and relatively correctly to requests, these small errors and its lack of in-depth analysis make the feature somewhat uncompetitive compared to the heavyweights of the market.
