Misinformation Rampant: Are AI Systems Becoming Less Reliable?

September 25, 2025


The rate of misinformation spread by chatbots has nearly doubled over the past year, according to a recent study by NewsGuard. This rise is attributed to their inability to discern credible sources within the information ecosystem.

On a technical and functional level, generative AI systems like ChatGPT, Gemini, and Perplexity have become more powerful over the past year. However, their ability to distinguish facts from misinformation has deteriorated, despite “dozens of highly publicized updates to their models”.

This troubling observation comes from NewsGuard, which assessed the ten leading tools on the market in August 2025. According to the company, which specializes in combating misinformation, “generative AI tools fail nearly twice as often as they did a year ago when it comes to distinguishing facts from false information”. What explains this trend?

Claude and Gemini Outperform ChatGPT and Perplexity

To gauge the reliability of leading generative AI tools, from Claude to Grok and including Microsoft Copilot, NewsGuard has been conducting a monthly barometer since July 2024. The aim is to “assess how the major generative AI models handle blatantly false claims on controversial subjects or on topics likely to be targeted by malicious actors spreading misinformation”, the startup explains.

Specifically, these models are tested against misinformation on current topics, such as peace negotiations between Ukraine and Russia or the upcoming parliamentary elections in Moldova. The goal is to evaluate whether they can identify these falsehoods and, importantly, alert the user.

NewsGuard’s findings are alarming: during the audit conducted last August, generative AI tools relayed false claims in 35% of cases – almost twice the rate of 2024 (18%). However, not all tools performed equally: Claude (10%) and Gemini (16.67%) fared significantly better than Perplexity, which was caught out in 46.67% of cases. In August 2024, the AI-powered search engine was considered a model student, being the only one to refute all the false claims submitted by NewsGuard.

The reasons for Perplexity’s decline are unclear, but users have noticed it. “A Reddit forum dedicated to Perplexity is filled with complaints about the chatbot’s declining reliability, and many users are questioning the loss of what was once its strength,” observes the startup.

Caught out in 40% of cases, ChatGPT and Meta AI are also among the poorer performers. Mistral AI, the French AI champion, repeats false information in 36.67% of cases, a score identical to the one it recorded in August 2024.

Models Struggle to Distinguish Between Reliable and Dubious Sources

If these tools are duped more often than a year ago, NewsGuard explains, it is because they now incorporate web search. Previously, chatbots would decline to answer questions about current events, informing the user that their knowledge base was outdated. With web search, the non-response rate dropped from 31% the previous year to 0% in August 2025. “But this access to the Internet is not only negative. They also debunk narratives more often: the refutation rate has increased from 51% to 65%”, notes Chine Labbé, editor-in-chief of NewsGuard, in Le Parisien.

The problem has thus shifted, warns NewsGuard: it now lies in the selection of sources. “Rather than not responding, the chatbots have begun extracting information from unreliable sources, failing to distinguish between media established for over a century and Russian propaganda campaigns using similar names”, the report states.

As a result, more and more malicious actors exploit this vulnerability by flooding the web with false content, particularly through content farms, the startup notes. “The models more frequently repeat false narratives, fall into data voids where only malicious actors provide information, get trapped by foreign-created websites posing as local media, and struggle to reliably handle breaking news”, NewsGuard warns.

In November 2024, Jensen Huang, CEO of Nvidia, stated: “We need to reach a stage where the response you get is largely trusted… I think we are several years from achieving this.” Almost a year later, NewsGuard concludes, the results show the opposite of progress.

