Why ChatGPT Won’t Say “I Don’t Know”: Unveiling AI’s Hidden Rules!

September 18, 2025


The research team at OpenAI has delved into the phenomenon of hallucinations in ChatGPT, specifically focusing on its inability to acknowledge when it does not know an answer.

Artificial intelligence hallucinations, responses fabricated by AI but presented as facts, have been a known issue since the launch of ChatGPT. While their frequency has decreased as model design has improved, there has been a noticeable uptick since these models gained access to the internet.

But what causes these errors? OpenAI’s research team explored this issue in a study titled Why language models hallucinate. Here’s a detailed discussion.

Hallucinations: Uncertainty is Not an Option for ChatGPT

The study reveals that hallucinations begin during the pre-training phase. At this stage, models are trained to predict the next word in a text, which helps them learn grammar, spelling, and common phrases based on statistical regularities. However, for rare or unique facts, such as information mentioned only once in a training corpus, there is no pattern to follow, forcing the model to make a guess.
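
To see why a one-off fact offers no pattern to learn from, consider a toy next-word predictor that simply counts which word follows which in a corpus. This is a deliberate simplification, not OpenAI's training setup, and the corpus and names below are invented for illustration.

```python
from collections import Counter, defaultdict

# Invented toy corpus: one fact repeats often, another appears exactly once.
corpus = (
    "the capital of France is Paris . " * 3
    + "Ada Quill was born on March 7 ."
).split()

# Count how often each word follows a given previous word (bigram counts).
follow = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation seen in training, with its count."""
    counts = follow.get(word)
    return counts.most_common(1)[0] if counts else ("<no pattern>", 0)

print(predict_next("is"))  # ('Paris', 3): backed by a real statistical regularity
print(predict_next("on"))  # ('March', 1): seen once, so the "prediction" is really a guess
```

With a single occurrence, nothing distinguishes the true continuation from any other plausible one, which is the situation the study describes for rare facts.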

These errors are further exacerbated by how models are assessed. During performance tests, AIs are evaluated on their accuracy. In that setting, a model that does not know an answer has a statistically better chance of scoring well by guessing than by admitting its ignorance.

Consider a multiple-choice test. If you don’t know the answer but guess, you might be correct. Similarly, when models are scored solely on accuracy, they are encouraged to guess instead of saying “I don’t know,” according to OpenAI.
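
To make the incentive concrete, here is the expected score on a four-option question the model cannot answer, under plain accuracy grading; the +1/0 point scheme is an assumption for illustration, not any specific benchmark's rules.

```python
# Four answer options, and the model has no idea which is right.
p_correct_if_guessing = 1 / 4

expected_score_guess = p_correct_if_guessing * 1 + (1 - p_correct_if_guessing) * 0  # 0.25
expected_score_abstain = 0.0  # "I don't know" earns nothing under accuracy-only grading

print(expected_score_guess, expected_score_abstain)  # 0.25 vs 0.0: guessing always pays
```

Over many such questions, the guessing strategy accumulates a strictly higher score, so that is the behavior the model learns.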

Rethinking Model Evaluation

To reduce hallucinations, OpenAI suggests rethinking the evaluation criteria for models. The goal is to implement a more nuanced scoring system that penalizes confident but incorrect answers more severely while rewarding abstentions or responses that express uncertainty.

Merely adding a few new tests that account for uncertainty isn't enough. Widely used accuracy-based assessments need to be updated so that their scoring discourages guessing. If the main leaderboards continue to reward lucky guesses, models will keep learning to guess, OpenAI states.
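
A minimal sketch of what such an updated scoring rule could look like, assuming +1 for a correct answer, -1 for a wrong one, and 0 for abstaining; the exact weights are illustrative, not values prescribed by OpenAI.

```python
from typing import Optional

def score(answer: Optional[str], correct: str,
          reward: float = 1.0, penalty: float = -1.0, abstain: float = 0.0) -> float:
    """Score one question; answer=None means the model said "I don't know"."""
    if answer is None:
        return abstain
    return reward if answer == correct else penalty

# With these weights, blind guessing on a 4-option question has expected value
# 0.25 * 1.0 + 0.75 * (-1.0) = -0.5, so abstaining (0.0) is the better strategy
# unless the model's confidence in an answer exceeds the break-even point of 50%.
```

Calibrating the penalty sets the confidence threshold above which answering beats abstaining, which is the behavior these proposed evaluations aim to reward.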

In line with this approach, OpenAI now categorizes responses to unique-answer questions into three types: correct answers, errors, and abstentions. Abstention is seen as a form of humility, preferable to an incorrect response. This framework was used to compare two models, GPT-5-thinking-mini and o4-mini, with revealing results.

  • GPT-5-thinking-mini: this model abstains more often when it is unsure of the answer. That naturally lowers its number of correct responses but sharply limits its errors. It may look less effective if one considers only the rate of correct answers, yet it is more reliable because it hallucinates far less.
  • o4-mini: in contrast, this model almost always answers, even when in doubt. That lets it score slightly more correct answers in absolute terms, but at the cost of a substantially higher error rate, as the tally sketched below illustrates.
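
The three-way breakdown can be expressed as a small tallying routine. The two response lists below are hypothetical placeholders, not figures from OpenAI's evaluation; they simply mirror the cautious-versus-eager contrast above.

```python
from collections import Counter

def evaluate(predictions, answers):
    """Tally each response as correct, error, or abstention (None = abstained)."""
    tally = Counter()
    for pred, gold in zip(predictions, answers):
        if pred is None:
            tally["abstention"] += 1
        elif pred == gold:
            tally["correct"] += 1
        else:
            tally["error"] += 1
    total = len(answers)
    return {k: tally[k] / total for k in ("correct", "error", "abstention")}

gold = ["a", "b", "c", "d", "e"]
cautious = ["a", None, "c", None, None]  # abstains whenever unsure
eager    = ["a", "x", "c", "y", "e"]     # always answers, sometimes wrongly

print(evaluate(cautious, gold))  # {'correct': 0.4, 'error': 0.0, 'abstention': 0.6}
print(evaluate(eager, gold))     # {'correct': 0.6, 'error': 0.4, 'abstention': 0.0}
```

Judged on accuracy alone, the eager model looks better (0.6 vs 0.4 correct); once errors are penalized, the cautious model comes out ahead, which is exactly the trade-off OpenAI highlights.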
