OpenAI's research team has investigated the phenomenon of hallucinations in ChatGPT, focusing on the model's inability to acknowledge when it does not know an answer.
Artificial intelligence hallucinations – responses fabricated by AI but presented as facts – have been a known issue since the launch of ChatGPT. While the frequency of these hallucinations has decreased with improvements in model designs, there has been a noticeable uptick since these models gained internet access.
But what causes these errors? OpenAI's research team explored the question in a study titled "Why Language Models Hallucinate." Here's a detailed look at its findings.
Hallucinations: Uncertainty is Not an Option for ChatGPT
The study reveals that hallucinations begin during the pre-training phase. At this stage, models are trained to predict the next word in a text, which helps them learn grammar, spelling, and common phrases based on statistical regularities. However, for rare or unique facts, such as information mentioned only once in a training corpus, there is no pattern to follow, forcing the model to make a guess.
These errors are further exacerbated by how models are assessed. During performance tests, models are scored on accuracy alone. Under that regime, a model that does not know an answer statistically has a better chance of scoring well by guessing than by admitting its ignorance.
Consider a multiple-choice test. If you don’t know the answer but guess, you might be correct. Similarly, when models are scored solely on accuracy, they are encouraged to guess instead of saying “I don’t know,” according to OpenAI.
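The multiple-choice analogy can be made concrete with a little expected-value arithmetic. The sketch below (hypothetical numbers, not from the paper) compares the expected accuracy score of a blind guess on a four-option question with that of abstaining:

```python
def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected accuracy points for one question under accuracy-only grading."""
    if abstain:
        return 0.0      # "I don't know" earns nothing
    return p_correct    # a guess earns 1 point with probability p_correct

# A blind guess among 4 options is right 25% of the time.
guess = expected_score(0.25, abstain=False)
abstain = expected_score(0.25, abstain=True)

print(guess, abstain)  # 0.25 vs 0.0: guessing strictly dominates abstaining
```

Because the guess has positive expected value and the abstention has none, accuracy-only scoring teaches the model to always answer.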
Rethinking Model Evaluation
To reduce hallucinations, OpenAI suggests rethinking the evaluation criteria for models. The goal is to implement a more nuanced scoring system that penalizes confident but incorrect answers more severely while rewarding abstentions or responses that express uncertainty.
Merely adding a few new tests that account for uncertainty isn't enough. Widely used accuracy-based assessments need to be updated so that their scoring discourages guessing. If major leaderboards continue to reward lucky guesses, models will keep learning to guess, OpenAI states.
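The kind of scoring change OpenAI describes can be sketched as follows. The penalty value here is an assumption for illustration, not a figure from the paper; the point is only that once a wrong answer costs more than an abstention, a low-confidence guess has negative expected value:

```python
def penalized_score(p_correct: float, abstain: bool, wrong_penalty: float = -1.0) -> float:
    """Expected points when wrong answers are penalized (illustrative penalty)."""
    if abstain:
        return 0.0  # abstaining is neutral, not punished
    # +1 point with probability p_correct, wrong_penalty otherwise
    return p_correct * 1.0 + (1 - p_correct) * wrong_penalty

g = penalized_score(0.25, abstain=False)
a = penalized_score(0.25, abstain=True)
print(g, a)  # -0.5 vs 0.0: abstaining now beats a 25%-confident guess
```

Under this scheme the rational strategy flips: a model should answer only when it is confident enough that the expected gain outweighs the penalty, and say "I don't know" otherwise.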
In line with this approach, OpenAI now categorizes responses to questions with a single correct answer into three types: correct answers, errors, and abstentions. Abstention is seen as a form of humility, preferable to an incorrect response. This framework was used to compare two models, GPT-5-thinking-mini and o4-mini, with revealing results.
- GPT-5-thinking-mini: this model abstains more frequently when unsure of an answer. This naturally reduces the number of correct responses, but it sharply limits errors. The model may look weaker if one considers only the rate of correct answers, yet it is more reliable because it hallucinates far less.
- o4-mini: in contrast, this model almost always responds, even when in doubt. That yields slightly more correct answers in absolute terms, but at the cost of a substantially higher error rate.
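The contrast between the two models can be tallied with the three-way framework above. The counts below are invented, chosen only to mirror the qualitative pattern the comparison describes, not OpenAI's measured results:

```python
from collections import Counter

# Hypothetical response labels for two models over 100 single-answer questions.
cautious = ["correct"] * 45 + ["error"] * 10 + ["abstain"] * 45  # abstains often
eager    = ["correct"] * 50 + ["error"] * 48 + ["abstain"] * 2   # almost always answers

def summarize(answers):
    """Return the fraction of correct, error, and abstain responses."""
    n = len(answers)
    counts = Counter(answers)
    return {k: counts[k] / n for k in ("correct", "error", "abstain")}

print(summarize(cautious))  # slightly fewer correct answers, far fewer errors
print(summarize(eager))     # a few more correct answers, many more errors
```

Read through this lens, the "eager" model wins on raw accuracy but loses badly on error rate, which is exactly the trade-off the penalized scoring is meant to expose.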
Jordan Park writes in-depth reviews and editorial opinion pieces for Touch Reviews. With a background in UI/UX design, Jordan offers a unique perspective on device usability and user experience across smartphones, tablets, and mobile software.