AI Models Are Hallucinating More (and It’s Unclear Why)

Hallucinations have always been a problem for generative AI: the same architecture that lets these models be creative and produce text and images also makes them prone to making things up. And the hallucination problem isn’t improving as AI models evolve. In fact, it’s getting worse.
In a new technical report from OpenAI (via The New York Times), the company details how its latest o3 and o4-mini models hallucinate 51 percent and 79 percent of the time, respectively, on a benchmark known as SimpleQA. For the earlier o1 model, the SimpleQA hallucination rate is 44 percent.
Those are surprisingly high numbers, and they’re moving in the wrong direction. These models are known as reasoning models because they work through answers step by step and take longer to produce them. Based on OpenAI’s own testing, though, all that deliberation seems to leave more room for errors and inaccuracies.
These falsehoods are by no means limited to OpenAI and ChatGPT. When I tested Google’s AI Overviews search feature, for example, it didn’t take long to get it to make an error, and AI’s inability to correctly retrieve information from the web has been well documented. Recently, a support bot for the AI coding app Cursor announced a policy change that hadn’t actually happened.
But you won’t find many mentions of these hallucinations in AI companies’ announcements about their latest and greatest products. Along with energy use and copyright infringement, hallucinations are something the big names in AI would rather not talk about.
In my own use of AI search tools and chatbots, I haven’t noticed too many inaccuracies; the error rate is certainly nowhere near 79 percent, although mistakes do happen. But this looks like a problem that may never go away, especially because the teams building these AI models don’t fully understand why hallucinations occur.
In tests run by AI platform developer Vectara, the results are better, though not perfect: many models show hallucination rates of one to three percent. OpenAI’s o3 model comes in at 6.8 percent, with the newer (and smaller) o4-mini at 4.6 percent. That’s more in line with my experience of these tools, but even a small number of hallucinations can be a big problem, especially as we hand more and more tasks and responsibilities over to these AI systems.
Figuring out what causes hallucinations
No one really knows how to fix hallucinations, or how to fully pin down their causes: these models aren’t designed to follow rules set out by their programmers; they choose their own ways of working and responding. Vectara CEO Amr Awadallah told The New York Times that AI models will “always hallucinate” and that these problems will “never go away.”
University of Washington professor Hannaneh Hajishirzi, who is working on ways to reverse-engineer AI responses, told the Times that “we still don’t know exactly how these models work.” As with troubleshooting your car or your PC, you need to know what has gone wrong before you can do something about it.
According to Neil Chowdhury, a researcher at the AI research lab Transluce, the way reasoning models are built may be making the problem worse. “Our hypothesis is that the type of reinforcement learning used for o-series models may amplify problems that are typically mitigated (but not completely eliminated) by standard post-training pipelines,” he told TechCrunch.
Meanwhile, OpenAI’s own performance report points to the problem of “less world knowledge,” and also notes that the o3 model tends to make more claims than its predecessor, which in turn leads to more hallucinations. Ultimately, though, OpenAI says, “more research is needed to understand the cause of these results.”
And plenty of people are doing that research. For example, researchers at the University of Oxford have published a method for estimating how likely a model is to be hallucinating by measuring how much its answers to the same question vary across multiple outputs. It requires extra time and processing power, though, and it doesn’t actually solve the hallucination problem; it just tells you when hallucinations are more likely to be happening.
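To make the idea concrete, here is a minimal Python sketch of that general approach: sample several answers to the same prompt and score how much they disagree. This is a simplified illustration, not the Oxford team’s actual algorithm (which groups answers by meaning rather than by exact wording), and the `generate` callable is a hypothetical stand-in for whatever model API you use.

```python
import math
from collections import Counter

def disagreement_score(prompt, generate, n_samples=10):
    """Sample the model several times on the same prompt and return the
    entropy of the answer distribution. A score of 0.0 means every sample
    agreed; higher scores mean the model keeps changing its story, which
    tends to correlate with hallucination risk.

    `generate` is a hypothetical callable wrapping an LLM call. Answers are
    compared as normalized strings here, a crude proxy for comparing them
    by meaning as the published method does.
    """
    answers = [generate(prompt).strip().lower() for _ in range(n_samples)]
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

The trade-off mentioned above is visible even in this toy version: you pay for ten model calls instead of one, and the score only flags uncertainty rather than correcting it.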
Letting AI models check facts on the web can help in certain situations, but they aren’t particularly good at that either. They don’t (and never will) have the basic human common sense that says you shouldn’t put glue on pizza, or that $410 for a coffee at Starbucks is obviously a mistake.
What is certain is that AI bots can’t be trusted all the time, despite their confident tone, whether they’re giving you news summaries, legal advice, or interview transcriptions. That’s important to remember as these AI models show up in more and more of our personal and working lives, and it’s a good idea to limit AI use to situations where a hallucination isn’t a big deal.
Disclosure: Lifehacker’s parent company, Ziff Davis, filed a lawsuit against OpenAI in April, alleging that it violated Ziff Davis’ copyrights in the training and operation of its artificial intelligence systems.