The latest programmes from OpenAI, Google and DeepSeek – so-called ‘reasoning’ systems – are generating more hallucinations than previous systems, and no one knows why.
Hallucination rates on a test devised by AI tool developer Vectara have risen with reasoning systems. DeepSeek’s reasoning system, R1, hallucinated 14.3% of the time. OpenAI’s o3 – its most powerful system – hallucinated 6.8% of the time.
OpenAI’s o3 was found to hallucinate 33% of the time on the company’s PersonQA benchmark test, which involves answering questions about public figures – more than twice the hallucination rate of its earlier o1 system. The new o4-mini hallucinated 48% of the time.
The new programmes, which are trained on ever-increasing quantities of data, use mathematical probabilities to determine their responses, and sometimes make up answers which aren’t true.
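As a rough illustration of that idea – a toy sketch, not any vendor’s actual system – the Python snippet below picks each next word at random, in proportion to invented probabilities. Like a large language model, it has no notion of truth, only of which word is statistically likely, so a fluent-sounding continuation can still be false.

    import random

    # Invented, purely illustrative next-word probabilities.
    toy_next_word_probs = {
        "The capital of Australia is": {"Canberra": 0.6, "Sydney": 0.35, "Atlantis": 0.05},
    }

    def sample_next_word(context: str) -> str:
        """Pick the next word in proportion to its estimated probability."""
        probs = toy_next_word_probs.get(context, {"<unknown>": 1.0})
        words = list(probs.keys())
        weights = list(probs.values())
        return random.choices(words, weights=weights, k=1)[0]

    # Run repeatedly and the model will sometimes assert something untrue,
    # simply because that word was assigned a non-zero probability.
    for _ in range(5):
        print("The capital of Australia is", sample_next_word("The capital of Australia is"))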
In one case, a company called Cursor found its AI support bot telling customers, incorrectly, that they could not use the company’s product on more than one computer. When customers complained, Cursor’s CEO responded: “Unfortunately, this is an incorrect response from a front-line A.I. support bot.”
There have also been several cases in which lawyers used AI programmes to draft court documents, only for the programmes to invent fictional legal cases and cite them as precedents to support a contention.
Vectara has tracked the veracity of chatbots by getting them to perform a straightforward task whose output is readily verified. It found that chatbots made up information at least 3% of the time, and in some cases as often as 27% of the time.
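A hallucination rate of this kind is simply the share of responses judged to contain made-up information. The sketch below shows the arithmetic under assumed, hypothetical judgments; it is not Vectara’s actual evaluation code.

    def hallucination_rate(judgments):
        """Percentage of responses judged to contain made-up information."""
        flagged = sum(1 for is_supported in judgments if not is_supported)
        return 100.0 * flagged / len(judgments)

    # Hypothetical judgments: True = response fully supported by the source task,
    # False = response contains invented material.
    judgments = [True, True, False, True, True, True, True, True, True, True]
    print(f"Hallucination rate: {hallucination_rate(judgments):.1f}%")  # prints 10.0%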
Vectara’s CEO, Amr Awadallah, says: “Despite our best efforts, they will always hallucinate. That will never go away.”