How to fact-check AI answers (and catch hallucinations)

AI still invents facts with total confidence — error rates hit 40% on hard tasks. A repeatable routine for catching hallucinations before they cost you.

Marcie Ellis
Content Marketer

AI still makes things up — and it does so with total fluency, which is what makes it dangerous. Independent testing in 2026 put error rates as high as 40% on hard, specific tasks, and a UK investment firm reportedly lost $1.2 million acting on a merger announcement its AI had hallucinated. The fix isn't a better model; it's a habit. Here's a repeatable routine for catching hallucinations before they reach a document, a decision, or a client.

Why models sound sure when they're wrong

A language model generates plausible text, not true text. When it doesn't know, it doesn't hesitate: it produces the most likely-sounding answer, complete with confident phrasing, fake citations, and invented specifics. Confidence and correctness are unrelated, which is the whole problem: you cannot read certainty off the tone. This is the practical core of the argument in "confidence is not correctness."

The five-step fact-check routine

  1. Flag the specifics. Names, numbers, dates, quotes, citations, and legal or medical claims are where hallucinations hide. Vague prose is usually safe; specific facts are the risk surface.
  2. Demand sources, then open them. Ask "what's your source for that, with a link?" — and actually click it. A model will happily cite a URL that doesn't exist or doesn't say what it claims.
  3. Cross-check with a second model. Paste the same claim into a different model. Independent agreement is a strong signal; disagreement is a flashing light. (More on this below.)
  4. Verify at the primary source. For anything load-bearing, confirm the specific fact at its origin — the paper, the docs, the filing — not a summary of it.
  5. Mark the unverified. If you can't confirm a claim, label it unverified rather than letting it pass as fact. Most expensive mistakes are unverified claims that got promoted to "true" by silence.
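
Steps 1 and 5 are mechanical enough to sketch in code. The Python below is a rough illustration, not a complete detector: the regular expressions, the `Claim` class, and the function names are all assumptions made for this example, and they will miss names, dates written in words, and most legal or medical claims. The point is the default: every flagged specific starts as unverified and only changes status when someone records a source.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for step 1. They only catch numbers, links, and
# straight-quoted text; a number inside a URL will be flagged twice.
SPECIFIC_PATTERNS = {
    "number": re.compile(r"[$£€]?\d[\d,]*(?:\.\d+)?%?"),
    "url": re.compile(r"https?://\S+"),
    "quote": re.compile(r'"[^"]+"'),
}


@dataclass
class Claim:
    kind: str
    text: str
    status: str = "unverified"  # step 5: nothing is a fact by default
    source: str = ""


def flag_specifics(answer: str) -> list:
    """Step 1: pull out the specifics that need checking."""
    claims = []
    for kind, pattern in SPECIFIC_PATTERNS.items():
        for match in pattern.finditer(answer):
            # Drop trailing sentence punctuation picked up by the pattern.
            claims.append(Claim(kind, match.group(0).rstrip(".,;")))
    return claims


def confirm(claim: Claim, source: str) -> None:
    """Steps 2 and 4: call this only after a person has read the source."""
    claim.status = "verified"
    claim.source = source


def report(claims: list) -> list:
    """Step 5: every claim carries its label into the final document."""
    return [f"[{c.status.upper()}] {c.kind}: {c.text}" for c in claims]


if __name__ == "__main__":
    answer = 'The firm lost $1.2 million in 2026, and the CEO said "we were misled".'
    for line in report(flag_specifics(answer)):
        print(line)
```

Nothing here checks whether a claim is true. It only makes sure that unchecked specifics stay visibly labeled instead of passing as fact by silence.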

The cross-model trick

Cross-checking is why multi-model access is a reliability feature, not a luxury. Running a claim past GPT, Claude, and Gemini is the cheapest fact-check you have. It is the same instinct behind "right model vs best model" and the verification discipline in "AI research with citations."
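
If you cross-check often, the comparison itself can be scripted. The sketch below assumes each model is wrapped in a plain function that takes a prompt and returns text; those wrappers, the prompt wording, and the names `cross_check` and `normalize_verdict` are hypothetical and not tied to any vendor's SDK. It treats anything other than a unanimous, non-hedged verdict as a reason to go to the primary source.

```python
from typing import Callable, Dict

# A "model" here is any function from prompt text to reply text.
# Real API clients are deliberately left out; plug in your own wrappers.
Model = Callable[[str], str]


def normalize_verdict(reply: str) -> str:
    """Reduce a free-text reply to 'true', 'false', or 'unsure'."""
    words = reply.strip().lower().split()
    first = words[0].strip(".,:;!") if words else ""
    return first if first in {"true", "false"} else "unsure"


def cross_check(claim: str, models: Dict[str, Model]) -> dict:
    """Ask every model the same question and compare the verdicts."""
    prompt = (
        "Answer with one word, TRUE, FALSE, or UNSURE, "
        f"then one sentence of reasoning. Claim: {claim}"
    )
    verdicts = {name: normalize_verdict(ask(prompt)) for name, ask in models.items()}
    distinct = set(verdicts.values())
    # Agreement means every model gave the same definite verdict.
    agreed = len(distinct) == 1 and "unsure" not in distinct
    return {"verdicts": verdicts, "agreed": agreed}


if __name__ == "__main__":
    # Stub models standing in for real ones.
    stubs = {
        "model_a": lambda prompt: "TRUE. The filing confirms it.",
        "model_b": lambda prompt: "False, no such announcement exists.",
    }
    print(cross_check("The merger was announced in March.", stubs))
```

Agreement is a signal, not proof: models can share the same mistake, so anything load-bearing still goes through step 4.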

When to bother

Calibrate to stakes. Brainstorming and first drafts? Skip it. Anything that ships, informs a decision, or carries your name? Run the full routine. The five steps take two minutes; undoing a confident error takes a great deal longer.

Where this fits

A good system prompt can lower the hallucination rate — telling the model to say "I don't know" and to cite sources genuinely helps, which is part of why one prompt across GPT, Claude, and Gemini is worth keeping. But the durable fix is structural: keep more than one model within reach so cross-checking is one paste away. That's what oran.chat is built around — start free and make the second opinion a habit. For more on getting reliable work out of every chat, see Playbooks.