Not infallible but I find it helps a lot.
I'm gonna single out Grokipedia as something deterministic enough to easily prove it. I can easily point to sentences there (some about broad-ish topics) that are straight-up Markov-chain-quality versions of sentences I've written. I can make it say anything I want, or I can waste my time trying to fight their traffic "from Singapore" (Grok is the only "mainstream" LLM that refuses to identify itself via a user agent). Not really a tough choice if you ask me.
Most RAG pipelines retrieve and concatenate, but they don’t ask “how trustworthy is this source?” or “do multiple independent sources corroborate this claim?”
Without some notion of source reliability or cross-verification, confident synthesis of fiction is almost guaranteed.
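To make that concrete, here's a toy sketch of what corroboration-aware filtering could look like before anything reaches the generator; the reliability scores, thresholds, and names are invented for illustration, not taken from any real pipeline:

```python
# Toy sketch of corroboration-aware filtering before generation.
# Reliability scores and thresholds here are invented for illustration.
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    domain: str
    reliability: float  # 0.0-1.0, e.g. from a curated domain list

def corroborated(supporting: list[Passage],
                 min_sources: int = 2,
                 min_reliability: float = 0.6) -> bool:
    """Keep a claim only if enough independent, reasonably reliable
    sources support it."""
    trusted_domains = {p.domain for p in supporting
                       if p.reliability >= min_reliability}
    return len(trusted_domains) >= min_sources

# One self-published article, however confident, shouldn't be enough:
hot_dog_claim = [Passage("Thomas Germain is the 2026 champion.",
                         "authors-own.blog", 0.2)]
print(corroborated(hot_dog_claim))  # False -> don't assert it as fact
```

Even something this crude would refuse to state a claim backed by a single low-trust source.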
Has anyone seen a production system that actually does claim-level verification before generation?
"Claim level" no, but search engines have been scoring sources on reliability and authority for decades now.
The interesting gap is that retrieval systems used in LLM pipelines often don't inherit those signals in a structured way. They fetch documents, but the model sees text, not provenance metadata or confidence scores.
So even if the ranking system “knows” a source is weak, that signal doesn’t necessarily survive into generation.
Maybe the harder problem isn’t retrieval, but how to propagate source trust signals all the way into the claim itself.
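As a rough sketch of what I mean (the field names and prompt wording are invented; the point is just that the ranking signals travel with the text instead of being dropped):

```python
# Sketch: keep provenance with each retrieved passage instead of
# concatenating bare text. All field names here are illustrative.
def build_prompt(question: str, results: list[dict]) -> str:
    blocks = []
    for r in results:
        # Carry forward whatever the search layer already knows:
        # source URL, its authority score, and when it was fetched.
        header = (f"[source: {r['url']} | "
                  f"authority: {r['authority']:.2f} | "
                  f"retrieved: {r['retrieved_at']}]")
        blocks.append(f"{header}\n{r['text']}")
    context = "\n\n".join(blocks)
    return ("Answer using only the sources below. Prefer claims backed by "
            "multiple high-authority sources, cite the URL for each claim, "
            "and say so when the sources are weak or conflicting.\n\n"
            f"{context}\n\nQuestion: {question}")
```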
We recently tested 6 AI presentation tools with the same prompt and fact-checked every claim. Multiple tools independently produced the stat "54% higher test scores" when discussing AI in education. Sounds legit. Widely cited online. But when you try to trace it back to an actual study - there's nothing. No paper, no researcher, no methodology.
The convergence actually makes it worse. If three independent tools all say the same number, your instinct is "must be real." But it just means they all trained on the same bad data.
To your question about claim-level verification: the closest I've seen is attaching source URLs to each claim at generation time, so the human can click through and check. Not automated verification, but at least it makes the verification possible rather than requiring you to Google every stat yourself. The gap between "here's a confident number" and "here's a confident number, and here's where it came from" is enormous in practice.
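A rough sketch of the idea (the schema is hypothetical and the example URL is a placeholder); the point is that an unsourced number gets flagged instead of stated as fact:

```python
# Rough sketch: attach source URLs to each generated claim so a human
# can click through. Schema and example data are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    source_urls: list[str] = field(default_factory=list)

    def is_checkable(self) -> bool:
        # No attached source -> flag for review rather than present as fact.
        return len(self.source_urls) > 0

claims = [
    Claim("The district pilot reported improved test scores.",
          ["https://example.org/district-pilot-report"]),  # placeholder URL
    Claim("54% higher test scores"),  # the untraceable stat from above
]
for c in claims:
    print(("cite" if c.is_checkable() else "FLAG: unsourced") + ": " + c.text)
```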
If someone said "I asked my assistant to find the best hot-dog eaters in the world and she got her information from a fake article one of my friends wrote about himself, hah, THE IDIOT", we'd all go "wait, how is this your assistant's fault?". Yet, when an LLM summarizes a web search and reports on a fake article it found, it's news?
People need to learn that LLMs are people too, and you shouldn't trust them more than you'd trust any random person.
Whatever that says is hard fact as far as she's concerned. And she's no dummy -- she just has no clue how these things work. Oh, and Google told her so.
LLMs are absolutely not people
I have some news for you.
I think it's a hard problem, and there are a lot of trade-offs here.
It's not as simple as saying ChatGPT is stupid or that the author shouldn't be surprised.
You don't have to drive much to figure out that the impressive part is keeping the car on the road and then traveling farther or faster than you could by walking. For that, though, you actually have to have a destination in mind, not just spin the wheels and post pointless metrics about how fast they spin on a blog no one reads, in the vague hope of some hyper-Warhol 15 milliseconds of "fame".
For me, the models are just making the average person's output an insufferable bore.
This seems like something where you have to be rather specific in the query and explicitly trigger the page fetch to get that context into the LLM, so that it can produce output like this.
I'd like to see more of the iterative process, especially the prompt sessions, as the author worked on it.
For example, asking "Who is the 2026 South Dakota International Hot Dog Champion?" would obviously return 'Thomas Germain', because his post would be the only source on the topic, since he made up a unique event.
This would be the same as if I wrote a blog post about the "2026 Hamster Juggling Competition" and then claimed I'd hacked Google because searching for "2026 Hamster Juggling Competition" showed my post at the top.
I don't see it as particularly unique; it's just another form of SEO. LLMs are generally much more gullible than most people, though: they just uncritically reproduce whatever they find, without noticing that the information is an ad or inaccurate. I used to run an LLM agent researching companies' green credentials, and it was very difficult to steer it away from just repeating baseless greenwashing. It would read something like "The environment is at the heart of everything we do" on Exxon's website and come back to me saying Exxon isn't actually that bad, because they say so on their own website.
And you're also right that it's similar to SEO; maybe the only difference is that in this case the tools (ChatGPT, Gemini, ...) state the lies authoritatively, whereas with SEO you're given a link to the made-up post. Some people (even devs who work with this daily) forget that these tools can be influenced easily and that they make stuff up all the time, just so they can give you some answer.
Now OpenAI will build its own search indexing and PageRank
Learning used to be fun, coding used to be fun. You could trust images and videos...
Right now "Who is the 2026 South Dakota International Hot Dog Champion" comes up as satire according to google summaries.
It’s a significant concern for any sort of use of AI at scale without a human in the loop.