The core thesis: Models suffer from "Parametric Hubris". They default to answering from their training data instead of invoking search tools, even when browsing is enabled.
Data: GPT-5 only triggers search in ~31% of prompts.
The Fix: A pipeline called "Veritas" that enforces 100% retrieval, so no answer is allowed to come from parametric memory alone (a rough sketch of that loop follows the repo link below).
Results: Achieves 89.1% F-Score on SimpleQA Verified (vs 51.6% for GPT-5 and 72.1% for Gemini 3 Pro).
Cost/Model: Built on Gemini 2.5 Flash Lite (one of the cheapest available models) at ~$0.002 per query.
Trade-off: It’s slow (~115s per query), but accurate.
The paper argues that hallucination isn't a capability problem, but an architectural discipline problem. Code and data are open source.
Paper/Repo: https://github.com/lamLumae/Project-Lutum-Veritas
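To make the "no parametric memory allowed" idea concrete, here is a minimal sketch of what a forced-retrieval loop like Veritas could look like. This is not the repo's actual code: the `search`/`llm` callables, the `REFUSE` protocol, and the query-rewrite step are illustrative assumptions; the only point carried over from the summary is that the model must ground every answer in retrieved snippets or decline.

```python
"""Minimal sketch of a forced-retrieval QA loop in the spirit of Veritas.

Assumptions (not taken from the repo): `search` and `llm` are placeholders
you would wire to a real search API and a real model client. The key
constraint is that the model only ever answers from retrieved snippets,
never from its parametric memory alone.
"""

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Snippet:
    url: str
    text: str


# The two external dependencies this sketch assumes.
SearchFn = Callable[[str], List[Snippet]]  # query -> ranked snippets
LlmFn = Callable[[str], str]               # prompt -> completion


def answer_with_forced_retrieval(
    question: str,
    search: SearchFn,
    llm: LlmFn,
    max_rounds: int = 3,
) -> str:
    """Answer `question` using only retrieved evidence.

    Each round: search, then ask the model to answer strictly from the
    snippets or reply REFUSE if the evidence is insufficient. A refusal
    triggers a reformulated query rather than a parametric guess.
    """
    query = question
    for _ in range(max_rounds):
        snippets = search(query)
        evidence = "\n\n".join(
            f"[{i + 1}] {s.url}\n{s.text}" for i, s in enumerate(snippets)
        )
        prompt = (
            "Answer the question using ONLY the numbered sources below. "
            "Cite sources as [n]. If the sources do not contain the answer, "
            "reply exactly REFUSE.\n\n"
            f"Sources:\n{evidence}\n\nQuestion: {question}\nAnswer:"
        )
        answer = llm(prompt).strip()
        if answer != "REFUSE" and "[" in answer:
            return answer  # grounded answer with at least one citation
        # Evidence was insufficient: ask for a more specific search query.
        query = llm(
            f"Rewrite this as a more specific web search query: {question}"
        ).strip()
    return "Unable to answer from retrieved evidence."
```

The real pipeline is presumably more involved (the ~115s latency suggests multiple search and verification rounds), but the discipline is the same: no retrieval, no answer.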