Most existing hallucination detectors rely on full LLM inference (expensive and slow) or struggle with long-context inputs.
I built LettuceDetect, an open-source, encoder-only framework that detects hallucinated spans in LLM-generated answers based on the retrieved context. No LLM calls needed, and it runs much more efficiently.
Highlights:
- Token-level hallucination detection (unsupported spans flagged based on retrieved evidence)
- Built on ModernBERT — handles up to 4K token contexts
- 79.22% F1 on the RAGTruth benchmark (beats previous encoder models, competitive with LLMs)
- MIT licensed
- Includes a Python package, pretrained models, and a Hugging Face demo (quick usage sketch below)
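Here's roughly what usage looks like with the Python package. This is a sketch from memory of the repo's README: the class name `HallucinationDetector`, the `predict` signature, and the model id are assumptions, so check the GitHub repo and the KRLabsOrg Hugging Face page for the current API.

```python
# Rough usage sketch; names and model id are from memory, verify against the README.
from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/lettucedect-base-modernbert-en-v1",  # model id may differ
)

contexts = [
    "France is a country in Europe. The capital of France is Paris. "
    "The population of France is 67 million."
]
question = "What is the capital of France? What is the population of France?"
answer = "The capital of France is Paris. The population of France is 69 million."

# Returns character spans of the answer that are unsupported by the retrieved context.
spans = detector.predict(
    context=contexts, question=question, answer=answer, output_format="spans"
)
print(spans)  # e.g. a span covering "The population of France is 69 million."
```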
GitHub: https://github.com/KRLabsOrg/LettuceDetect
Blog: https://huggingface.co/blog/adaamko/lettucedetect
Preprint: https://arxiv.org/abs/2502.17125
Models/Demo: https://huggingface.co/KRLabsOrg
Would love feedback from anyone working on RAG, hallucination detection, or efficient LLM evaluation. Also exploring real-time hallucination detection (while the answer is being generated, not just post-generation); open to thoughts/collab there.