KEY RESULTS (Gemma-2B, N=1000):
• 54% hallucination detection at a 7% false-positive rate
• <1% computational overhead (runs on an RTX 3050 with 4GB VRAM)
• ROC-AUC: 0.8995
WHY IT'S DIFFERENT: Traditional methods analyze the output text semantically. SIB-ENGINE instead monitors "geometric drift" in hidden states during generation, identifying the structural collapse of the latent space before the first incorrect token is sampled.
This approach offers unique advantages:
• Real-time intervention: stop generation mid-stream
• Language-agnostic: no semantic analysis needed
• Privacy-preserving: never reads the actual content
• Extremely lightweight: works on consumer hardware
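To make "real-time intervention" concrete, here is a toy sketch of a guarded decoding loop. This is not SIB-ENGINE's implementation: `step` stands in for one forward pass of a model, and `instability_score` for whatever internal signal the monitor computes; both are hypothetical.

```python
from typing import Callable, List, Tuple

def guarded_generate(step: Callable[[List[int]], Tuple[int, object]],
                     instability_score: Callable[[object], float],
                     max_tokens: int = 64,
                     threshold: float = 0.8) -> List[int]:
    """Toy decoding loop that halts before emitting a token whose
    hidden state looks structurally unstable.

    step(tokens) -> (next_token, hidden_state) stands in for one
    forward pass; instability_score maps a hidden state to [0, 1].
    The threshold of 0.8 is an arbitrary placeholder.
    """
    tokens: List[int] = []
    for _ in range(max_tokens):
        next_token, hidden = step(tokens)
        if instability_score(hidden) > threshold:
            break  # intervene mid-stream: the suspect token is never emitted
        tokens.append(next_token)
    return tokens
```

Because the check runs on state the forward pass already produced, the suspect token can be suppressed before it reaches the user, which is what distinguishes this from post-hoc semantic filtering.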
HOW IT WORKS: SIB-ENGINE monitors the internal stability of the model's computation. The system uses multiple structural signals to detect instability; two primary indicators are:
Representation Stability: Tracking how the initial intent is preserved or distorted as it moves through the model's transformation space.
Cross-Layer Alignment: Monitoring the consensus of information processing across different neural depths to identify early-stage divergence.
When these (and other proprietary structural signals) deviate from the expected stable manifold, the system flags a potential hallucination before it manifests in the output.
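As an illustration only (the actual signals are proprietary), the two indicators above could be approximated from per-token hidden states roughly like this. `hidden_states` is a hypothetical (num_layers, hidden_dim) array for the current decoding step, `anchor` a pooled prompt representation, and the thresholds are made up:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors (epsilon avoids divide-by-zero).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def structural_signals(hidden_states, anchor):
    """Toy approximations of the two indicators described above.

    hidden_states: (num_layers, hidden_dim) activations for one decoding step.
    anchor: (hidden_dim,) pooled prompt representation ("initial intent").
    """
    # Representation stability: how well the final-layer state still
    # aligns with the prompt anchor.
    stability = cosine(hidden_states[-1], anchor)
    # Cross-layer alignment: mean agreement between adjacent layers,
    # a rough proxy for "consensus across neural depths".
    pairs = zip(hidden_states[:-1], hidden_states[1:])
    alignment = float(np.mean([cosine(a, b) for a, b in pairs]))
    return stability, alignment

def is_unstable(hidden_states, anchor, s_min=0.3, a_min=0.5):
    # Flag the step when either signal drops below a tuned threshold.
    s, a = structural_signals(hidden_states, anchor)
    return s < s_min or a < a_min
```

Note that this sketch needs no access to the decoded text at all, which is where the language-agnostic and privacy-preserving properties come from.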
DEMO & CODE:
• Demo video: https://www.youtube.com/watch?v=H1_zDC0SXQ8
• GitHub: https://github.com/yubainu/sibainu-engine
• Raw data: raw_logs.csv (full transparency)
LIMITATIONS:
• Tested on Gemma-2B only (2.5B parameters)
• Designed to scale, but needs validation on larger models
• Catches "structurally unstable" hallucinations (about half of all hallucinations)
• Best used as a first-line defense in ensemble systems
TECHNICAL NOTES:
• No external models needed (unlike self-consistency methods)
• No knowledge bases required (unlike RAG approaches)
• Adds ~1% inference time, vs. 300-500% for semantic methods
• Works by monitoring the process, not the product
I'd love feedback on:
• Validation on larger models (seeking partners and compute resources for large-scale validation)
• Integration patterns for production systems
• Comparisons with other structural approaches
• Edge cases where geometric signals fail
This represents a fundamentally different paradigm: instead of asking "is this text correct?", we ask "was the generation process unstable?" The answer is surprisingly informative.
Happy to discuss technical details in the comments!
yubainu•2h ago
SIB-ENGINE is my attempt to solve hallucination detection at the geometric layer. By monitoring "Anchor Drift" (how hidden states deviate from the prompt's latent trajectory), I found that hallucinations often manifest as structural instability before the token is even sampled.
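For intuition only (the actual formula isn't public), "Anchor Drift" can be pictured as the per-token deviation of final-layer hidden states from a pooled prompt representation. `prompt_anchor`, the drift definition, and the 0.7 threshold below are all hypothetical:

```python
import numpy as np

def anchor_drift(hidden_trajectory, prompt_anchor):
    """Per-token drift: 1 - cosine similarity to the prompt anchor.

    hidden_trajectory: (num_tokens, hidden_dim) final-layer states.
    prompt_anchor: (hidden_dim,) pooled prompt representation.
    Returns a (num_tokens,) array; 0 means perfectly on-anchor.
    """
    h = hidden_trajectory / (
        np.linalg.norm(hidden_trajectory, axis=1, keepdims=True) + 1e-8)
    a = prompt_anchor / (np.linalg.norm(prompt_anchor) + 1e-8)
    return 1.0 - h @ a

def first_unstable_step(hidden_trajectory, prompt_anchor, threshold=0.7):
    # Index of the first token whose drift exceeds the threshold, else -1.
    drift = anchor_drift(hidden_trajectory, prompt_anchor)
    hits = np.nonzero(drift > threshold)[0]
    return int(hits[0]) if hits.size else -1
```

The point of tracking a trajectory rather than a single score is that the flag fires at a specific token index, which is what makes mid-stream intervention possible.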
The Numbers:
Recall: 53.89% (It catches about half, but it's consistent)
Precision: 88.52% (Low false-alarm rate is my priority)
Overhead: <1% (Running on an RTX 3050 with 4GB VRAM)
AUC: 0.8995
I've released a Lite version (1-axis) on GitHub so you can see the fundamental logic and run it on your own machine. I’ve also included the raw_logs.csv from my N=1000 test run on Gemma-2B for full transparency.
I’m particularly curious if anyone here has experimented with similar geometric approaches or has thoughts on how this might scale to 70B+ models where the latent space is significantly denser.
Happy to dive into the technical details!