
Show HN: OpenTrace – Self-hosted observability server with 75 MCP tools

https://github.com/adham90/opentrace
1•adham900•1m ago•0 comments

AT&T Acquires CenturyLink

https://old.reddit.com/r/Portland/comments/1reucu3/this_sucks_worse_than_you_may_yet_realize/
1•fullstacking•2m ago•1 comment

Automatic Discharges of Student Loans to Proceed After Dual Court Wins

https://www.forbes.com/sites/adamminsky/2026/02/25/automatic-discharges-of-student-loans-to-proce...
1•toomuchtodo•2m ago•1 comment

Multi-agent workflows often fail

https://github.blog/ai-and-ml/generative-ai/multi-agent-workflows-often-fail-heres-how-to-enginee...
1•e2e4•4m ago•0 comments

Show HN: Open-source MCP servers for self-hosted homelab AI

1•ai_engineering•4m ago•0 comments

Show HN: PixShot – Screenshot and OG Image API

https://pixshot.dev
1•juanjosegongi•5m ago•1 comment

Lawsuit could slow Micron DRAM chipmaking project in New York

https://www.syracuse.com/micron/2026/02/whos-behind-the-lawsuit-that-could-slow-microns-chipmakin...
1•walterbell•6m ago•0 comments

Nkmc – a virtual filesystem that lets AI agents call any API with ls, cat, grep

https://nkmc.ai/
1•guoyu•7m ago•1 comment

Random Ghostty theme on each launch

https://merinids212.github.io/ghostty-random-theme/
1•merinid•7m ago•1 comment

The Factory Model: How Coding Agents Changed Software Engineering

https://addyosmani.com/blog/factory-model/
1•cdrnsf•9m ago•0 comments

The Debian PHP team includes hard coded telemetry

https://salsa.debian.org/php-team/php/-/commit/aa12fa4540c8733ab6d68763b2107f39ec48fb37
1•_RPM•9m ago•1 comment

Go-Native Durable Execution

https://www.dbos.dev/blog/how-we-built-golang-native-durable-execution
2•hmaxdml•12m ago•0 comments

Ask HN: Could you create a competitor to your company at 10% of the cost?

3•TheAlchemist•14m ago•0 comments

Five years after pay transparency law, many postings don't comply

https://www.gjsentinel.com/news/western_colorado/five-years-after-pay-transparency-law-many-posti...
1•mooreds•15m ago•0 comments

Tool can summarize a YouTube video for you

https://vydcut.com
2•gaelsk•16m ago•0 comments

Show HN: BrainDump – A daily writing prompt site

https://www.braindump.club/
1•steeferino•16m ago•1 comment

Feedback Engagement (2019)

https://infiniteundo.com/post/185224298983/feedback-engagement
1•mooreds•17m ago•0 comments

Tool use and notation as shaping LLM generalization

https://the.scapegoat.dev/tool-use-and-notation-as-generalization-shaping/
1•mooreds•18m ago•0 comments

Mummy Brown

https://en.wikipedia.org/wiki/Mummy_brown
1•linsomniac•18m ago•0 comments

Show HN: I built an LLM comment detector for HN (I got banned)

2•umairnadeem123•19m ago•0 comments

Blood Feud: Oura's Health Panels versus Whoop's Advanced Labs

https://www.wired.com/story/oura-whoop-blood-labs/
1•brandonb•20m ago•0 comments

How Long Will 50ml of Ink Last? (3 Different Nibs)

https://onepenshow.com/ink/economy
1•austinallegro•22m ago•0 comments

The Impossible Landing [video]

https://www.youtube.com/watch?v=5Nkad_6aigM
1•doener•23m ago•0 comments

Show HN: Verity – I got tired of debugging duplicate emails after job restarts

https://www.useverity.io/
1•shineDaPoker•25m ago•0 comments

Pulsar timing hints at a nearby dark matter 'sub-halo'

https://phys.org/news/2026-02-pulsar-hints-nearby-dark-halo.html
1•PaulHoule•25m ago•0 comments

Solution to the Complaints about Anthropic

1•abliterationai•25m ago•0 comments

Shutdown at DHS Extends to Cyber Agency

https://www.nytimes.com/2026/02/22/us/politics/cyber-agency-dhs-security-setbacks.html
1•geox•28m ago•0 comments

Show HN: Tunejourney.com – A 3D radio globe with in-browser ML to auto-skip talk

https://tunejourney.com/
2•FreeGuessr•29m ago•0 comments

There's no point in NOT building your own agents' orchestrator

https://hryuks.fika.bar/there-s-no-point-in-not-building-your-own-agents-orchestrat-01KHPAYBXQQ7Z...
1•hryuk•30m ago•1 comment

Managing Complexity with Mycelium

https://yogthos.net/posts/2026-02-25-ai-at-scale.html
3•todsacerdoti•38m ago•0 comments

Show HN: Running hallucination detection on a $200 GPU (RTX 3050, 4GB)

https://github.com/yubainu/sibainu-engine
2•yubainu•1h ago
I built SIB-ENGINE, a real-time hallucination detection system that monitors LLM internal structure rather than output content.

KEY RESULTS (Gemma-2B, N=1000):

• 54% hallucination detection with 7% false positive rate

• <1% computational overhead (runs on RTX 3050 with 4GB VRAM)

• ROC-AUC: 0.8995

WHY IT'S DIFFERENT:

Traditional methods analyze the output text semantically.

SIB-ENGINE monitors "geometric drift" in the hidden states during generation, identifying the structural collapse of the latent space before the first incorrect token is sampled.

This approach offers unique advantages:

• Real-time intervention: Stop generation mid-stream

• Language-agnostic: No semantic analysis needed

• Privacy-preserving: Never reads the actual content

• Extremely lightweight: Works on consumer hardware

HOW IT WORKS: SIB-ENGINE monitors the internal stability of the model's computation. The system uses multiple structural signals to detect instability; two primary indicators are:

Representation Stability: Tracking how the initial intent is preserved or distorted as it moves through the model's transformation space.

Cross-Layer Alignment: Monitoring the consensus of information processing across different neural depths to identify early-stage divergence.

When these (and other proprietary structural signals) deviate from the expected stable manifold, the system flags a potential hallucination before it manifests in the output.
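The two indicators above can be sketched roughly as follows. This is a toy NumPy illustration, not SIB-ENGINE's actual code: the metrics, thresholds, and function names here are invented for the sake of example, and the real signals are described as proprietary.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def drift_signals(anchor, hidden_states):
    """Toy versions of the two indicators.

    anchor        : (d,) vector summarizing the prompt (the "initial intent").
    hidden_states : (n_layers, d) hidden states for the current token.
    """
    # Representation stability: how well the prompt's intent
    # survives at the deepest layer.
    stability = cosine(anchor, hidden_states[-1])
    # Cross-layer alignment: average agreement between adjacent layers.
    alignment = np.mean([cosine(hidden_states[i], hidden_states[i + 1])
                         for i in range(len(hidden_states) - 1)])
    return stability, alignment

def flag_hallucination(anchor, hidden_states,
                       stability_min=0.3, alignment_min=0.5):
    """Flag the token if either signal falls below an (invented) threshold."""
    stability, alignment = drift_signals(anchor, hidden_states)
    return stability < stability_min or alignment < alignment_min
```

In a real system the hidden states would come from the model's forward pass for each generated token, and the thresholds would be calibrated against labeled runs rather than hard-coded.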

DEMO & CODE:

• Demo video: https://www.youtube.com/watch?v=H1_zDC0SXQ8

• GitHub: https://github.com/yubainu/sibainu-engine

• Raw data: raw_logs.csv (full transparency)

LIMITATIONS:

• Tested on Gemma-2B only (2.5B parameters)

• Designed to scale, but needs validation on larger models

• Catches "structurally unstable" hallucinations (about half)

• Best used as first-line defense in ensemble systems

TECHNICAL NOTES:

• No external models needed (unlike self-consistency methods)

• No knowledge bases required (unlike RAG approaches)

• Adds ~1% inference time vs. 300-500% for semantic methods

• Works by monitoring the process, not the product

I'd love feedback on:

• Validation on larger models (seeking partnerships and compute resources for large-scale validation)

• Integration patterns for production systems

• Comparison with other structural approaches

• Edge cases where geometric signals fail

This represents a fundamentally different paradigm: instead of asking "is this text correct?", we ask "was the generation process unstable?" The answer is surprisingly informative.

Happy to discuss technical details in the comments!

Comments

yubainu•1h ago
I’ve been exploring why LLMs "break" during inference. Most current hallucination detection methods look at the final text (semantic analysis) or use another LLM to double-check (self-consistency). These are effective but extremely slow and expensive.

SIB-ENGINE is my attempt to solve this at the geometric layer. By monitoring the "Anchor Drift" (how hidden states deviate from the prompt’s latent trajectory), I found that hallucinations often manifest as a structural instability before the token is even sampled.

The Numbers:

Recall: 53.89% (It catches about half, but it's consistent)

Precision: 88.52% (Low false-alarm rate is my priority)

Overhead: <1% (Running on an RTX 3050 with 4GB VRAM)

AUC: 0.8995

I've released a Lite version (1-axis) on GitHub so you can see the fundamental logic and run it on your own machine. I’ve also included the raw_logs.csv from my N=1000 test run on Gemma-2B for full transparency.

I’m particularly curious if anyone here has experimented with similar geometric approaches or has thoughts on how this might scale to 70B+ models where the latent space is significantly denser.

Happy to dive into the technical details!

entrustai•46m ago
The geometric approach is interesting precisely because it's model-agnostic at the content level — you're detecting structural collapse in latent space before it surfaces as text, which means you don't need to know what a hallucination looks like semantically.

The 54% recall is the honest number to focus on. At 88% precision you're catching real problems when you flag them, but you're missing roughly half of all hallucinations entirely. For a suppression layer in a regulated context that's a meaningful gap — a compliance team can't tell a regulator "we caught most of them."

The complementary approach worth considering: deterministic post-generation checks on the output layer. Geometric drift catches structural collapse during generation. Rule-based output validation catches semantic violations after generation — banned claims, unattributed statistics, absolute guarantees. Neither approach alone is sufficient. Together they cover different failure modes.
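That second layer can be sketched in a few lines. The rule labels and regex patterns below are invented examples of the categories named above (banned claims, unattributed statistics, absolute guarantees), not rules from either project.

```python
import re

# Invented example rules: each maps a violation label to a pattern.
RULES = {
    "absolute_guarantee": re.compile(r"\b(guaranteed|always works|never fails)\b", re.I),
    "banned_claim":       re.compile(r"\b(FDA.approved|clinically proven)\b", re.I),
    "unattributed_stat":  re.compile(r"\b\d+(\.\d+)?%"),  # any bare percentage
}

def validate_output(text, has_citation=False):
    """Return the list of rule violations found in generated text."""
    violations = []
    for label, pattern in RULES.items():
        if label == "unattributed_stat" and has_citation:
            continue  # statistics are fine when a source is attached
        if pattern.search(text):
            violations.append(label)
    return violations
```

Unlike the geometric signal, this check is deterministic and auditable after the fact, which is what a compliance team can actually show a regulator.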

Good work publishing the raw_logs.csv. Reproducibility at this layer is rare and matters.

yubainu•29m ago
Thanks for the precise critique. You are right: 54% recall is the "danger zone." In a regulated or production environment, missing half of the structural collapses is functionally equivalent to zero protection. The 88% precision proves the signal exists, but the threshold for "collapse" in latent space is currently too rigid.

The "geometric approach (SIB) + rule-based output validation" hybrid you suggested is the most logical path forward:

• Geometric drift (layer-internal): catches the "process" of losing logical coherence (structural entropy).

• Rule-based (output layer): catches the "result" of semantic violations (pre-defined constraints).

My next focus is analyzing the "silent failures": the 46% we missed. If the latent space doesn't show geometric collapse but the output is still a hallucination, it suggests the model is confidently drifting into a "parallel" but structurally stable manifold. That's a different failure mode that geometry alone can't catch.

Reproducibility is the only way to move this out of "voodoo AI" territory. Glad the raw_logs.csv helped.