Show HN: SIB-ENGINE, pre-emptive hallucination detection via geometric structure

https://github.com/yubainu/sibainu-engine
1•yubainu•1h ago
I built SIB-ENGINE, a real-time hallucination detection system that monitors LLM internal structure rather than output content.

KEY RESULTS (Gemma-2B, N=1000):

• 54% of hallucinations detected (recall) at a 7% false positive rate

• <1% computational overhead (runs on RTX 3050 with 4GB VRAM)

• ROC-AUC: 0.8995

WHY IT'S DIFFERENT:

Traditional methods analyze the output text semantically.

SIB-ENGINE monitors "geometric drift" in hidden states during generation, identifying structural collapse of the latent space before the first incorrect token is sampled.

This approach offers unique advantages:

• Real-time intervention: Stop generation mid-stream

• Language-agnostic: No semantic analysis needed

• Privacy-preserving: Reads hidden-state geometry, never the generated text

• Extremely lightweight: Works on consumer hardware
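To make the "stop generation mid-stream" idea concrete, here is a minimal sketch of a monitored generation loop. It is not SIB-ENGINE's code: `generate_with_monitor`, the `threshold` value, and the toy stream of (token, instability score) pairs are all hypothetical stand-ins for a real decoder and a real structural signal.

```python
from typing import Iterable, List, Tuple


def generate_with_monitor(
    steps: Iterable[Tuple[str, float]],
    threshold: float = 0.5,  # hypothetical cutoff, would need tuning on real data
) -> Tuple[List[str], bool]:
    """Consume (token, instability_score) pairs; halt at the first score above threshold.

    Returns the tokens emitted so far and a flag saying whether generation was halted.
    """
    emitted: List[str] = []
    for token, score in steps:
        if score > threshold:
            # The score is checked before the token is emitted,
            # so the flagged token never reaches the output.
            return emitted, True
        emitted.append(token)
    return emitted, False


# Toy stream: scores are made up for illustration only.
toy_stream = [("The", 0.05), ("capital", 0.08), ("is", 0.10), ("Atlantis", 0.91), (".", 0.20)]
tokens, halted = generate_with_monitor(toy_stream)
print(tokens, halted)  # ['The', 'capital', 'is'] True
```

In a real integration the scores would come from hidden states captured at each decoding step, and the caller would decide what to do on a halt (retry, fall back to retrieval, or surface a warning).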

HOW IT WORKS:

SIB-ENGINE monitors the internal stability of the model's computation. It uses several structural signals to detect instability; the two primary ones are:

• Representation Stability: Tracking how the initial intent is preserved or distorted as it moves through the model's transformation space.

• Cross-Layer Alignment: Monitoring the consensus of information processing across different neural depths to identify early-stage divergence.

When these (and other proprietary structural signals) deviate from the expected stable manifold, the system flags a potential hallucination before it manifests in the output.
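The two signals above can be illustrated with a small sketch. This is one possible reading, not the actual SIB-ENGINE method (the post does not disclose its signals): it assumes "representation stability" can be approximated by cosine drift from a prompt anchor vector, and "cross-layer alignment" by the mean cosine similarity between adjacent layers. The function names, thresholds, and toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def anchor_drift(anchor, hidden):
    """1 - cosine similarity: 0 means aligned with the anchor, larger means more drift."""
    return 1.0 - cosine(anchor, hidden)


def cross_layer_alignment(layer_states):
    """Mean cosine similarity between hidden states of adjacent layers (needs >= 2 layers)."""
    pairs = list(zip(layer_states, layer_states[1:]))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def is_unstable(anchor, layer_states, drift_max=0.5, align_min=0.5):
    """Flag a step when the final layer drifts from the anchor or layers disagree.

    drift_max and align_min are made-up thresholds for illustration.
    """
    drift = anchor_drift(anchor, layer_states[-1])
    align = cross_layer_alignment(layer_states)
    return drift > drift_max or align < align_min


# Toy 3-dimensional "hidden states" for three layers at one decoding step.
anchor = [1.0, 0.0, 0.0]
stable = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.15, 0.05]]
unstable = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(is_unstable(anchor, stable))    # False
print(is_unstable(anchor, unstable))  # True
```

With a real model, the per-layer vectors would be the hidden states for the current token position, and the anchor would be some summary of the prompt's hidden states; both choices are assumptions here.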

DEMO & CODE:

• Demo video: https://www.youtube.com/watch?v=H1_zDC0SXQ8

• GitHub: https://github.com/yubainu/sibainu-engine

• Raw data: raw_logs.csv (full transparency)

LIMITATIONS:

• Tested on Gemma-2B only (2.5B parameters)

• Designed to scale, but needs validation on larger models

• Catches "structurally unstable" hallucinations (about half)

• Best used as first-line defense in ensemble systems

TECHNICAL NOTES:

• No external models or repeated sampling needed (unlike LLM-as-judge or self-consistency methods)

• No knowledge bases required (unlike RAG approaches)

• Adds <1% inference time vs. 300-500% for semantic methods

• Works by monitoring the process, not the product

I'd love feedback on:

• Validation on larger models (I'm seeking partners and compute resources for large-scale validation)

• Integration patterns for production systems

• Comparison with other structural approaches

• Edge cases where geometric signals fail

This represents a fundamentally different paradigm: instead of asking "is this text correct?", we ask "was the generation process unstable?" The answer is surprisingly informative.

Happy to discuss technical details in the comments!

Comments

yubainu•1h ago
I’ve been exploring why LLMs "break" during inference. Most current hallucination detection methods look at the final text (semantic analysis), re-sample the model several times and compare the answers (self-consistency), or use another LLM to double-check. These are effective but slow and expensive.

SIB-ENGINE is my attempt to solve this at the geometric layer. By monitoring the "Anchor Drift" (how hidden states deviate from the prompt’s latent trajectory), I found that hallucinations often manifest as a structural instability before the token is even sampled.

The Numbers:

Recall: 53.89% (It catches about half, but it's consistent)

Precision: 88.52% (Low false-alarm rate is my priority)

Overhead: <1% (Running on an RTX 3050 with 4GB VRAM)

AUC: 0.8995
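One way to read these numbers together: recall, precision, and the 7% false positive rate quoted in the post jointly imply a base rate of hallucinations in the test set. The sketch below solves for it and then shows how precision would change at a lower base rate. The class balance is not stated in the post, so this is an inference from rounded figures, and the 5% deployment rate is a hypothetical.

```python
def implied_prevalence(recall, precision, fpr):
    """Solve precision = r*p / (r*p + f*(1 - p)) for the base rate p."""
    return precision * fpr / (recall * (1.0 - precision) + precision * fpr)


def precision_at(recall, fpr, prevalence):
    """Precision expected at a given base rate, holding recall and FPR fixed."""
    tp = recall * prevalence          # true positives per sample
    fp = fpr * (1.0 - prevalence)     # false positives per sample
    return tp / (tp + fp)


# Figures quoted in the post; the 7% FPR is rounded, so the result is approximate.
p = implied_prevalence(recall=0.5389, precision=0.8852, fpr=0.07)
print(f"implied hallucination rate in the test set: {p:.2f}")

# Hypothetical deployment where only 5% of generations hallucinate.
print(f"precision at a 5% base rate: {precision_at(0.5389, 0.07, 0.05):.2f}")
```

Under these assumptions the test set comes out roughly balanced (about half hallucinated samples), and the same detector would show noticeably lower precision when hallucinations are rarer, which is worth keeping in mind when comparing against other methods.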

I've released a Lite version (1-axis) on GitHub so you can see the fundamental logic and run it on your own machine. I’ve also included the raw_logs.csv from my N=1000 test run on Gemma-2B for full transparency.

I’m particularly curious if anyone here has experimented with similar geometric approaches or has thoughts on how this might scale to 70B+ models where the latent space is significantly denser.

Happy to dive into the technical details!
