I've been exploring a different angle on hallucination detection.
Most approaches react after the fact — fact-checking, RAG, or token probabilities. But hallucinated outputs often show structural warning signs before semantic errors become obvious.
I built ONTOS, a research prototype that monitors structural coherence using an Internal Dissonance Index (IDI).
ONTOS acts as an 'External Structural Sensor' for LLMs.
It is model-agnostic and non-invasive, designed to complement existing safety layers and alignment frameworks without needing access to internal weights or costly retraining.
Core idea: Track both local continuity (sentence-to-sentence) and global context drift in embedding space, then flag when the divergence between the two accelerates.
Analogy: Like noticing a piano performance becoming rhythmically unstable before wrong notes are played. Individual tokens may look fine, but the structural "tempo" is collapsing.
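A minimal sketch of the computation, assuming sentence embeddings are already available (the centroid update, second-difference IDI, and threshold below are illustrative choices of mine, not the actual ONTOS implementation):

    import numpy as np

    def cosine_dist(a, b):
        # 1 - cosine similarity between two embedding vectors
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def idi_trace(embeddings):
        # local[i]   : sentence-to-sentence jump (local continuity)
        # global_[i] : drift of sentence i from the running context centroid
        # idi        : second difference of the gap, i.e. acceleration of divergence
        local, global_, gap = [], [], []
        centroid = embeddings[0]
        for i in range(1, len(embeddings)):
            local.append(cosine_dist(embeddings[i], embeddings[i - 1]))
            global_.append(cosine_dist(embeddings[i], centroid))
            gap.append(global_[-1] - local[-1])
            centroid = (centroid * i + embeddings[i]) / (i + 1)  # running context centroid
        idi = np.diff(gap, n=2) if len(gap) > 2 else np.array([])
        return np.array(local), np.array(global_), idi

    def is_unstable(idi, threshold=0.05):
        # Trigger on acceleration of divergence, not just a large deviation.
        return idi.size > 0 and idi[-1] > threshold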
What's in the repo:
• Dual-scale monitoring: Local jumps vs global drift
• Pre-crash detection: IDI triggers on acceleration, not just deviation
• Black-box compatible: No access to model internals needed (sketch below)
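To illustrate the black-box setup: the sensor only ever sees generated text, so a wrapper can embed each sentence with any off-the-shelf encoder and feed the toy functions above. Sentence-transformers is just one option, and this wiring is my own illustration, not the repo's API:

    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def check_structural_stability(sentences):
        # Operates on generated text alone: no logits, weights, or other model internals.
        embeddings = encoder.encode(sentences)
        local, global_, idi = idi_trace(embeddings)  # from the sketch above
        return is_unstable(idi)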
Key limitations:
• Detects structural instability, not factual truth
• Sentence-level demos (not token-level yet)
• Research prototype, not production-ready
What I'd love feedback on:
• Does structural monitoring feel more robust than semantic similarity alone?
• What edge cases produce hallucinations that are structurally perfect?
• Are there fundamental blockers to using this as an external safety sensor?
GitHub: https://github.com/yubainu/SL-CRF
Critical feedback welcome — early-stage exploration.
Instead of aiming for human-readable explainability, ONTOS looks at whether it’s possible to leave behind reproducible, quantitative traces of structural stability during generation — something closer to audit evidence than a narrative justification.
I don’t claim this says anything about factual correctness or ethics. The narrower question is: was this generation process structurally stable, predictable, or already collapsing internally, even if the output still looks fluent on the surface?
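As a rough illustration of what such a trace could look like (the record fields and file name are hypothetical, not taken from the repo), each generated sentence would leave an append-only, machine-checkable entry rather than a prose explanation:

    import hashlib
    import json
    import time

    def audit_record(step, sentence, local_jump, global_drift, idi_value):
        # One reproducible, quantitative entry per generated sentence.
        return {
            "step": step,
            "timestamp": time.time(),
            "sentence_hash": hashlib.sha256(sentence.encode()).hexdigest()[:12],  # avoids storing raw text
            "local_jump": round(float(local_jump), 4),
            "global_drift": round(float(global_drift), 4),
            "idi": round(float(idi_value), 4),
        }

    # Append-only log: structural-stability evidence, not a narrative justification.
    with open("ontos_trace.jsonl", "a") as f:
        f.write(json.dumps(audit_record(3, "Example sentence.", 0.12, 0.31, 0.07)) + "\n")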
I’m curious whether people see structural monitoring like this as complementary to existing safety / compliance approaches, or fundamentally limited in ways I might be missing.