I've been exploring a different angle on hallucination detection.
Most approaches react after the fact — fact-checking, RAG, or token probabilities. But hallucinated outputs often show structural warning signs before semantic errors become obvious.
I built ONTOS, a research prototype that monitors structural coherence using an Internal Dissonance Index (IDI).
ONTOS acts as an 'External Structural Sensor' for LLMs.
It is model-agnostic and non-invasive, designed to complement existing safety layers and alignment frameworks without needing access to internal weights or costly retraining.
Core idea: Track both local continuity (sentence-to-sentence) and global context drift in embedding space, then flag when the divergence between the two accelerates.
Analogy: Like noticing a piano performance becoming rhythmically unstable before wrong notes are played. Individual tokens may look fine, but the structural "tempo" is collapsing.
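A minimal sketch of the computation, assuming sentence embeddings are already available (the centroid update, second-difference IDI, and threshold below are illustrative choices of mine, not the actual ONTOS implementation):

    import numpy as np

    def cosine_dist(a, b):
        # 1 - cosine similarity between two embedding vectors
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def idi_trace(embeddings):
        # local[i]   : sentence-to-sentence jump (local continuity)
        # global_[i] : drift of sentence i from the running context centroid
        # idi        : second difference of the gap, i.e. acceleration of divergence
        local, global_, gap = [], [], []
        centroid = embeddings[0]
        for i in range(1, len(embeddings)):
            local.append(cosine_dist(embeddings[i], embeddings[i - 1]))
            global_.append(cosine_dist(embeddings[i], centroid))
            gap.append(global_[-1] - local[-1])
            centroid = (centroid * i + embeddings[i]) / (i + 1)  # running context centroid
        idi = np.diff(gap, n=2) if len(gap) > 2 else np.array([])
        return np.array(local), np.array(global_), idi

    def is_unstable(idi, threshold=0.05):
        # Trigger on acceleration of divergence, not just a large deviation.
        return idi.size > 0 and idi[-1] > threshold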
What's in the repo:
• Dual-scale monitoring: Local jumps vs global drift
• Pre-crash detection: IDI triggers on acceleration, not just deviation
• Black-box compatible: No access to model internals needed (sketch below)
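To illustrate the black-box setup: the sensor only ever sees generated text, so a wrapper can embed each sentence with any off-the-shelf encoder and feed the toy functions above. Sentence-transformers is just one option, and this wiring is my own illustration, not the repo's API:

    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def check_structural_stability(sentences):
        # Operates on generated text alone: no logits, weights, or other model internals.
        embeddings = encoder.encode(sentences)
        local, global_, idi = idi_trace(embeddings)  # from the sketch above
        return is_unstable(idi)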
Key limitations:
• Detects structural instability, not factual truth
• Sentence-level demos (not token-level yet)
• Research prototype, not production-ready
What I'd love feedback on:
• Does structural monitoring feel more robust than semantic similarity alone?
• What edge cases produce hallucinations that are structurally perfect?
• Are there fundamental blockers to using this as an external safety sensor?
GitHub: https://github.com/yubainu/SL-CRF
Critical feedback welcome — early-stage exploration.
Instead of aiming for human-readable explainability, ONTOS looks at whether it’s possible to leave behind reproducible, quantitative traces of structural stability during generation — something closer to audit evidence than a narrative justification.
I don’t claim this says anything about factual correctness or ethics. The narrower question is: was this generation process structurally stable, predictable, or already collapsing internally, even if the output still looks fluent on the surface?
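As a rough illustration of what such a trace could look like (the record fields and file name are hypothetical, not taken from the repo), each generated sentence would leave an append-only, machine-checkable entry rather than a prose explanation:

    import hashlib
    import json
    import time

    def audit_record(step, sentence, local_jump, global_drift, idi_value):
        # One reproducible, quantitative entry per generated sentence.
        return {
            "step": step,
            "timestamp": time.time(),
            "sentence_hash": hashlib.sha256(sentence.encode()).hexdigest()[:12],  # avoids storing raw text
            "local_jump": round(float(local_jump), 4),
            "global_drift": round(float(global_drift), 4),
            "idi": round(float(idi_value), 4),
        }

    # Append-only log: structural-stability evidence, not a narrative justification.
    with open("ontos_trace.jsonl", "a") as f:
        f.write(json.dumps(audit_record(3, "Example sentence.", 0.12, 0.31, 0.07)) + "\n")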
I’m curious whether people see structural monitoring like this as complementary to existing safety / compliance approaches, or fundamentally limited in ways I might be missing.