TL;DR: I built a lightweight auditor that detects hallucinations by monitoring Transformer Hidden State Dynamics in real-time. It achieves 0.90+ ROC-AUC on Gemma/Llama-3.2/Mistral using a single RTX 3050 (4GB), with a core computation time of <1ms.
What it is
The Sibainu Engine is a pre-emptive auditing layer that identifies "latent trajectory collapse"—geometric turbulence in the vector transformations between transformer layers—before the token is even sampled. It requires no training and works with frozen weights.
The "15ms vs 1ms" Latency Reality
I prioritized "no-nonsense" performance reporting. In a local Python/FastAPI environment the total response time is 15-25 ms, but it's important to distinguish the components:
Auditing Core (NumPy): < 1.0 ms. The actual vectorized math is near-instant.
System Overhead: ~12.0 ms spent on Pydantic validation and JSON-to-array conversion.
The Bottom Line: The core logic is significantly faster than the LLM's token generation speed (typically 30-70ms), meaning the audit is theoretically "zero-overhead" if integrated directly into the C++/CUDA inference pipeline.
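To make the core-vs-overhead split concrete, here is a minimal, self-contained sketch that times a toy vectorized auditing pass against the JSON round-trip an API layer would impose. The `dissonance_score` function is a hypothetical stand-in (not the engine's actual math); the point is only that NumPy's vectorized pass is dwarfed by serialization:

```python
import json
import time
import numpy as np

def dissonance_score(mid: np.ndarray, final: np.ndarray) -> float:
    """Toy stand-in for the auditing core: one vectorized pass over two
    hidden-state matrices of shape (tokens, hidden_dim)."""
    mid_n = mid / np.linalg.norm(mid, axis=-1, keepdims=True)
    fin_n = final / np.linalg.norm(final, axis=-1, keepdims=True)
    # 1 - mean cosine similarity between corresponding rows.
    return float(1.0 - np.mean(np.sum(mid_n * fin_n, axis=-1)))

rng = np.random.default_rng(0)
mid, final = rng.standard_normal((2, 32, 2048))

t0 = time.perf_counter()
score = dissonance_score(mid, final)
core_ms = (time.perf_counter() - t0) * 1000

# The JSON round-trip alone typically dwarfs the core math.
t0 = time.perf_counter()
payload = json.loads(json.dumps({"hidden": mid.tolist()}))
_ = np.asarray(payload["hidden"])
overhead_ms = (time.perf_counter() - t0) * 1000

print(f"core: {core_ms:.3f} ms, JSON round-trip: {overhead_ms:.3f} ms")
```

Running this locally shows the same shape of result the post reports: the array math is a fraction of a millisecond, while serializing the hidden states in and out of JSON costs an order of magnitude more.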
Key Metrics (Gemma-2B / HaluEval-QA)
ROC-AUC: 0.9176
Recall @ 5% False Signal Rate (FSR): 59.7% (it captures ~60% of hallucinations while flagging only 5% of factual truths).
Hardware: validated on a consumer-grade RTX 3050 (4GB) using 4-bit (NF4) quantization.
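For readers unfamiliar with "Recall @ 5% FSR": pick the flagging threshold so that only 5% of factual (negative) examples are flagged, then measure what fraction of hallucinations score above it. A small sketch on synthetic scores (the distributions here are invented for illustration, not HaluEval data):

```python
import numpy as np

def recall_at_fsr(scores_pos: np.ndarray, scores_neg: np.ndarray,
                  fsr: float = 0.05) -> tuple[float, float]:
    """Recall on hallucinations when the threshold is set so that only
    `fsr` of factual (negative) examples are flagged."""
    # Threshold = (1 - fsr) quantile of the negative-class scores.
    thresh = float(np.quantile(scores_neg, 1.0 - fsr))
    recall = float(np.mean(scores_pos > thresh))
    return recall, thresh

rng = np.random.default_rng(1)
halluc = rng.normal(2.0, 1.0, 1000)   # hallucinations: higher scores
factual = rng.normal(0.0, 1.0, 1000)  # factual recall: lower scores

recall, thresh = recall_at_fsr(halluc, factual, fsr=0.05)
print(f"recall @ 5% FSR: {recall:.1%} (threshold {thresh:.2f})")
```

The same routine, fed real per-example audit scores, reproduces the 59.7% figure's operating point: one quantile lookup on the factual scores, one comparison on the hallucination scores.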
How it works: Layer Dissonance
Instead of relying only on logit entropy, v6.4 monitors Layer Dissonance—the structural inconsistency between the middle and final layers. When a model hallucinates, the geometric stability between these layers exhibits a specific turbulence that is absent during factual recall.
Closed-Loop Recovery
I’ve included a recovery_agent_gemma.py that demonstrates Autonomous Safety Control. If the engine detects a physical neural anomaly (Score > 3.6510), it immediately aborts the session and triggers a re-generation using deterministic greedy search to stabilize the output.
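The control-flow of that recovery loop can be sketched in a few lines. Note that `generate` below is a placeholder, not the API of `recovery_agent_gemma.py`; a real integration would call the model and score its hidden states, but the abort-then-greedy-retry structure is the same:

```python
import numpy as np

ABORT_THRESHOLD = 3.6510  # anomaly score above which the demo aborts

def generate(prompt: str, greedy: bool = False, rng=None) -> tuple[str, float]:
    """Placeholder LLM call returning (text, audit_score). Greedy decoding
    is modeled as stable (low score); sampling as occasionally anomalous."""
    rng = rng if rng is not None else np.random.default_rng()
    score = 1.0 if greedy else float(rng.exponential(2.0))
    return f"answer({'greedy' if greedy else 'sampled'})", score

def audited_generate(prompt: str, rng=None) -> tuple[str, float]:
    """Closed-loop recovery: if the sampled pass trips the anomaly
    threshold, abort it and fall back to deterministic greedy decoding."""
    text, score = generate(prompt, greedy=False, rng=rng)
    if score > ABORT_THRESHOLD:
        # Abort the corrupted session; re-generate deterministically.
        text, score = generate(prompt, greedy=True)
    return text, score

text, score = audited_generate("What is the capital of France?",
                               rng=np.random.default_rng(3))
print(text, score)
```

The design choice worth noting is that the fallback is deterministic: greedy search removes sampling variance, so a retry cannot trip the same stochastic anomaly twice.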
yubainu•1h ago
When a human lies, the truth often reveals itself not in their words, but in their "tells"—a subtle change in facial expression or a shift in tone. I theorized that LLMs might exhibit similar "neural tells." When a model starts to hallucinate, there should be a detectable anomaly in the Hidden State Dynamics before the token is even sampled.
This led me to develop the Sibainu Engine.
My goal was to build a pre-emptive auditing layer that runs on consumer-grade hardware (RTX 3050 4GB). By monitoring the geometric stability (which I call "Layer Dissonance") between transformer layers in real-time, the engine identifies the "collapse of latent trajectory" with a core latency of less than 1ms.
Key Technical Highlights:
Accuracy: ROC-AUC > 0.90 across Gemma, Llama-3.2, and Mistral, without any additional training or fine-tuning.
Low Overhead: While the Python API adds some serialization delay, the vectorized NumPy core is fast enough to be integrated directly into any inference pipeline without bottlenecking generation.
Autonomous Recovery: I've included a demo where the engine aborts a "corrupted" session and triggers a deterministic re-generation the moment a physical neural anomaly is detected.
I believe that for LLM safety to be truly scalable, it needs to be lightweight and deterministic. I’m curious to hear your thoughts on this geometric approach and its potential generalizability to larger architectures.