
Show HN: 500-cycle runtime test for long-horizon LLM coherence

https://zenodo.org/records/18369990

1•teugent•1w ago

We ran a 500-cycle benchmark to test long-horizon reasoning stability in large language models — not just output quality, but whether a model can maintain coherent identity and logic across hundreds of recursive reasoning steps.

This is part of our SIGMA Runtime project — a cognitive control layer that runs on top of any LLM and tracks drift, coherence, and identity persistence in real time.

---

Why we did this

Most LLM evals measure short reasoning spans — 1-10 turns. But when a model is asked to sustain a line of reasoning over hundreds of steps, subtle feedback effects appear:

- Semantic drift: meaning slowly shifts as text compounds.
- Crystallization: the model locks into repeating its own phrasing or style.
- Identity loss: the “speaker” loses internal consistency.

We wanted to see whether it’s possible to prevent these effects at runtime, without retraining or prompt resets.

---

What’s new here

We replaced the older ACE anti-crystallization layer with a new system called AEP (Adaptive Entropy Protocol) — a real-time regulator that injects controlled entropy into model outputs.

AEP tracks three internal metrics:

- TI: Terminological Isometry (consistency of key concepts)
- SDC: Semantic Drift Coefficient (meaning variation rate)
- L/N: Logic-to-Noise ratio (logical density vs surface variation)
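The post does not say how these metrics are computed. As a rough illustration of what a "meaning variation rate" could look like, here is a minimal sketch that uses Jaccard distance between the word sets of consecutive outputs. This is an assumed stand-in, not the SDC formula from the report, and the names `tokens`, `jaccard_distance`, and `drift_series` are hypothetical.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_distance(a: str, b: str) -> float:
    """0.0 for identical vocabularies, 1.0 for disjoint ones."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0  # two empty texts are treated as identical
    return 1.0 - len(ta & tb) / len(ta | tb)


def drift_series(outputs: list[str]) -> list[float]:
    """Toy per-cycle drift: distance between each output and the previous one."""
    return [jaccard_distance(prev, cur) for prev, cur in zip(outputs, outputs[1:])]
```

A real drift measure would more likely compare embeddings than raw word sets; the word-set version is used here only because it runs without any model or external dependency.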

When the model becomes too stable (repetition, rigid phrasing), AEP adds micro-perturbations to restore variation. When it drifts too far, it dampens entropy back into equilibrium.
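The two-sided behaviour described here resembles a simple band controller. A minimal sketch follows, assuming the regulated quantity is a sampling temperature and the input is a single drift score in [0, 1]. Both are assumptions: the post does not say what AEP actually perturbs or how. `EntropyRegulator` and all thresholds below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EntropyRegulator:
    """Toy band controller; not the AEP implementation."""

    low: float = 0.2          # assumed: drift below this means "too stable"
    high: float = 0.6         # assumed: drift above this means "drifting too far"
    step: float = 0.05        # size of each adjustment
    t_min: float = 0.1        # lower clamp for temperature
    t_max: float = 1.5        # upper clamp for temperature
    temperature: float = 0.7  # current sampling temperature (assumed control knob)

    def update(self, drift: float) -> float:
        """Raise entropy when output is rigid, dampen it when output drifts."""
        if drift < self.low:
            self.temperature += self.step
        elif drift > self.high:
            self.temperature -= self.step
        # Inside the band nothing changes; always clamp to the allowed range.
        self.temperature = min(self.t_max, max(self.t_min, self.temperature))
        return self.temperature
```

Calling `update` once per cycle with the latest drift score nudges the temperature up during repetition and down during drift, and leaves it alone while the score stays inside the band.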

---

How we tested it

- 500 reasoning cycles per model (OpenAI GPT-5.2 & Gemini-3-Flash Preview)
- Every 50th cycle = a Rib Point that compresses and verifies the last 49 steps
- Continuous telemetry from the runtime (coherence, drift, entropy)
- Identity: same synthetic agent (“LEO”, AI architect/cognitive scientist)
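The cycle schedule above can be sketched as a plain loop. This is a minimal outline under stated assumptions: `step` and `compress` are hypothetical callables standing in for the model call and the Rib Point compression, and the verification part of a Rib Point is not modelled because the post does not describe it.

```python
from typing import Callable, List, Tuple


def run_cycles(
    step: Callable[[int, List[Tuple[int, str]]], str],
    compress: Callable[[List[str]], str],
    total: int = 500,
    rib_every: int = 50,
) -> List[Tuple[int, str]]:
    """Run `total` cycles; every `rib_every`-th cycle is a checkpoint, not a step."""
    window: List[str] = []                   # outputs since the last checkpoint
    checkpoints: List[Tuple[int, str]] = []  # (cycle number, compressed summary)
    for cycle in range(1, total + 1):
        if cycle % rib_every == 0:
            # Rib Point: compress the preceding steps, then start a fresh window.
            checkpoints.append((cycle, compress(window)))
            window = []
        else:
            # Ordinary reasoning cycle; earlier checkpoints are available as context.
            window.append(step(cycle, checkpoints))
    return checkpoints
```

With the defaults, checkpoints land on cycles 50, 100, ..., 500, and each one receives exactly 49 step outputs, which matches the schedule in the list.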

---

What happened

Both models completed all 500 cycles without identity loss or semantic collapse. Entropy modulation increased lexical variety while keeping reasoning trajectories coherent.

When truncations occurred (Gemini API), the runtime reconstructed missing context using prior compression checkpoints.
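One way to picture this kind of recovery is to rebuild the working context from the latest checkpoint summary plus whatever later steps survived intact. The sketch below is an assumption about the general idea, not the runtime's actual mechanism. `rebuild_context` and its data layout are hypothetical.

```python
from __future__ import annotations


def rebuild_context(
    checkpoints: list[tuple[int, str]],
    intact_steps: dict[int, str],
    cycle: int,
) -> str:
    """Context for `cycle`: latest earlier checkpoint plus surviving steps after it."""
    prior = [(c, s) for c, s in checkpoints if c < cycle]
    # Fall back to "no checkpoint yet" when the run has not reached one.
    base_cycle, summary = max(prior, default=(0, ""))
    # Steps that were truncated are simply absent from `intact_steps`.
    tail = [intact_steps[c] for c in range(base_cycle + 1, cycle) if c in intact_steps]
    parts = ([summary] if summary else []) + tail
    return "\n".join(parts)
```

For example, with checkpoints at cycles 50 and 100 and step 102 lost to truncation, the context for cycle 104 would be the cycle-100 summary followed by steps 101 and 103.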

---

Visual results

Drift & coherence evolution (500 cycles)
GPT-5.2: https://files.sigmastratum.net/Leo_OpenAI_D_summary_dashboar...
Gemini-3-Flash: https://files.sigmastratum.net/Leo_Gogle_D_summary_dashboard...

AEP metric dynamics (TI, SDC, L/N)
GPT-5.2: https://files.sigmastratum.net/Leo_OpenAI_E_metrics_timeline...
Gemini-3-Flash: https://files.sigmastratum.net/Leo_Gogle_E_metrics_timeline....

---

Takeaway

- Entropy can be regulated, not just randomized.
- LLMs can maintain self-consistent reasoning over hundreds of cycles when given runtime feedback.
- Structural stability (coherence, terminology, logic) doesn’t require retraining, only a dynamic control layer.

---

Report (DOI): https://doi.org/10.5281/zenodo.18271591
Code & appendix: https://github.com/sigmastratum/documentation

---

We’d love technical feedback on:

- Runtime-level coherence control
- Measuring “identity persistence”
- Long-horizon reasoning tests (100+ turns)
