Show HN: 500-cycle runtime test for long-horizon LLM coherence

https://zenodo.org/records/18369990
1•teugent•1w ago
We ran a 500-cycle benchmark to test long-horizon reasoning stability in large language models — not just output quality, but whether a model can maintain coherent identity and logic across hundreds of recursive reasoning steps.

This is part of our SIGMA Runtime project — a cognitive control layer that runs on top of any LLM and tracks drift, coherence, and identity persistence in real time.

---

Why we did this

Most LLM evals measure short reasoning spans — 1-10 turns. But when a model is asked to sustain a line of reasoning over hundreds of steps, subtle feedback effects appear:

- Semantic drift: meaning slowly shifts as text compounds.
- Crystallization: the model locks into repeating its own phrasing or style.
- Identity loss: the “speaker” loses internal consistency.

We wanted to see whether it’s possible to prevent these effects at runtime, without retraining or prompt resets.

---

What’s new here

We replaced the older ACE anti-crystallization layer with a new system called AEP (Adaptive Entropy Protocol) — a real-time regulator that injects controlled entropy into model outputs.

AEP tracks three internal metrics:

- TI: Terminological Isometry (consistency of key concepts)
- SDC: Semantic Drift Coefficient (meaning variation rate)
- L/N: Logic-to-Noise ratio (logical density vs surface variation)

When the model becomes too stable (repetition, rigid phrasing), AEP adds micro-perturbations to restore variation. When it drifts too far, it dampens entropy back into equilibrium.
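The two-sided behaviour described above can be sketched as a small feedback controller. This is only an illustration, not the AEP implementation: the post does not say how entropy is injected, so the sketch assumes sampling temperature is the control knob, and the `EntropyRegulator` class, its thresholds, and the `repetition` and `drift` scores (both assumed to lie in [0, 1]) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EntropyRegulator:
    """Toy two-sided controller. Every threshold and step size is hypothetical."""

    temperature: float = 0.7      # assumed entropy knob (sampling temperature)
    min_temp: float = 0.2
    max_temp: float = 1.2
    repetition_high: float = 0.6  # above this, output counts as "too stable"
    drift_high: float = 0.4       # above this, output counts as "drifting"
    step: float = 0.1

    def update(self, repetition: float, drift: float) -> float:
        """Adjust the knob from two scores in [0, 1] and return its new value."""
        if drift > self.drift_high:
            # Drifting too far: dampen entropy. Drift is checked first so that
            # it wins when both signals fire at once (an arbitrary choice).
            self.temperature -= self.step
        elif repetition > self.repetition_high:
            # Too rigid: add a small perturbation to restore variation.
            self.temperature += self.step
        # Keep the knob inside its allowed band.
        self.temperature = min(self.max_temp, max(self.min_temp, self.temperature))
        return self.temperature


if __name__ == "__main__":
    reg = EntropyRegulator()
    print(reg.update(repetition=0.9, drift=0.1))  # rigid output: knob goes up
    print(reg.update(repetition=0.1, drift=0.9))  # drifting output: knob goes down
```

A real regulator would presumably derive its inputs from TI, SDC, and L/N rather than from two hand-picked scores; the sketch only shows the push-pull structure.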

---

How we tested it

- 500 reasoning cycles per model (OpenAI GPT-5.2 & Gemini-3-Flash Preview)
- Every 50th cycle = a Rib Point that compresses and verifies the last 49 steps
- Continuous telemetry from the runtime (coherence, drift, entropy)
- Identity: same synthetic agent (“LEO”, AI architect/cognitive scientist)
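The cycle schedule above can be sketched as a plain loop. This is a minimal sketch under stated assumptions, not the SIGMA Runtime code: `step_fn` and `compress_fn` are hypothetical callables standing in for the model call and the Rib Point compression, the choice to feed each step the latest summary plus the current window is an assumption, and the verification half of a Rib Point is left out.

```python
from typing import Callable, List, Tuple

RIB_INTERVAL = 50  # from the post: every 50th cycle is a Rib Point


def run_cycles(
    total: int,
    step_fn: Callable[[int, List[str]], str],
    compress_fn: Callable[[List[str]], str],
) -> Tuple[List[str], List[int]]:
    """Run `total` cycles; return (Rib Point summaries, Rib Point cycle numbers)."""
    window: List[str] = []     # ordinary steps since the last Rib Point
    summaries: List[str] = []  # one compressed checkpoint per Rib Point
    rib_cycles: List[int] = []
    for cycle in range(1, total + 1):
        if cycle % RIB_INTERVAL == 0:
            # Rib Point: compress the preceding 49 steps, then start a new window.
            summaries.append(compress_fn(window))
            rib_cycles.append(cycle)
            window = []
        else:
            # Ordinary step: sees the latest checkpoint (if any) plus the window.
            context = summaries[-1:] + window
            window.append(step_fn(cycle, context))
    return summaries, rib_cycles


if __name__ == "__main__":
    # Stub callables so the schedule can be inspected without an LLM.
    summaries, ribs = run_cycles(
        500,
        step_fn=lambda cycle, context: f"step {cycle}",
        compress_fn=lambda steps: f"summary of {len(steps)} steps",
    )
    print(ribs)
    print(summaries[0])
```

With stubs like these, the loop makes it easy to check that each Rib Point covers exactly 49 ordinary steps before any model is wired in.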

---

What happened

Both models completed all 500 cycles without identity loss or semantic collapse. Entropy modulation increased lexical variety, while keeping reasoning trajectories coherent.

When truncations occurred (Gemini API), the runtime reconstructed missing context using prior compression checkpoints.

---

Visual results

Drift & coherence evolution (500 cycles)

- GPT-5.2: https://files.sigmastratum.net/Leo_OpenAI_D_summary_dashboar...
- Gemini-3-Flash: https://files.sigmastratum.net/Leo_Gogle_D_summary_dashboard...

AEP metric dynamics (TI, SDC, L/N)

- GPT-5.2: https://files.sigmastratum.net/Leo_OpenAI_E_metrics_timeline...
- Gemini-3-Flash: https://files.sigmastratum.net/Leo_Gogle_E_metrics_timeline....

---

Takeaway

- Entropy can be regulated, not just randomized.
- LLMs can maintain self-consistent reasoning over hundreds of cycles when given runtime feedback.
- Structural stability (coherence, terminology, logic) doesn’t require retraining, only a dynamic control layer.

---

Report (DOI): https://doi.org/10.5281/zenodo.18271591

Code & appendix: https://github.com/sigmastratum/documentation

---

We’d love technical feedback on:

- Runtime-level coherence control
- Measuring “identity persistence”
- Long-horizon reasoning tests (100+ turns)
