Show HN: 500-cycle runtime test for long-horizon LLM coherence

https://zenodo.org/records/18369990
1•teugent•1h ago
We ran a 500-cycle benchmark to test long-horizon reasoning stability in large language models — not just output quality, but whether a model can maintain coherent identity and logic across hundreds of recursive reasoning steps.

This is part of our SIGMA Runtime project — a cognitive control layer that runs on top of any LLM and tracks drift, coherence, and identity persistence in real time.

---

Why we did this

Most LLM evals measure short reasoning spans — 1-10 turns. But when a model is asked to sustain a line of reasoning over hundreds of steps, subtle feedback effects appear:

- Semantic drift: meaning slowly shifts as text compounds.
- Crystallization: the model locks into repeating its own phrasing or style.
- Identity loss: the “speaker” loses internal consistency.

We wanted to see whether it’s possible to prevent these effects at runtime, without retraining or prompt resets.

---

What’s new here

We replaced the older ACE anti-crystallization layer with a new system called AEP (Adaptive Entropy Protocol) — a real-time regulator that injects controlled entropy into model outputs.

AEP tracks three internal metrics:

- TI: Terminological Isometry (consistency of key concepts)
- SDC: Semantic Drift Coefficient (meaning variation rate)
- L/N: Logic-to-Noise ratio (logical density vs surface variation)
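The post does not give formulas for these metrics. As a rough illustration of what a "meaning variation rate" could look like, here is a minimal sketch that uses token-set Jaccard distance between consecutive outputs as a crude lexical stand-in. This is an assumption for illustration only, not the SDC definition used by SIGMA Runtime; the function names `jaccard_distance` and `mean_step_drift` are hypothetical.

```python
from typing import List


def jaccard_distance(a: str, b: str) -> float:
    """Return 1 - |A & B| / |A | B| over lowercase whitespace-split tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0  # two empty texts are treated as identical
    return 1.0 - len(ta & tb) / len(ta | tb)


def mean_step_drift(outputs: List[str]) -> float:
    """Average distance between consecutive outputs (hypothetical drift proxy)."""
    pairs = list(zip(outputs, outputs[1:]))
    if not pairs:
        return 0.0  # fewer than two outputs: no drift to measure
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)
```

A real semantic measure would more likely compare embeddings than raw tokens; the token-set version is used here only because it runs without external dependencies.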

When the model becomes too stable (repetition, rigid phrasing), AEP adds micro-perturbations to restore variation. When it drifts too far, it dampens entropy back into equilibrium.
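The two-sided behavior described above resembles a simple band controller. The sketch below is one way such a loop could be written, assuming that "entropy" is steered through the sampling temperature and that some variation score in [0, 1] is available for each output. Both of those are assumptions, as are the class name `EntropyRegulator` and every threshold value; the post does not say how AEP actually applies its perturbations.

```python
from dataclasses import dataclass


@dataclass
class EntropyRegulator:
    """Toy band controller; all thresholds are illustrative assumptions."""
    low: float = 0.2    # variation below this counts as "too stable"
    high: float = 0.6   # variation above this counts as "drifting"
    step: float = 0.1   # size of each temperature adjustment
    t_min: float = 0.1  # lower clamp for the temperature
    t_max: float = 1.5  # upper clamp for the temperature

    def adjust(self, temperature: float, variation: float) -> float:
        """Return the temperature to use for the next cycle."""
        if variation < self.low:
            temperature += self.step   # too rigid: add variation
        elif variation > self.high:
            temperature -= self.step   # drifting: dampen variation
        # Inside the band the temperature is left unchanged.
        return min(self.t_max, max(self.t_min, temperature))
```

Calling `EntropyRegulator().adjust(0.7, 0.1)` raises the temperature by one step, while a variation score of 0.9 lowers it by one step. The clamp keeps the controller from running away in either direction.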

---

How we tested it

- 500 reasoning cycles per model (OpenAI GPT-5.2 & Gemini-3-Flash Preview)
- Every 50th cycle = a Rib Point that compresses and verifies the last 49 steps
- Continuous telemetry from the runtime (coherence, drift, entropy)
- Identity: same synthetic agent (“LEO”, AI architect/cognitive scientist)
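The cycle schedule can be sketched as a plain loop. This is a minimal outline, assuming that a Rib Point replaces the reasoning step on every 50th cycle and that earlier checkpoints are passed back in as context. The function `run_cycles` and the `step` and `compress` callables are hypothetical stand-ins for the model call and the compression routine, and the verification part of a Rib Point is not modeled.

```python
from typing import Callable, List


def run_cycles(step: Callable[[int, List[str]], str],
               compress: Callable[[List[str]], str],
               total: int = 500,
               rib_every: int = 50) -> List[str]:
    """Run `total` cycles and return the checkpoints made at each Rib Point."""
    checkpoints: List[str] = []
    window: List[str] = []  # outputs since the last Rib Point
    for cycle in range(1, total + 1):
        if cycle % rib_every == 0:
            # Rib Point: compress the preceding 49 outputs, then start over.
            checkpoints.append(compress(window))
            window = []
        else:
            # Ordinary cycle: the step sees prior checkpoints as context.
            window.append(step(cycle, checkpoints))
    return checkpoints
```

With the defaults this yields 10 checkpoints, each covering 49 ordinary cycles, which matches the "every 50th cycle" schedule in the list above.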

---

What happened

Both models completed all 500 cycles without identity loss or semantic collapse. Entropy modulation increased lexical variety, while keeping reasoning trajectories coherent.

When truncations occurred (Gemini API), the runtime reconstructed missing context using prior compression checkpoints.

---

Visual results

Drift & coherence evolution (500 cycles)
GPT-5.2: https://files.sigmastratum.net/Leo_OpenAI_D_summary_dashboar...
Gemini-3-Flash: https://files.sigmastratum.net/Leo_Gogle_D_summary_dashboard...

AEP metric dynamics (TI, SDC, L/N)
GPT-5.2: https://files.sigmastratum.net/Leo_OpenAI_E_metrics_timeline...
Gemini-3-Flash: https://files.sigmastratum.net/Leo_Gogle_E_metrics_timeline....

---

Takeaway

- Entropy can be regulated, not just randomized.
- LLMs can maintain self-consistent reasoning over hundreds of cycles when given runtime feedback.
- Structural stability (coherence, terminology, logic) doesn’t require retraining, only a dynamic control layer.

---

Report (DOI): https://doi.org/10.5281/zenodo.18271591
Code & appendix: https://github.com/sigmastratum/documentation

---

We’d love technical feedback on:

- Runtime-level coherence control
- Measuring “identity persistence”
- Long-horizon reasoning tests (100+ turns)
