frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Aura-State: Formally Verified LLM State Machine Compiler

16•rohanmunshi08•3d ago
I noticed a pattern: every LLM framework today lets the AI manage state and do math. Then we wonder why pipelines hallucinate numbers and break at 3 AM.

I took a different approach and built Aura-State, an open-source Python framework that compiles LLM workflows into formally verified state machines.

Instead of hoping the AI figures it out, I brought in real algorithms from hardware verification and statistical learning:

CTL Model Checking: the same technique used to verify flight control systems, now applied to LLM workflow graphs. Proves safety properties before execution.

Z3 Theorem Prover: every LLM extraction gets formally proven against business constraints. If the total ≠ price × quantity, Z3 catches it with a counterexample.

Conformal Prediction: distribution-free 95% confidence intervals on every extracted field. Not just "the LLM said $450k" but "95% CI: [$448k, $452k]."

MCTS Routing: Monte Carlo Tree Search (the algorithm behind AlphaGo) scores ambiguous state transitions mathematically.

Sandboxed Math: English math rules compile to Python AST. Zero hallucination calculations.

I ran a live benchmark against 10 real-estate sales transcripts using GPT-4o-mini: → 100% budget extraction accuracy ($0 mean error) → 20/20 Z3 proof obligations passed → 3/3 temporal safety properties proven → 65 automated tests passing

The gap between "it usually works" and "it provably works" is smaller than people think.

Would love feedback from anyone building production LLM systems; what would you want formally verified?

https://github.com/munshi007/Aura-State

Comments

aristofun•3d ago
Couple of simplified examples for us, mere mortals, not mathematicians would be nice.
ozozozd•3d ago
This is interesting and I would appreciate if you could elaborate with a few examples, and perhaps mention what inspired this, and/or what’s similar and/or analogous to your method.
Perenti•36m ago
Interesting. Seems you are automating my qwen workflow. Every output stage is verified through mathematical proof whenever possible, before being fed to the next step in transforming ideas into code. Except for when qwen decides to go in a very unusual direction, its working reasonably well at producing provably correct code. It's slowish though, with lots of nested iterations, and when qwen goes strange it takes a lot of effort to get it back on task.
mentalgear•26m ago
Despite the sophisticated theoretical frameworks and algorithmic safeguards, the core vulnerability remains: in an autonomous workflow, the LLM can hallucinate the input or fabricate the 'proofs' for the verification sandbox. This is essentially building elaborate scaffolding ontop a fundamental flaw.

But I reckon while this doesn't eliminate the need for human oversight, it might help supervisors sitting at human-in-the-loop checkpoints.

Garlef•6m ago
> hallucinate the input

True

> fabricate the 'proofs' for the verification sandbox

Not sure: It seems the tool here allows for non-llm code to perform validation inbetween the steps

So in this sense, it allows you to manage the LLM. Of course if you run the workflow through an outer agentic loop this would be different.

In this case, I think the reasonable approach would be to let the orchestrating tool write its results to a space that the agents themselves have no access to.

Nvidia PersonaPlex 7B on Apple Silicon: Full-Duplex Speech-to-Speech in Swift

https://blog.ivan.digital/nvidia-personaplex-7b-on-apple-silicon-full-duplex-speech-to-speech-in-...
127•ipotapov•4h ago•45 comments

Google Workspace CLI

https://github.com/googleworkspace/cli
629•gonzalovargas•11h ago•210 comments

Relicensing with AI-Assisted Rewrite

https://tuananh.net/2026/03/05/relicensing-with-ai-assisted-rewrite/
157•tuananh•6h ago•141 comments

The L in "LLM" Stands for Lying

https://acko.net/blog/the-l-in-llm-stands-for-lying/
245•LorenDB•7h ago•126 comments

Smalltalk's Browser: Unbeatable, yet Not Enough

https://blog.lorenzano.eu/smalltalks-browser-unbeatable-yet-not-enough/
52•mpweiher•3h ago•17 comments

Arabic document from 17th-cent. rubbish heap confirms semi-legendary Nubian king

https://phys.org/news/2026-02-arabic-document-17th-century-rubbish.html
51•wglb•2d ago•6 comments

Poor Man's Polaroid

https://boxart.lt/blog/poor_mans_polaroid
39•ZacnyLos•4h ago•8 comments

Building a new Flash

https://bill.newgrounds.com/news/post/1607118
559•TechPlasma•15h ago•171 comments

AMD will bring its “Ryzen AI” processors to standard desktop PCs for first time

https://arstechnica.com/gadgets/2026/03/amd-ryzen-ai-400-cpus-will-bring-upgraded-graphics-to-soc...
95•Bender•2d ago•73 comments

World-first gigabit laser link between aircraft and geostationary satellite

https://www.esa.int/Applications/Connectivity_and_Secure_Communications/World-first_gigabit-per-s...
17•giuliomagnifico•3d ago•3 comments

No right to relicense this project

https://github.com/chardet/chardet/issues/327
201•robin_reala•3h ago•125 comments

Jails for NetBSD – Kernel Enforced Isolation and Native Resource Control

https://netbsd-jails.petermann-digital.de/
29•vermaden•4h ago•11 comments

Something is afoot in the land of Qwen

https://simonwillison.net/2026/Mar/4/qwen/
682•simonw•19h ago•302 comments

MacBook Neo

https://www.apple.com/newsroom/2026/03/say-hello-to-macbook-neo/
1800•dm•21h ago•2110 comments

Show HN: Poppy – A simple app to stay intentional with relationships

https://poppy-connection-keeper.netlify.app/
111•mahirhiro•7h ago•44 comments

Aura-State: Formally Verified LLM State Machine Compiler

16•rohanmunshi08•3d ago•5 comments

BMW Group to deploy humanoid robots in production in Germany for the first time

https://www.press.bmwgroup.com/global/article/detail/T0455864EN/bmw-group-to-deploy-humanoid-robo...
149•JeanKage•14h ago•141 comments

Relax NG is a schema language for XML (2014)

https://relaxng.org/
30•Frotag•5h ago•16 comments

You Just Reveived

https://dylan.gr/1772520728
181•djnaraps•7h ago•54 comments

OpenBSD on SGI: A Rollercoaster Story

http://miod.online.fr/software/openbsd/stories/sgiall.html
17•brynet•5h ago•0 comments

Dario Amodei calls OpenAI’s messaging around military deal ‘straight up lies’

https://techcrunch.com/2026/03/04/anthropic-ceo-dario-amodei-calls-openais-messaging-around-milit...
616•SilverElfin•11h ago•327 comments

Dulce et Decorum Est (1921)

https://www.poetryfoundation.org/poems/46560/dulce-et-decorum-est
140•bikeshaving•14h ago•77 comments

Picking Up a Zillion Pieces of Litter

https://www.sixstepstobetterhealth.com/litter.html
135•colinbartlett•3d ago•45 comments

US tech firms pledge at White House to bear costs of energy for datacenters

https://www.theguardian.com/us-news/2026/mar/04/us-tech-companies-energy-cost-pledge-white-house
85•geox•9h ago•84 comments

Humans 40k yrs ago developed a system of conventional signs

https://www.pnas.org/doi/10.1073/pnas.2520385123
130•bikenaga•19h ago•54 comments

NRC issues first commercial reactor construction approval in 10 years [pdf]

https://www.nrc.gov/sites/default/files/cdn/doc-collection-news/2026/26-028.pdf
115•Anon84•13h ago•83 comments

What Python’s asyncio primitives get wrong about shared state

https://www.inngest.com/blog/no-lost-updates-python-asyncio
57•goodoldneon•8h ago•36 comments

Moss is a pixel canvas where every brush is a tiny program

https://www.moss.town/
268•smusamashah•1d ago•28 comments

“It turns out” (2010)

https://jsomers.net/blog/it-turns-out
289•Munksgaard•20h ago•89 comments

Qwen3.5 Fine-Tuning Guide

https://unsloth.ai/docs/models/qwen3.5/fine-tune
348•bilsbie•23h ago•85 comments