frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Ask HN: When do you expect ChatGPT moment in robotics?

4•p1esk•2h ago•3 comments

Tell HN: MitID, Denmark's digital ID, was down

140•mousepad12•2d ago•180 comments

Ask HN: How to approach new people in 2026?

7•tavro•8h ago•8 comments

Ask HN: How will most Anthropic customers respond to the threats by the govt?

6•Poomba•5h ago•3 comments

Tell HN: YC companies scrape GitHub activity, send spam emails to users

677•miki123211•3d ago•257 comments

Aura-State: Formally Verified LLM State Machine Compiler

3•rohanmunshi08•10h ago•2 comments

I used 2D Base64 to bypass Gemini and expose Google's moderation flaws

6•MissMajordazure•17h ago•1 comments

Tell HN: My daily game won a Players Choice Award

19•paulhebert•1d ago•2 comments

Ask HN: How do we solve the bot flooding problem without destroying anonymity?

11•txrx0000•22h ago•16 comments

I built AI agents that do the grunt work solo founders hate

5•Seleci•1d ago•5 comments

Ask HN: Builder.ai ($1B Microsoft-backed AI company) who's lookin at the assets?

6•gamelock•1d ago•4 comments

Ask HN: Article to share with a technical manager about modern AI coding tools?

7•killmill•1d ago•4 comments

Garbage In, Garbage Out: The Degradation of Human Requirements in the LLM Era

6•waylake•1d ago•5 comments

I don't need AI to build me a new app. I need it to make Jira bearable

21•niel_hu•3d ago•21 comments

Super Editor – Atomic file editor with automatic backups (Python and Go)

6•larryste•2d ago•1 comments

Seeking Advice on Improving OCR for Watermarked PDFs in My RAG Pipeline

2•hundredtrillion•1d ago•2 comments

Ask HN: Who Is Using XMPP?

23•nunobrito•4d ago•11 comments

36yo: Career at home vs. Simple life abroad?

12•Slaboli•2d ago•34 comments

Ask HN: Why are some websites locking or using the audio device on Windows?

4•ezconnect•1d ago•1 comments

Ask HN: How do you handle duplicate side effects when jobs, workflows retry?

10•shineDaPoker•3d ago•11 comments

Ask HN: My competitor wants to buy us out, recommend a lawyer?

8•VladVladikoff•3d ago•8 comments

Ask HN: What's it like working in big tech recently with all the AI tools?

22•ex-aws-dude•4d ago•14 comments

LazyGravity – I made my phone control Antigravity so I never leave bed

9•masaTokyo•4d ago•5 comments

If you drive clock wise along the beach on an island

7•Cookingboy•2d ago•5 comments

Ask HN: Starting a New Role with Ada

10•NoNameHaveI•4d ago•5 comments

Ask HN: What will happen with Anthropics ultimatum?

7•maniacwhat•4d ago•4 comments

Ask HN: What Happened to HTTPS://Www.keyvalues.com/?

4•alexgotoi•1d ago•0 comments

I built a 151k-node GraphRAG swarm that autonomously invents SDG solutions

2•wisdomagi•3d ago•0 comments

You've reached the end!

Open in hackernews

Aura-State: Formally Verified LLM State Machine Compiler

3•rohanmunshi08•10h ago
I noticed a pattern: every LLM framework today lets the AI manage state and do math. Then we wonder why pipelines hallucinate numbers and break at 3 AM.

I took a different approach and built Aura-State, an open-source Python framework that compiles LLM workflows into formally verified state machines.

Instead of hoping the AI figures it out, I brought in real algorithms from hardware verification and statistical learning:

CTL Model Checking: the same technique used to verify flight control systems, now applied to LLM workflow graphs. Proves safety properties before execution.

Z3 Theorem Prover: every LLM extraction gets formally proven against business constraints. If the total ≠ price × quantity, Z3 catches it with a counterexample.

Conformal Prediction: distribution-free 95% confidence intervals on every extracted field. Not just "the LLM said $450k" but "95% CI: [$448k, $452k]."

MCTS Routing: Monte Carlo Tree Search (the algorithm behind AlphaGo) scores ambiguous state transitions mathematically.

Sandboxed Math: English math rules compile to Python AST. Zero hallucination calculations.

I ran a live benchmark against 10 real-estate sales transcripts using GPT-4o-mini: → 100% budget extraction accuracy ($0 mean error) → 20/20 Z3 proof obligations passed → 3/3 temporal safety properties proven → 65 automated tests passing

The gap between "it usually works" and "it provably works" is smaller than people think.

Would love feedback from anyone building production LLM systems; what would you want formally verified?

https://github.com/munshi007/Aura-State

Comments

aristofun•7h ago
Couple of simplified examples for us, mere mortals, not mathematicians would be nice.
ozozozd•7h ago
This is interesting and I would appreciate if you could elaborate with a few examples, and perhaps mention what inspired this, and/or what’s similar and/or analogous to your method.