
Show HN: Statewright – Visual state machines that make AI agents reliable

https://github.com/statewright/statewright
3•azurewraith•1h ago
Agentic problem solving in its current state is very brittle. I fell in love with it, but it creates as many problems as it solves.

I'm Ben Cochran. I've spent 20+ years in the trenches of full-stack engineering, DevOps, high-performance computing and ML, with stints at NVIDIA, AMD and various other organizations, most recently as a Distinguished Engineer.

For agents to work reliably, you need either massive parameter counts or massive context windows to keep the solution spaces workable. Most people are brute-forcing reliability with bigger models and longer prompts.

What if I made the problem smaller instead of making the model bigger?

I took a different approach: I used smaller models in the 13-20B parameter range and set them to work on real SWE-bench problems. I constrained the tool and solution spaces using formal state machines. Each state in the machine defines which tools the model can access, how many iterations it gets and which transitions are valid. A planning state gets read-only tools. An implementation state gets edit tools (scoped to prevent mega-edits) and write-friendly bash tools. The testing state gets bash, but only for testing commands. The model cannot physically skip steps or use the wrong tool at the wrong time. This is enforced via protocol, not via prompts.
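To make the per-state rules concrete, here is a minimal Rust sketch, assuming a workflow in which each state carries a tool allowlist, an iteration budget and a set of valid next states. The types (`StateDef`, `Machine`, `Rejection`), the budgets and the tool names (`read_file`, `edit_file`, `run_tests`, etc.) are hypothetical illustrations, not Statewright's engine or API. What it shows is that the check runs in ordinary code before a tool call executes, so a disallowed call is refused no matter what the prompt says.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical per-state definition: which tools the model may call,
/// how many tool calls it gets, and which states it may move to next.
struct StateDef {
    allowed_tools: HashSet<&'static str>,
    max_iterations: u32,
    transitions: HashSet<&'static str>,
}

/// Why the engine refused a request; returned to the caller, never ignored.
#[derive(Debug, PartialEq)]
enum Rejection {
    ToolNotAllowed { state: String, tool: String },
    IterationBudgetExhausted { state: String },
    InvalidTransition { from: String, to: String },
}

struct Machine {
    states: HashMap<&'static str, StateDef>,
    current: &'static str,
    iterations: u32,
}

impl Machine {
    /// Checked before every tool call. Only permitted calls count against the budget.
    fn check_tool(&mut self, tool: &str) -> Result<(), Rejection> {
        let def = &self.states[self.current];
        if self.iterations >= def.max_iterations {
            return Err(Rejection::IterationBudgetExhausted { state: self.current.to_string() });
        }
        if !def.allowed_tools.contains(tool) {
            return Err(Rejection::ToolNotAllowed {
                state: self.current.to_string(),
                tool: tool.to_string(),
            });
        }
        self.iterations += 1;
        Ok(())
    }

    /// Moves to `to` only if the current state lists it; resets the budget on success.
    fn transition(&mut self, to: &'static str) -> Result<(), Rejection> {
        if !self.states[self.current].transitions.contains(to) {
            return Err(Rejection::InvalidTransition {
                from: self.current.to_string(),
                to: to.to_string(),
            });
        }
        self.current = to;
        self.iterations = 0;
        Ok(())
    }
}

fn state(tools: &[&'static str], max_iterations: u32, next: &[&'static str]) -> StateDef {
    StateDef {
        allowed_tools: tools.iter().copied().collect(),
        max_iterations,
        transitions: next.iter().copied().collect(),
    }
}

/// Planning is read-only, implementation can edit, testing can only run tests.
fn example_workflow() -> Machine {
    let states = HashMap::from([
        ("planning", state(&["read_file", "grep"], 10, &["implementation"])),
        ("implementation", state(&["read_file", "edit_file", "bash"], 20, &["testing"])),
        ("testing", state(&["run_tests"], 3, &["implementation", "done"])),
        ("done", state(&[], 0, &[])),
    ]);
    Machine { states, current: "planning", iterations: 0 }
}
```

In this sketch, calling `edit_file` while in `planning`, or jumping from `planning` straight to `testing`, returns a `Rejection` instead of doing anything.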

The results were more promising than I expected. Across multiple model families (qwen-coder, gpt-oss, gemma4), irrespective of model age, the improvements were consistent above a 13B-parameter inflection point. Below that, models can navigate the state machine but can't retain enough context to produce accurate edits. More on the research: https://statewright.ai/research

Surprisingly, this also yielded improvements in frontier models. Haiku and Sonnet start to punch above their weight, and Opus solves problems more reliably, with fewer tokens and fewer death spirals. Fine-tuning did not yield these kinds of functional improvements for me. The takeaway, it seems, is that context-window utilization matters more than raw context size: a tightly scoped working context at each step outperforms a model given carte blanche over everything. Constraining non-idempotent LLMs with deterministic code is a pattern that nobody is currently talking about.

So, I built Statewright. Its core is a Rust engine that evaluates state machine definitions: states, transitions, guards and tool restrictions. The orchestration doesn't use an LLM; it just enforces the state machine. On top of that is a plugin layer that integrates with Claude Code (and soon Codex, Cursor and others) via MCP. When you activate a workflow, hooks automatically enforce each state's guardrails. The model sees 5 available tools instead of dozens, gets clear instructions for the current phase and transitions when conditions are met. Importantly, it tells the model when it's attempting something that is out of scope or incorrect, or when it needs to try something else after getting stuck.
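A minimal sketch of the two feedback paths described above, assuming guards are plain predicates over facts the orchestrator records itself (for example from a test command's exit code), and that a pre-tool-use hook can either allow a call or hand a message back to the model. `Facts`, `Transition`, `pre_tool_use`, `try_transition` and the `tests_passed` key are hypothetical names; this is not the Statewright plugin's hook format or the Claude Code hook API.

```rust
use std::collections::HashMap;

/// Facts recorded by the orchestrator itself (e.g. from a test command's exit code),
/// never taken from the model's own claims. Hypothetical shape.
type Facts = HashMap<&'static str, bool>;

/// A transition guarded by an ordinary Rust predicate; no LLM is consulted.
struct Transition {
    from: &'static str,
    to: &'static str,
    guard: fn(&Facts) -> bool,
    /// Sent back to the model when the guard fails, so it knows what to do next.
    hint: &'static str,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    /// The message is returned to the model instead of running the tool.
    Deny(String),
}

/// Hypothetical pre-tool-use hook: allow the call, or explain why not.
fn pre_tool_use(state: &str, allowed: &[&str], tool: &str) -> Decision {
    if allowed.contains(&tool) {
        Decision::Allow
    } else {
        Decision::Deny(format!(
            "`{}` is not available in state `{}`. Available tools: {}.",
            tool,
            state,
            allowed.join(", ")
        ))
    }
}

/// Follows a transition only if it exists and its guard holds.
fn try_transition(
    transitions: &[Transition],
    facts: &Facts,
    from: &str,
    to: &str,
) -> Result<&'static str, String> {
    let t = transitions
        .iter()
        .find(|t| t.from == from && t.to == to)
        .ok_or_else(|| format!("No transition from `{}` to `{}`.", from, to))?;
    if (t.guard)(facts) {
        Ok(t.to)
    } else {
        Err(format!("Cannot leave `{}` yet: {}", from, t.hint))
    }
}

fn tests_passed(facts: &Facts) -> bool {
    facts.get("tests_passed").copied().unwrap_or(false)
}
```

Keeping guards as ordinary functions is one way to keep the orchestration free of an LLM: whether the model may leave `testing` depends on a recorded fact, not on the model saying the tests pass.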

You can also have your agent build a state machine via MCP to solve a problem in your current context. The visual editor at statewright.ai lets you tweak these workflows in a graph view, where you can clearly see the failure paths, the retry loops and the approval gates. State machines aren't DAGs; they loop and retry, which is what agentic work actually needs.
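To illustrate the DAG point, here is a hypothetical bugfix workflow written as an adjacency list, checked with a standard three-color depth-first search for cycles, plus a bounded retry rule. The state names (`review` standing in for an approval gate, `failed` for the failure path) and the `max_retries` parameter are assumptions for illustration, not Statewright's bundled bugfix workflow. The `testing -> implementation` edge creates a cycle, so the graph is not a DAG; a retry budget is one way to keep that loop from becoming a death spiral.

```rust
use std::collections::HashMap;

type Graph = HashMap<&'static str, Vec<&'static str>>;

/// Hypothetical bugfix workflow. `review` stands in for an approval gate,
/// `failed` for the failure path. testing -> implementation is the retry loop.
fn bugfix_workflow() -> Graph {
    HashMap::from([
        ("planning", vec!["implementation"]),
        ("implementation", vec!["testing"]),
        ("testing", vec!["implementation", "review", "failed"]),
        ("review", vec!["done", "implementation"]),
        ("done", vec![]),
        ("failed", vec![]),
    ])
}

#[derive(Clone, Copy)]
enum Mark {
    Unvisited,
    InProgress,
    Finished,
}

/// Depth-first search with three colors: reaching a node that is still
/// in progress means we followed a back edge, i.e. the graph has a cycle.
fn has_cycle(graph: &Graph) -> bool {
    fn visit(node: &'static str, graph: &Graph, marks: &mut HashMap<&'static str, Mark>) -> bool {
        match marks.get(node).copied().unwrap_or(Mark::Unvisited) {
            Mark::InProgress => return true,
            Mark::Finished => return false,
            Mark::Unvisited => {}
        }
        marks.insert(node, Mark::InProgress);
        if let Some(nexts) = graph.get(node) {
            for &next in nexts {
                if visit(next, graph, marks) {
                    return true;
                }
            }
        }
        marks.insert(node, Mark::Finished);
        false
    }
    let mut marks = HashMap::new();
    graph.keys().any(|&node| visit(node, graph, &mut marks))
}

/// Hypothetical routing rule for `testing`: a failed run loops back to
/// `implementation` until the retry budget is spent, then exits to `failed`.
fn after_testing(passed: bool, retries_used: u32, max_retries: u32) -> &'static str {
    if passed {
        "review"
    } else if retries_used < max_retries {
        "implementation"
    } else {
        "failed"
    }
}
```

Removing the two back edges (`testing -> implementation` and `review -> implementation`) would make `has_cycle` return `false`, but the workflow would also lose its only way to recover from a failed test run.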

Statewright is live now with a free tier. Try it out in Claude Code by running the following:

/plugin marketplace add statewright/statewright

/plugin install statewright

/reload-plugins

Then say "start the bugfix workflow" or run /statewright start bugfix. You'll need to paste your API key when prompted. The latest versions of Claude may complain; if so, paste the API key again and say you really mean it. Claude is just being cautious here.

Feedback is welcome on the workflow editor and the plugin experience, and tell me which workflows you'd want to build first. Agents are suggestions; states are laws.

Show HN: Crane Control

https://xkqr.org/cranecontrol/
1•kqr•27s ago•0 comments

Evento – Events Made Social

https://evento.so/
1•janandonly•1m ago•0 comments

Graphmind – local code intelligence for Claude Code (graph and mem and MCP)

https://github.com/aouicher/graphmind
1•aouicher•1m ago•0 comments

AI Floss

https://actsofvolition.com/2026/05/ai-floss/
1•speckx•1m ago•0 comments

Static Analysis for GitHub Actions

https://github.com/zizmorcore/zizmor
1•SEJeff•2m ago•1 comments

Exposing a $300M Private Equity Scam [video]

https://www.youtube.com/watch?v=04FRAsU0za0
1•Imustaskforhelp•3m ago•0 comments

I built a privacy-focused tool to help people understand complex documents

https://www.understanddocs.com
1•Nencheff•3m ago•0 comments

Ranking a What Is My IP Tool

https://timleland.com/ranking-what-is-my-ip-tool/
2•TimLeland•4m ago•0 comments

Wrap Go binaries in Python wheels

https://github.com/simonw/go-to-wheel
1•ankitg12•4m ago•0 comments

Show HN: X509-certificate-exporter – Prometheus exporter for TLS cert expiration

https://github.com/enix/x509-certificate-exporter
2•solvik•5m ago•0 comments

Setting the record straight on Cloud Access and Community

https://blog.bambulab.com/setting-the-record-straight-on-cloud-access-and-community/
2•Topfi•5m ago•0 comments

All the ways to mock your Rust code

https://blog.appliedcomputing.io/p/all-the-ways-to-mock-your-rust-code
3•avenger337•8m ago•0 comments

Show HN: Reducing LLM input tokens by 70%

https://adola.app/
5•Jbunga•10m ago•2 comments

Europe could soon get new platform to book train tickets

https://nltimes.nl/2026/05/12/europe-soon-get-new-platform-book-train-tickets
2•robtherobber•10m ago•0 comments

The NY Times Published an A.I.-Fabricated Quote Attributed to Pierre Poilievre

https://pxlnv.com/linklog/times-poilievre-fabricated-quote/
2•latexr•10m ago•0 comments

Multilingual Ambiguity

https://blog.ptidej.net/multilingual-ambiguity/
2•luca-sctr•11m ago•0 comments

Why Not Objective-C

https://inessential.com/2026/02/18/why-not-objective-c.html
2•surprisetalk•12m ago•0 comments

Chemistry in the AI Era

https://www.nature.com/articles/d41586-026-01521-9
2•Brajeshwar•12m ago•0 comments

There is a problem with users abusing flagging on HN (2025)

https://twitter.com/paulg/status/1907528478855201096
2•washingupliquid•13m ago•0 comments

Want to AI proof your degree? Study History

https://froginawell.net/frog/2026/05/want-to-ai-proof-your-degree-study-history/
2•speckx•14m ago•0 comments

Roadside Picnic and the AI Race

https://readgrounded.com/episodes/001-golden-sphere/
2•readgrounded•14m ago•0 comments

'systematic' rape and sexual violence during Hamas' Oct 7 attack on Israel

https://www.cnn.com/2026/05/12/middleeast/report-sexual-violence-hamas-oct-7-attacks-intl
1•Tomte•14m ago•0 comments

Operation: Epic Furious

https://www.epicfurious.com/
1•dmschulman•15m ago•0 comments

Ask HN: Any materials on building distributed rate limiter?

2•ravshan•16m ago•0 comments

"Cannot be explained" – New ultra stainless steel stuns researchers

https://www.sciencedaily.com/releases/2026/05/260510030950.htm
2•bilsbie•17m ago•0 comments

South Korea's housing crisis explained (2025)

https://lgiu.org/south-koreas-housing-crisis-explained/
1•thelastgallon•17m ago•0 comments

Stochastic Parrots: Frequently Unasked Questions

https://medium.com/@emilymenonbender/stochastic-parrots-frequently-unasked-questions-49c2e7d22d11
1•cratermoon•17m ago•0 comments

Bioplastics Toxicity Upon Ingestion: Biotransformation and GI Effects

https://www.mdpi.com/2073-4360/18/9/1091
1•PaulHoule•17m ago•0 comments

Why senior developers fail to communicate their expertise

https://www.nair.sh/guides-and-opinions/communicating-your-expertise/why-senior-developers-fail-t...
1•nilirl•19m ago•1 comments

Apple Sales Coach Will Use AI-Generated Video Presenters

https://www.macrumors.com/2026/05/12/apple-sales-coach-will-use-ai-generated-presenters/
1•ndr42•20m ago•0 comments