Last year we tried to bring an LLM “agent” into a real enterprise workflow. It looked easy in the demo videos. In production it was… chaos.
• Tiny wording tweaks = totally different behaviour
• Impossible to unit-test; every run was a new adventure
• One mega-prompt meant one engineer could break the whole thing
• SOC 2 reviewers hated the “no traceability” story
We wanted the predictability of a backend service and the flexibility of an LLM. So we built NOMOS: a step-based state-machine engine that wraps any LLM (OpenAI, Claude, local). Each state is explicit, testable, and independently ownable—think Git-friendly diff-able YAML.
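To give a flavour of the idea, here is a rough sketch of what a step-based flow can look like. This is illustrative only, not the actual NOMOS schema; the field names (flow, steps, id, prompt, tools, next, when, goto) and the step names are made up for the example:

    # Each step declares what the LLM should do, which tools it may call,
    # and the explicit transitions it is allowed to take next.
    flow: refund_request
    steps:
      - id: collect_details
        prompt: "Ask the customer for the order ID and the reason for the refund."
        next:
          - when: order_id_provided
            goto: check_eligibility
      - id: check_eligibility
        prompt: "Use the order lookup tool and decide whether the refund policy applies."
        tools: [lookup_order]
        next:
          - when: eligible
            goto: issue_refund
          - when: not_eligible
            goto: explain_denial

Because each step is a small named unit, it diffs cleanly in review, can be unit-tested against its expected transitions, and can be owned by a different engineer.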
The core is open source (MIT) and available today.
• GitHub: https://github.com/dowhile/nomos
• Documentation: https://nomos.dowhile.dev
Looking ahead: we’re also prototyping Kosmos, a “Vercel for AI agents” that can deploy NOMOS or other frameworks behind a single control plane. If that sounds useful, join the waitlist below; a limited number of early sign-ups get the paid tier for free.
https://nomos.dowhile.dev/kosmos
Would love war stories from anyone who’s wrestled with flaky prompt agents. What hurt the most?