Attempt 1 - Claude/GPT directly: works for small stuff, but you re-explain context endlessly.
Attempt 2 - Copilot/Cursor: great autocomplete, still doing 95% of the thinking.
Attempt 3 - continuous agents: they keep working without prompting, but "no errors" doesn't mean "feature works."
Attempt 4 - parallel agents: faster wall-clock, but now you're manually reviewing even more output.
The common failure: none of these tools verify whether the output satisfies the goal. Someone has to, and that someone was always me. So I automated that job.
OmoiOS is a spec-driven orchestration system. You describe a feature, and it:
1. Runs a multi-phase spec pipeline (Explore > Requirements > Design > Tasks) with LLM evaluators scoring each phase. Retry on failure, advance on pass. By the time agents code, requirements have machine-checkable acceptance criteria.
2. Spawns isolated cloud sandboxes per task. Your local env is untouched. Agents get ephemeral containers with full git access.
3. Validates continuously - a separate validator agent checks each task against acceptance criteria. Failures feed back for retry. No human in the loop between steps.
4. Discovers new work - validation can spawn new tasks when agents find missing edge cases. The task graph grows as agents learn.
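The evaluator-gated pipeline in step 1 can be sketched as a loop over phases with retry-or-advance logic. This is a minimal illustration, not OmoiOS's actual internals: the phase names come from the post, but the function names, 0.8 threshold, and retry count are assumptions.

```python
# Hypothetical sketch of an evaluator-gated spec pipeline.
# Phase names are from the post; everything else is illustrative.

PHASES = ["explore", "requirements", "design", "tasks"]
PASS_THRESHOLD = 0.8  # assumed cutoff, not OmoiOS's real value
MAX_RETRIES = 3

def run_phase(phase: str, context: dict) -> dict:
    # Stand-in for an agent call that drafts this phase's artifact.
    return {"phase": phase, "draft": f"{phase} for {context['feature']}"}

def score_phase(phase: str, artifact: dict) -> float:
    # Stand-in for an LLM evaluator scoring the artifact in [0, 1].
    return 0.9 if artifact.get("draft") else 0.0

def run_pipeline(feature: str) -> dict:
    context = {"feature": feature}
    for phase in PHASES:
        for _ in range(MAX_RETRIES):
            artifact = run_phase(phase, context)
            if score_phase(phase, artifact) >= PASS_THRESHOLD:
                context[phase] = artifact  # advance on pass
                break                      # otherwise retry the phase
        else:
            raise RuntimeError(f"{phase} did not pass evaluation")
    return context
```

The point of the gate is that a phase's output never reaches the next phase without an evaluator signing off, which is what lets the coding agents start from requirements that already carry acceptance criteria.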
What's hard (honest):
- Spec quality is the bottleneck. Vague spec = agents spinning.
- Validation is domain-specific. API correctness is easy. UI quality is not.
- Discovery branching can grow the task graph unexpectedly.
- Sandbox overhead adds latency per task. Worth it, but a tradeoff.
- Merging parallel branches with real conflicts is the hardest problem.
- Guardian monitoring (per-agent trajectory analysis) still has rough edges.
Stack: Python/FastAPI, PostgreSQL+pgvector, Redis (~190K lines). Next.js 15 + React Flow (~83K lines TS). Claude Agent SDK + Daytona Cloud. 686 commits since Nov 2025, built solo. Apache 2.0.
I keep coming back to the same problem: structured spec generation that produces genuinely machine-checkable acceptance criteria. Has anyone found an approach that works for non-trivial features, or is this just fundamentally hard?
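For concreteness, here is one possible shape for a machine-checkable acceptance criterion: a declarative record that compiles down to an executable check. The dataclass fields and check logic are my illustration, not OmoiOS's schema.

```python
# Illustrative only: one way to make an acceptance criterion executable.
# The field names and check logic are assumptions, not OmoiOS's schema.
from dataclasses import dataclass

@dataclass
class AcceptanceCriterion:
    endpoint: str           # e.g. "POST /users"
    expected_status: int    # e.g. 201
    required_fields: tuple  # fields the response body must contain

    def check(self, status: int, body: dict) -> bool:
        return (status == self.expected_status
                and all(f in body for f in self.required_fields))

criterion = AcceptanceCriterion("POST /users", 201, ("id", "email"))
print(criterion.check(201, {"id": 7, "email": "a@b.c"}))  # True
print(criterion.check(500, {"error": "boom"}))            # False
```

Criteria like this are easy for API work; the open question in the post is how to generate equally checkable criteria for fuzzier goals like UI quality.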
GitHub: https://github.com/kivo360/OmoiOS Live: https://omoios.dev
kanddle•2h ago
The core insight: AI coding tools are great at generating code, but someone still has to verify the output matches the goal. Usually that someone is you. OmoiOS automates that oversight loop.
How this compares to what you're probably using:
- vs Claude Code / Cursor: great interactive tools where you're in the loop. OmoiOS is for when you want to write the spec, approve the plan, and walk away.
- vs Codex: both produce PRs, but Codex is prompt-driven (individual tasks). OmoiOS is spec-driven (full feature lifecycle). Also open-source and not locked to one provider.
- vs Kiro: both spec-driven, but Kiro is a VS Code fork for interactive work. OmoiOS runs autonomously in the cloud. Also open-source, self-hostable, multi-model.
- vs CrewAI / LangGraph: agent frameworks (primitives). OmoiOS is an opinionated system — full lifecycle from spec to PR.
- vs Devin: OmoiOS is open-source, self-hostable, shows you the plan before executing. Devin is a black box.
Built with Claude Agent SDK + FastAPI + PostgreSQL + Next.js 15. Apache 2.0 — fork it, self-host it, build on it.
Happy to go deep on the spec pipeline, the validation loop, or the multi-agent coordination.