Show HN: CodevOS – Human-AI dev OS that shipped 106 PRs in 14 days on 80k LOC

https://cluesmith.com/blog/a-tour-of-codevos/

1•waleedk•1h ago

Comments

waleedk•1h ago

Hey HN, I'm Waleed. CodevOS is the system I've been building to explore a question: what happens when you stop treating AI as a coding assistant and start treating the work as a joint human-AI software development team?

The 106 PRs in 14 days came from one person (me), with AI agents doing the implementation. The article walks through the ideas that make this work:

- Multi-model review: Three independent AI models (Claude, Gemini, Codex) review every phase. They catch different things — Codex finds security edge cases, Claude catches runtime semantics, Gemini catches architecture problems. No single model found more than 55% of the bugs.

- An agent that helps you organize agents. You work with an Architect agent that spawns Builder agents that work simultaneously in isolated git worktrees. While one is implementing a feature, another is fixing a bug, and you're reviewing a third's PR. Your job shifts from writing code to keeping the pipeline fed.

- Natural language is the source code. Specs, plans, and reviews are version-controlled in git alongside the code and treated with the same rigor. The AI's instructions live in the repo, not in someone's chat history that's already been compressed. You always know why something was built and how it was designed.

- Deterministic execution. Instead of asking the AI to follow a process and hoping it does, a state machine (Porch) enforces it. Human gates, build-verify loops, mandatory review phases. The AI can't skip steps, and if it exhausts its context window, the next agent picks up from the exact checkpoint.

- Annotation over editing. Most of the work is writing and reviewing these natural language documents — specs that define what to build, plans that define how. The documents guide the agents. You're directing, not coding.

- Whole lifecycle, git at the center. From idea through specification, planning, implementation, review, PR, and merge — the entire development lifecycle is managed. Git is the backbone: worktrees for isolation, branches for workflow, PRs for integration.
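
To make the "deterministic execution" point above concrete, here is a minimal sketch of a gated state machine with checkpointing. It is not Porch's actual implementation: the phase names, the choice of which phases need human approval, and the `Workflow` class are all assumptions made up for illustration.

```python
import json
from dataclasses import dataclass, field

# Hypothetical phase order; the real Porch protocol may differ.
PHASES = ["specify", "plan", "implement", "review", "pr"]
# Phases assumed to require explicit human approval before advancing.
HUMAN_GATES = {"specify", "plan"}


@dataclass
class Workflow:
    phase_index: int = 0
    approved: set = field(default_factory=set)

    @property
    def phase(self) -> str:
        return PHASES[self.phase_index]

    def approve(self, phase: str) -> None:
        """Record a human sign-off for a gated phase."""
        self.approved.add(phase)

    def advance(self, checks_passed: bool) -> str:
        """Move to the next phase only if verification and any human gate pass."""
        if not checks_passed:
            raise RuntimeError(f"{self.phase}: build/verify checks failed")
        if self.phase in HUMAN_GATES and self.phase not in self.approved:
            raise PermissionError(f"{self.phase}: waiting for human approval")
        if self.phase_index < len(PHASES) - 1:
            self.phase_index += 1
        return self.phase

    def checkpoint(self) -> str:
        """Serialize state so a fresh agent can resume from the same point."""
        return json.dumps(
            {"phase_index": self.phase_index, "approved": sorted(self.approved)}
        )

    @classmethod
    def resume(cls, blob: str) -> "Workflow":
        """Rebuild the workflow from a checkpoint string."""
        data = json.loads(blob)
        return cls(phase_index=data["phase_index"], approved=set(data["approved"]))
```

The idea is that the agent never decides for itself whether a step can be skipped: `advance` either succeeds or raises, and the string from `checkpoint()` is all a replacement agent needs to call `Workflow.resume(...)` and continue from the same phase.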

It's free and open source:

npm install -g @cluesmith/codev

(and https://github.com/cluesmith/codev for the code)

The article includes a controlled comparison against unstructured Claude Code and is honest about the tradeoffs: CodevOS costs more and takes longer, but catches more bugs and ships with tests.

I'm genuinely looking for feedback on this. What resonates? What doesn't? What would you do differently? This is still early and I want to hear what the HN community thinks. Happy to answer questions too.