6 Practices that turned AI from prototyper to workhorse (106 PRs in 14 days)

13•waleedk•3h ago

1. Specs and plans are source code: Specs and plans live in git alongside source code, not in chat history. A new agent reads arch.md for the big picture, then its specific spec. You always know why something was built.

2. Three models review every phase: Claude, Gemini, and Codex catch almost entirely different bugs. No single model found more than 55% of issues. If you only review with the model that wrote the code, you're missing half the bugs. In total, 20 bugs were caught before shipping: Claude Code found 5, and Gemini and Codex caught another 15, including a severe security issue Claude missed.
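To make the "different models catch different bugs" claim measurable, here is a minimal sketch of how you might score each reviewer against the union of all findings. It assumes findings are already normalized to comparable bug IDs; the model names and bug IDs are hypothetical placeholders, not Codev's data or API.

```python
"""Sketch: score each reviewer model against the union of all review findings.

Assumption: every finding is normalized to a comparable bug ID (e.g. "file:line:kind").
Model names and IDs used with these functions are hypothetical placeholders.
"""


def coverage_report(findings: dict[str, set[str]]) -> dict[str, float]:
    """Return each model's share of all distinct bugs found by any model."""
    all_bugs = set().union(*findings.values())
    if not all_bugs:
        return {model: 0.0 for model in findings}
    return {model: len(bugs) / len(all_bugs) for model, bugs in findings.items()}


def unique_finds(findings: dict[str, set[str]]) -> dict[str, set[str]]:
    """Bugs only one model caught, i.e. what you lose by dropping that reviewer."""
    result = {}
    for model, bugs in findings.items():
        others = set().union(*(b for m, b in findings.items() if m != model))
        result[model] = bugs - others
    return result
```

With hypothetical findings `{"model_a": {"b1", "b2"}, "model_b": {"b2", "b3", "b4"}, "model_c": {"b4", "b5"}}`, no model covers more than 60% of the five distinct bugs, and each one has a bug nobody else found. That is the pattern that argues for keeping all three reviewers.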

3. Enforce the process, don't suggest it. A state machine forces Spec → Plan → Implement → Review → PR. The AI can't skip steps. Tests must pass before advancing. AIs don't stick to the plan by themselves; you need rails.
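A rough sketch of what "enforce, don't suggest" can look like as a state machine. The phase names come from the post; the `run_tests` callback and the choice to gate only transitions out of Implement and Review are assumptions for illustration, not Codev's actual implementation.

```python
"""Sketch of an enforced phase protocol: Spec -> Plan -> Implement -> Review -> PR.

Illustrative only. The run_tests callback and which phases are test-gated
are assumptions, not Codev's actual rules.
"""
from enum import Enum
from typing import Callable


class Phase(Enum):
    SPEC = 1
    PLAN = 2
    IMPLEMENT = 3
    REVIEW = 4
    PR = 5


ORDER = list(Phase)  # members in definition order
GATED = (Phase.IMPLEMENT, Phase.REVIEW)  # assumption: phases where code exists


class ProtocolError(Exception):
    pass


class Protocol:
    def __init__(self, run_tests: Callable[[], bool]):
        self.phase = Phase.SPEC
        self.run_tests = run_tests

    def advance(self, target: Phase) -> Phase:
        """Move to `target` only if it is the next phase and the test gate passes."""
        i = ORDER.index(self.phase)
        if i + 1 >= len(ORDER) or ORDER[i + 1] is not target:
            # No skipping ahead, no going backwards, nothing after PR.
            raise ProtocolError(f"cannot go from {self.phase.name} to {target.name}")
        if self.phase in GATED and not self.run_tests():
            raise ProtocolError(f"tests failing; staying in {self.phase.name}")
        self.phase = target
        return self.phase
```

The point of the design is that the agent never decides what comes next; it can only request the single legal transition, and the harness refuses anything else.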

4. Annotate, don't edit. Most of the work is writing specs and reviews that guide the code, not hacking at files in an open-ended chat.

5. Agents coordinate agents. An architect agent spawns builder agents into isolated git worktrees. You direct the architect; it directs the builders. They message each other async.
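A toy model of the architect/builder split, using asyncio queues for the async messaging. In a real setup each builder would presumably run in its own git worktree (e.g. one created with `git worktree add`); here the builders are plain coroutines and the work is simulated. The names `architect` and `builder` are hypothetical, not Codev's API.

```python
"""Sketch: an architect fans tasks out to builders and collects async reports.

Assumptions: real builders would each work in an isolated git worktree and run
tests; here the "work" is simulated. All names are hypothetical.
"""
import asyncio


async def builder(name: str, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    while True:
        task = await inbox.get()
        if task is None:  # sentinel from the architect: no more work
            inbox.task_done()
            return
        # A real builder would edit files in its worktree and run tests here.
        await outbox.put((name, f"done: {task}"))
        inbox.task_done()


async def architect(tasks: list[str], n_builders: int = 2) -> list[tuple[str, str]]:
    inbox: asyncio.Queue = asyncio.Queue()
    outbox: asyncio.Queue = asyncio.Queue()
    workers = [
        asyncio.create_task(builder(f"builder-{i}", inbox, outbox))
        for i in range(n_builders)
    ]
    for t in tasks:
        inbox.put_nowait(t)
    for _ in workers:  # one sentinel per builder, queued after all real tasks
        inbox.put_nowait(None)
    await asyncio.gather(*workers)
    reports = []
    while not outbox.empty():
        reports.append(outbox.get_nowait())
    return reports
```

For example, `asyncio.run(architect(["fix bug 1", "fix bug 2", "fix bug 3"]))` returns three reports, each tagged with the builder that handled it. You talk only to the architect; it owns the queue of work.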

6. Manage the whole lifecycle. Most AI tools help you write code faster, which is maybe 30% of the job. The other 70% is planning the approach, reviewing, integrating, writing deployment scripts, and managing staging vs. prod. Have AI run the whole pipeline, from spec to PR and beyond.

Overall result: one engineer can produce what a team of 3-4 would usually do. We measured code quality 1.2 points higher on a 10-point scale than Claude Code. Downsides: it takes a lot longer and uses far more tokens, but the cost is still reasonable at $1.60 per PR.

We open-sourced it: https://github.com/cluesmith/codev More details and raw results: https://cluesmith.com/blog/a-tour-of-codevos/

Comments

waleedk•3h ago

Happy to answer any questions. Here are those links as clickables:

Github: https://github.com/cluesmith/codev Tour + raw results: https://cluesmith.com/blog/a-tour-of-codevos/

trollbridge•1h ago

This original post looks AI-generated.

Could you share the prompts you used to generate it?

waleedk•56m ago

In a sense? This human built a system for AI to build stuff then asked the AI to summarize what the AI that built the human built?

It was more of a conversation, but it was like: "Hey, I wrote these 6 points about what we're doing differently, please tailor them to be most useful to an HN audience."

skydhash•1h ago

> Codev isn’t an AI model. It’s not a coding assistant. It’s not a VS Code extension. It’s a set of CLI tools, protocols, and infrastructure that orchestrates existing AI coding tools (Claude Code, Gemini CLI, OpenAI’s Codex CLI) into a structured workflow.

Thanks for the clarification, I couldn't have guessed otherwise.

waleedk•54m ago

Useful criticism -- what could I have done to help you get that message sooner?

ddoottddoott•1h ago

Would you rather fight 100 AI workhorses or 1 workhorse AI?

waleedk•57m ago

Ha! I would rather fight 100 workhorse AIs with an Architect AI and Builder AIs on my side :-).

Seriously, the agents-managing-agents thing works so well. When I'm working, I'll sometimes have 6 builder agents fixing different bugs at once, and I lose track of state. So I rely on the architect agent, which doesn't have stupid limitations like only holding 7 ± 2 things in working memory.

yodon•1h ago

I'm a huge fan of spec-kit, and am actively looking for a replacement for it because spec-kit is no longer maintained by the team at GitHub.

Codev looks like it has a lot in common with spec-kit, in good ways, and like something I need to pay close attention to. That said, I'd encourage you to do another pass on your command names, intros, and cheat sheet.

I suspect most developers using codev will mostly use a very small fraction of the codev commands most of the time, similar to the way spec-kit is mostly /specify, /plan, /tasks, and /implement, with a bit of /clarify and /analyze once you really get comfortable with it. If I'm right, having some docs where you emphasize the simplicity of your core flow would be very helpful.

For calibration, five minutes into reading your home page, your Medium post, and some of your repo docs, I'm ready to believe this is true, but I have no idea what that core flow is or looks like. Five minutes is actually a pretty long time, and I suspect most visitors will bounce if they don't get clarity on what the experience will ultimately be like for them within five minutes (or, more likely, much less).

waleedk•1h ago

Yes, this is spec-kit on steroids. In particular, specs + protocol enforcement works _really_ well. The protocol enforcement is the game changer: I found the AI just wouldn't stick to specs or plans on its own.

Great suggestions. I will do that. Did you notice any specific issues in those?

Got it about the core flow. Appreciate it. I plan to record a video showing how to kick off a new project and another one showing how to use it in maintenance mode. Would that be helpful?

@yodon if you would like to reach out to me at hello@cluesmith.com I'd love to get your feedback once those assets are ready.
