Show HN: Batty – Run a team of AI coding agents in tmux with test gating

https://github.com/battysh/batty

1•Zedmor•1h ago

Hi HN, I'm the author.

I use Claude Code and Codex daily. Running one agent on a task works great. Running three or four in parallel on the same repo? They step on each other's files, nobody checks if the code compiles, and you spend more time coordinating than coding.

Batty is the supervisor layer I built to fix this. You define a team in YAML — an architect that plans work, a manager that dispatches it, engineers that execute. Batty launches each role in its own tmux pane, isolates engineer work in git worktrees, routes messages between roles, and gates task completion on passing tests.
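To make the worktree isolation concrete, here is a minimal sketch of giving each engineer its own checkout. It is not Batty's actual code: the function names, the `<engineer>/<task>` branch scheme, and the directory layout are assumptions for illustration. The only real interface used is the `git worktree add -b <branch> <path>` command.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Build the arguments for `git worktree add -b <branch> <path>`.
/// The branch and directory naming are hypothetical, chosen for illustration.
fn worktree_add_args(worktrees_root: &Path, engineer: &str, task_id: &str) -> Vec<String> {
    let branch = format!("{}/{}", engineer, task_id);
    let path = worktrees_root.join(engineer);
    vec![
        "worktree".to_string(),
        "add".to_string(),
        "-b".to_string(),
        branch,
        path.to_string_lossy().into_owned(),
    ]
}

/// Run git inside `repo` to create the worktree; returns true if git exited 0.
fn create_worktree(
    repo: &Path,
    worktrees_root: &Path,
    engineer: &str,
    task_id: &str,
) -> io::Result<bool> {
    let status = Command::new("git")
        .args(worktree_add_args(worktrees_root, engineer, task_id))
        .current_dir(repo)
        .status()?;
    Ok(status.success())
}
```

Each engineer then edits files only under its own directory, so parallel agents cannot overwrite each other's uncommitted changes; conflicts surface later, at merge time.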

The interesting part is what it's not: it's not an agent framework, and it doesn't embed any model. It orchestrates existing agent CLIs (Claude Code, Codex, Aider) using tmux as the runtime and git worktrees for isolation. Config is YAML, the kanban board is Markdown (powered by a bundled kanban-md tool), inboxes are Maildir, logs are JSONL. You can `git diff` your entire team state.

Built in Rust, published on crates.io (v0.1.0). The daemon is a synchronous 5-second poll loop — no async complexity. It watches pane output to detect idle/active/dead agents, reads Claude and Codex session files on disk to reduce false-positive idle detection, and uses a merge lock to serialize concurrent worktree merges.
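A synchronous poll loop with output-based idle detection could look roughly like the sketch below. All type names, fields, and the idle threshold are assumptions made for illustration, not Batty's real internals. The sketch hashes each pane capture and calls a pane idle once the hash has stayed unchanged for a configurable time; the session-file check mentioned above is left out.

```rust
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentState {
    Active,
    Idle,
    Dead,
}

/// What one poll observed about a pane (hypothetical fields).
struct PaneSnapshot {
    process_alive: bool,
    output_hash: u64, // hash of the captured pane text
}

/// Per-pane bookkeeping kept between polls.
struct PaneTracker {
    last_hash: Option<u64>,
    last_change: Instant,
    idle_after: Duration,
}

impl PaneTracker {
    fn new(now: Instant, idle_after: Duration) -> Self {
        PaneTracker { last_hash: None, last_change: now, idle_after }
    }

    /// Classify the pane from the latest snapshot.
    fn observe(&mut self, snap: &PaneSnapshot, now: Instant) -> AgentState {
        if !snap.process_alive {
            return AgentState::Dead;
        }
        if self.last_hash != Some(snap.output_hash) {
            // Output changed since the previous poll: the agent is working.
            self.last_hash = Some(snap.output_hash);
            self.last_change = now;
            return AgentState::Active;
        }
        if now.duration_since(self.last_change) >= self.idle_after {
            AgentState::Idle
        } else {
            AgentState::Active
        }
    }
}

/// Synchronous poll loop: call `tick`, then sleep, until `tick` returns false.
fn poll_loop<F: FnMut() -> bool>(interval: Duration, mut tick: F) {
    while tick() {
        thread::sleep(interval);
    }
}
```

With a 5-second interval, the daemon would call `poll_loop(Duration::from_secs(5), ...)` and run `observe` once per pane inside the closure. Passing `now` in explicitly keeps the classification logic testable without real sleeps.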

Some things I learned running multi-agent setups:

- 3-5 parallel engineers is the sweet spot. Beyond that, the codebase itself becomes the bottleneck for absorbing parallel changes.
- Task decomposition quality matters more than agent count. A good architect prompt outperforms throwing more engineers at bad tasks.
- Test gating eliminated most of the chaos. Without it, agents "complete" work that breaks everything downstream.
- You still need to supervise. It's not fire-and-forget; it's closer to managing a junior team. The leverage is supervising five workstreams instead of doing one.
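The test-gating idea can be sketched in a few lines: run the project's test command inside the engineer's worktree and only allow the task to be marked complete when it exits successfully. The `Gate` type and both function names are hypothetical, and the post does not say how Batty configures or runs its test command, so treat this as an illustration of the technique only.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Outcome of running the gate command (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Gate {
    Passed,
    /// Exit code of the failed command, or None if it was killed by a signal.
    Failed(Option<i32>),
}

/// Run `program args...` with `worktree` as the working directory.
fn run_gate(worktree: &Path, program: &str, args: &[&str]) -> io::Result<Gate> {
    let status = Command::new(program)
        .args(args)
        .current_dir(worktree)
        .status()?;
    if status.success() {
        Ok(Gate::Passed)
    } else {
        Ok(Gate::Failed(status.code()))
    }
}

/// A task may move to "done" only when the gate passed.
fn may_complete(gate: &Gate) -> bool {
    matches!(gate, Gate::Passed)
}
```

For a Rust project the call might be `run_gate(Path::new("worktrees/eng-1"), "cargo", &["test"])`, where the path is again an assumed layout. A failed gate would send the task back to the engineer instead of on to merge.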

I know there's prior art in this space — Tmux-IDE and vibe-kanban both approach multi-agent coordination differently. Batty is more opinionated about supervision: the test gating and communication constraints are first-class, not optional. Different tradeoffs for different workflows.

It's early (v0.1.0). The core loop is solid but the API is still settling. Eight built-in templates range from solo (1 agent) to large (19 agents with three management layers). The architecture diagram in the README shows the full supervision flow.

2-minute demo: https://youtube.com/watch?v=2wmBcUnq0vw

Docs: https://battysh.github.io/batty

Happy to go deep on the architecture or the worktree strategy. For those running multiple agents: what's the biggest operational pain point?
