Show HN: claude-autopilot, autonomous dev pipeline with multi-model review

https://github.com/axledbetter/claude-autopilot
3•axledbetter01•8h ago

Comments

axledbetter01•8h ago
claude-autopilot is an MIT-licensed npm package that runs Claude Code through an autonomous pipeline: brainstorm, spec, plan, implement, migrate, validate, PR, review, bugbot. Point it at an idea, walk away, come back to a PR that's review-ready. Merge stays human-gated by default.

  Try it in 30 seconds:                                               

    npm install -g @delegance/claude-autopilot                                                     
    claude-autopilot examples              # list 5 starter stacks
    claude-autopilot examples node > spec.md                                                       
    claude-autopilot autopilot spec.md     # ship it                                               
   
  Five bundled stack templates (node, python, fastapi, go-cli, rust-cli) so you don't write your   
  first spec from a blank page.                                       
                                                                                                   
  The strongest credibility signal I can give you: claude-autopilot built itself. Every version of 
  this project that ever shipped, including v7.10.1 today, went through the pipeline you'll see on
  GitHub. Spec, plan, implementation subagents, Codex review, bugbot triage, admin-merge, npm      
  publish. Full commit history and review threads preserved on the repo. No marketing, just the
  receipts.

  I also use it daily on a production codebase: several hundred thousand lines of code merged per
  week, sustained, with one week peaking over a million. That's gross churn across feature code,
  tests, types, and migrations, mostly via the autopilot pipeline. The CLI solves real problems
  for me before it ships to anyone else.

  What's actually distinctive:

  1. Multi-model role split by default. Claude writes code, Codex reviews the plan and the diff,
  and Cursor bugbot triages PR findings, so each model gets the job it's actually best at. Phases
  run sequentially unless you opt in to the parallel council (claude-autopilot council), which
  dispatches the same prompt to Claude + Codex + Gemini and synthesizes consensus.

  2. Every phase is an editable markdown skill, not a black-box pipeline.
  .claude/skills/autopilot/SKILL.md is plain markdown you can read in 5 minutes, then audit, edit,
  or swap out any phase. The risk-tiered review policy (1, 2, or 3 Codex passes set by the spec's
  risk frontmatter, auto-escalated for auth, multi-tenancy, billing, secrets, migrations, RLS, IAM)
  lives there as plain instructions. Inspectability is the wedge against Devin and Cursor agent mode.

  3. Local CLI, your provider keys: Anthropic, OpenAI, Google, Groq, Ollama-local. The
  orchestration runs on your machine, and prompts go to whichever models you've configured. For a
  fully local setup you also need Claude Code itself running on a local provider; for most teams
  the goal is "no hosted orchestration plus existing keys."
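
To make the risk-tiered review policy in point 2 concrete, here is a minimal Python sketch of how a pass count could be derived. It is an illustration, not the project's code: the real policy is written as prose in SKILL.md, and the tier labels (low/medium/high), the `review_passes` function, the area tags, and the choice to escalate straight to the top tier are all assumptions.

```python
# Hypothetical sketch of a risk-tiered review policy.
# Tier names, tags, and the escalate-to-maximum rule are assumptions.

# Areas the post lists as auto-escalated.
SENSITIVE_AREAS = {
    "auth", "multi-tenancy", "billing", "secrets", "migrations", "rls", "iam",
}

# Review passes per declared risk tier (the post only says 1/2/3 passes).
PASSES_BY_RISK = {"low": 1, "medium": 2, "high": 3}


def review_passes(declared_risk: str, areas_touched: set) -> int:
    """Return how many review passes to run for a spec.

    declared_risk: the risk value from the spec's frontmatter.
    areas_touched: lowercase tags describing what the change touches.
    """
    passes = PASSES_BY_RISK[declared_risk]
    # Escalate when the change touches any sensitive area, whatever the
    # author declared. Escalating to the maximum is an assumption.
    if areas_touched & SENSITIVE_AREAS:
        passes = max(passes, PASSES_BY_RISK["high"])
    return passes
```

With these assumptions, a spec declared low-risk that touches billing would still get three passes, while a medium-risk docs change would get two.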

  Benchmark on a Next.js fixture seeded with 13 production-realistic bugs (SQL injection, missing  
  auth, IDOR, SSRF, open redirect, TOCTOU race, console.log in prod, missing input validation,
  etc): scan caught 13/13 in 38 seconds for $0.21. Fixture and reproduction in the repo.           
                                                                      
  Links:
  https://www.npmjs.com/package/@delegance/claude-autopilot
  https://github.com/axledbetter/claude-autopilot                                                  
   
  I'm Alex, founding eng at Delegance (insurance brokerage platform). I built claude-autopilot for
  my own internal use and open-sourced it when it started shipping itself.
axledbetter01•8h ago
A few framing notes since the body had to fit under 4,000 chars:

  On the volume number. "Several hundred thousand LOC per week sustained, with peaks over a        
  million" is gross churn (insertions + deletions) across feature code, tests, generated types,
  lockfile updates, and migrations. Net new shippable code is a smaller fraction. The point isn't  
  raw LOC; it's that the pipeline can sustainably operate on a real production codebase at that
  throughput, not a toy.
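
For anyone who wants to check a churn figure like this on their own repo, here is a small Python sketch of the arithmetic. It assumes input in the tab-separated shape that `git log --numstat` prints (added, deleted, path, with `-` for binary files). The function name and sample data are made up, and the net line delta is only a rough proxy for "net new code".

```python
# Sketch: gross churn (insertions + deletions) vs. net line delta,
# computed from numstat-style lines. Sample data below is invented.

def gross_and_net(numstat: str) -> tuple:
    """Return (gross_churn, net_delta) for numstat-style text."""
    gross = 0
    net = 0
    for line in numstat.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # blank lines or commit headers
        added, deleted, _path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of counts
        a, d = int(added), int(deleted)
        gross += a + d  # churn counts both directions
        net += a - d    # net growth of the codebase in lines
    return gross, net


SAMPLE = (
    "120\t30\tsrc/app.ts\n"
    "5000\t4800\tpackage-lock.json\n"
    "-\t-\tlogo.png\n"
)

print(gross_and_net(SAMPLE))  # (9950, 290)
```

In the sample, one lockfile update accounts for most of the gross churn while adding very little net code, which is the gap the caveat above is describing.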

  On stacks supported. The pipeline orchestrates whatever your project uses. Migration adapters
  cover Rails (Active Record), Alembic, Django, Prisma, Drizzle, golang-migrate, dbmate, flyway,
  supabase-cli, ecto, and typeorm; anything else falls back to a configurable shell command.
  Deploy adapters cover Vercel, Fly, Render, and a generic shell adapter. Validate runs whatever
  test/lint/typecheck command you configure (npm test, pytest, go test, anything). Monorepo
  support auto-detects npm/yarn/pnpm workspaces, Turborepo, and Nx. Review engine adapters cover
  Claude, Gemini, Codex, and any OpenAI-compatible endpoint (Groq, Ollama, Together).
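
The "known adapter, else configured shell command" fallback can be pictured with a short Python sketch. This is an assumption about the shape, not the project's implementation: the table, the `migration_command` function, and the three commands shown (each tool's usual migrate invocation) are illustrative only.

```python
# Hypothetical sketch of adapter lookup with a shell-command fallback.
from typing import Optional

# Illustrative subset; which command each real adapter runs is an assumption.
KNOWN_MIGRATION_COMMANDS = {
    "alembic": "alembic upgrade head",
    "django": "python manage.py migrate",
    "prisma": "npx prisma migrate deploy",
}


def migration_command(tool: str, fallback: Optional[str] = None) -> str:
    """Pick the migration command for a tool, or use the configured fallback."""
    if tool in KNOWN_MIGRATION_COMMANDS:
        return KNOWN_MIGRATION_COMMANDS[tool]
    if fallback:
        # No adapter matched: run the user-configured shell command.
        return fallback
    raise ValueError(f"no adapter for {tool!r} and no fallback command configured")
```

Failing loudly when neither an adapter nor a fallback exists is a design choice in this sketch; a pipeline could equally skip the migrate phase.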

  Why this vs Devin or Cursor agent mode. Devin is a hosted, opaque, single-vendor stack billed
  per ACU. claude-autopilot runs locally, every phase is an editable skill, you bring your own
  provider keys, and it's MIT-licensed. Cursor agent mode is a single-shot in-IDE loop.
  claude-autopilot sits one layer higher: spec review, implementation dispatch, validation, PR
  review, release workflow, retry-loop progress detection.

  Closest cousins. Aider, OpenHands, SWE-agent. We share the local-CLI plus user's-key philosophy  
  and add the phase pipeline, multi-model role split, risk-tiered review, and the retry-loop
  sameness detector (halts the pipeline when retries make no progress instead of burning the retry 
  budget on attempts going nowhere).                                  
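
A rough Python sketch of the sameness-detector idea, for readers who want the mechanism spelled out. It assumes "no progress" means two consecutive attempts fail with identical output once whitespace is normalized; the real detector may compare something else. All names here (`fingerprint`, `run_with_sameness_halt`, the status strings) are hypothetical.

```python
# Hypothetical sketch: halt a retry loop when consecutive failures look identical.
import hashlib


def fingerprint(output: str) -> str:
    """Hash failure output after trimming whitespace and dropping blank lines."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def run_with_sameness_halt(attempt, max_retries: int):
    """Run attempt(n) up to max_retries times.

    attempt(n) must return (ok, failure_output).
    Returns (status, attempts_used).
    """
    previous = None
    for n in range(1, max_retries + 1):
        ok, output = attempt(n)
        if ok:
            return "passed", n
        current = fingerprint(output)
        if current == previous:
            # Same failure twice in a row: stop instead of spending the budget.
            return "halted_no_progress", n
        previous = current
    return "retries_exhausted", max_retries


def always_same_failure(n):
    return False, "TypeError: x is undefined\n"


print(run_with_sameness_halt(always_same_failure, max_retries=5))
# ('halted_no_progress', 2)
```

In this example the loop stops after the second attempt rather than running all five, since the second failure matches the first.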

  See it work, with numbers:
  - DEMO.md walks through one autonomous run, 12 minutes wall clock, $2.20 spend, 5 new tests:
    https://github.com/axledbetter/claude-autopilot/blob/master/DEMO.md                            
  - Benchmark: 13/13 production-realistic bugs caught in 38 seconds for $0.21, reproducible:       
    https://github.com/axledbetter/claude-autopilot#benchmark                                      
                                                                                                   
  Happy to dig into any of it.

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

https://github.com/antoinezambelli/forge
448•zambelli•20h ago•171 comments

Show HN: Superlog (YC P26) – Observability that installs itself and fixes bugs

https://superlog.sh/
65•Magnanten•16h ago•43 comments

Show HN: Gaussian Splat of a Strawberry

https://superspl.at/scene/84df8849
497•danybittel•21h ago•190 comments

Show HN: I made a 3D pose maker for artists

https://setpose.com/
78•augustvdv•18h ago•30 comments

Show HN: Haystack – Review the PRs that need human attention

https://haystackeditor.com/
35•akshaysg•1d ago•11 comments

Show HN: Number Gacha, a gacha game distilled to its essence

https://isabisabel.com/gacha/
243•babel16•6d ago•123 comments

Show HN: Yt-x v0.8.0 – Browse, play, and download YouTube from the terminal

https://github.com/Benexl/yt-x
20•Benex254•12h ago•2 comments

Show HN: Pg_deltax, Apache-licensed alternative to TimescaleDB

https://github.com/xataio/deltax
30•tee-es-gee•13h ago•1 comments

Show HN: Id-agent – Token efficient UUID alternative for AI agents

https://github.com/vostride/id-agent
36•pranshuchittora•21h ago•52 comments

Show HN: Files.md – Open-source alternative to Obsidian

https://github.com/zakirullin/files.md
697•zakirullin•1d ago•340 comments

Show HN: Javalamp – A glowing terminal screensaver that keeps your Mac awake

https://github.com/breschio/javalamp
2•tbreschi•5h ago•1 comments

Show HN: Hsrs – Type-Safe Haskell Bindings Generator for Rust

https://github.com/harmont-dev/hsrs
52•suis_siva•1d ago•6 comments

Show HN: InsForge – Open-source Heroku for coding agents

https://github.com/InsForge/InsForge
55•mrcoldbrew•1d ago•6 comments

Show HN: The user agents crawling HN today

https://ai.realhackers.org/user_agents.txt
4•Bender•8h ago•1 comments

Show HN: I built a native macOS Markdown viewer 100% with AI coding agents

https://github.com/rajatarya/mdviewer
5•rajatarya•10h ago•1 comments

Show HN: SharpSkill – A LeetCode Alternative with real interview outcomes

https://sharpskill.dev/en/vs/leetcode
4•GiornoJojo•10h ago•0 comments

Show HN: LibreOffice-rs – I built a pure-Rust LibreOffice using autoresearch

https://github.com/clark-labs-inc/libreoffice-rs
6•stan_kirdey•15h ago•0 comments

Show HN: Enforra – open-source action governance for AI agent tool calls

https://github.com/enforra/enforra
4•rohitguptap•11h ago•1 comments

Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep

https://github.com/MinishLab/semble
439•Bibabomas•2d ago•147 comments

Show HN: Bevel – Guess the book from its opening passage

https://bevel.ink
3•knotalegend•12h ago•1 comments

Show HN: DDS Vibe Academy – 31 free AI coding masterclasses, built by AI agents

2•robert_ddsbos•12h ago•0 comments

Show HN: Logbox – let Claude monitor your dev logs

https://github.com/struct-dot-ai/logbox
4•nimeshmc•13h ago•1 comments

Show HN: Search 67K .AI domains by AI-extracted tags and descriptions

https://ratemyaisite.com/explore
3•prolly97•13h ago•0 comments

Show HN: Mezz, a curl-able WiFi sandbox for IoT pentesting

https://github.com/ABGEO/mezz
39•ABGEO•4d ago•10 comments

Show HN: Rocksky – Music scrobbling and discovery on the AT Protocol

https://tangled.org/rocksky.app/rocksky
117•tsiry•3d ago•44 comments

Show HN: Clark-Browser – Stealth Chromium

https://github.com/clark-labs-inc/clark-browser
14•stan_kirdey•1d ago•4 comments

Show HN: How Expensive Is Your (Steam) Wishlist?

https://weloveit.io/how-expensive-is-your-wishlist/
3•dejobaan•15h ago•0 comments

Show HN: audio.observer – AI news jingles you didn’t ask for

https://audio.observer/
3•ugnju•15h ago•0 comments

Show HN: Auto-identity-remove – Automated data broker opt-out runner for macOS

https://github.com/stephenlthorn/auto-identity-remove
323•stephenlthorn•1d ago•134 comments