The actual code for this is mostly what I experiment with to scale it up, but if you prompt your agent right you can literally use it as a simple prompt in your repo today. I personally use it in Antigravity as a workflow:
Løp Workflow: [truncated 10K chars, look in Biolab Repo]
So… How can I use it?
Step 1: Open your repo. I personally advise using a capable model for this step, since I have seen a lot of laziness from GLM 4.7 and I assume smaller models will behave similarly. To put it bluntly: send this workflow to Opus, tell it to define "Slices" for your codebase and write the /specs for each, and tell it to continue until it's finished.
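Purely illustrative: if you want the files in place before the agent starts filling them, here is a minimal TypeScript sketch for scaffolding empty spec stubs. The slice names, the `frontend/specs/` location (borrowed from the Step 5 prompt), and the stub template are all my assumptions, not part of the Løp workflow itself.

```ts
// scaffold-specs.ts: hypothetical helper; run with: npx tsx scaffold-specs.ts
// Creates one empty spec stub per slice so the agent has files to fill in.
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical slice names; your agent defines the real ones in Step 1.
const slices = ["projects", "auth", "inventory"];

const specDir = path.join("frontend", "specs"); // assumed location
fs.mkdirSync(specDir, { recursive: true });

for (const slice of slices) {
  const file = path.join(specDir, `${slice}.spec.md`);
  if (fs.existsSync(file)) continue; // never overwrite agent-written specs
  fs.writeFileSync(
    file,
    `# ${slice} spec\n\n## Scope\n\n## Components\n\n## Invariants\n\n## Open tasks\n`,
  );
  console.log(`created ${file}`);
}
```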
Step 2: Choose a spec and send this prompt, prefixed with @your-workflow-file (or however you attach your workflow context):
Let's do a digestible slice of improvements and code-to-spec alignment!
- Review the projects.spec
- Review all related components, comprehensively assessing the current implementation and code based on REAL code reads
- Based on the spec sheet and the code, compare the two and write an implementation plan to address uncovered gaps, functionality-wise or otherwise, and compile refinement/improvement/next tasks
- Review the newly added code, test compilation / no new errors, and update the spec to reflect the latest REAL code state; report to me how well it meets the specs and outline next steps

Keep up: spec <-> code? Review the spec, REVIEW all related code. Keep both in sync. Please fill all identifiable gaps and address tasks.
Step 3: You basically just loop until the specs and code align. At some point you will notice the agent telling you "the spec and code are aligned" instead of engineering the F* out of your code.
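Between iterations I would want an objective stop signal rather than trusting the "aligned" message alone. A minimal sketch of such a gate, assuming a TypeScript repo where `npx tsc --noEmit` runs from the root (the same check as the `npx tsc` result quoted below); this is my addition, not part of the workflow:

```ts
// check-convergence.ts: hypothetical loop gate; run with: npx tsx check-convergence.ts
// Exits 0 only when tsc reports no errors, so you can decide whether
// the next prompt iteration is still worth sending.
import { spawnSync } from "node:child_process";

const result = spawnSync("npx", ["tsc", "--noEmit"], { encoding: "utf8" });
const output = (result.stdout ?? "") + (result.stderr ?? "");

// tsc error lines look like: path/file.ts(12,5): error TS2322: ...
const errors = output.split("\n").filter((l) => l.includes("error TS")).length;

if (errors === 0) {
  console.log("tsc clean; candidate for 'spec and code are aligned'");
} else {
  console.log(`tsc still reports ${errors} error(s); loop again`);
  process.exit(1);
}
```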
Step 4: You now have functional slices of your codebase, and you can take the entirety of your specs (it's not that much) -> send it to an SOTA LLM -> "What gaps are in my spec?"
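For the "send the entirety of your specs" step, a small bundler saves copy-pasting file by file. A sketch under the assumption that your specs live in `frontend/specs/` as Markdown files; the output filename is arbitrary:

```ts
// bundle-specs.ts: hypothetical helper; run with: npx tsx bundle-specs.ts
// Concatenates every spec file into one document you can paste into an LLM.
import * as fs from "node:fs";
import * as path from "node:path";

const specDir = path.join("frontend", "specs"); // assumed location
const files = fs
  .readdirSync(specDir)
  .filter((f) => f.endsWith(".md"))
  .sort();

const bundle = files
  .map((f) => `<!-- ${f} -->\n${fs.readFileSync(path.join(specDir, f), "utf8")}`)
  .join("\n\n---\n\n");

fs.writeFileSync("all-specs.md", bundle);
console.log(`bundled ${files.length} specs into all-specs.md`);
```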
Step 5: Take the gap and fill it. I use this prompt:
[Put the task here]
- Review the relevant spec in /frontend/specs/
- Review all related components, comprehensively assessing the current implementation and code based on REAL code reads
- Based on the spec sheet and the code, assess the validity of the task and formulate an implementation plan
- Review the newly added code, test compilation / no new errors, and update the spec to reflect the latest REAL code state; report to me how well it meets the specs and outline next steps

Keep up: spec <-> code? Review the spec, REVIEW all related code. Keep both in sync. Please fill all identifiable gaps and address tasks.
The Aftermath
You probably know the drill: "let me implement this one thing / refactor this / add these features", and then you grind through 500 type issues until you get a somewhat working codebase again. This is what I get instead:
The Numbers
| Metric | Value |
|--------|-------|
| Parallel agents | 3 |
| Files changed | 149 |
| Lines added | +3,014 |
| Lines removed | -2,881 |
| Domain specs in repo | 47 |
| Conflicts | 0 |
| Agent communication | 0 |
| Orchestration code | 0 lines |
Changes by Directory
| Directory | Added | Removed |
|-----------|-------|---------|
| frontend/server | +1,301 | -640 |
| frontend/app | +1,269 | -1,687 |
| frontend/specs | +416 | -471 |
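If you want to reproduce a table like this yourself, `git diff --numstat` gives per-file added/removed line counts that you can sum per directory. A sketch, assuming you diff against whatever base ref you branched from and group by the first two path segments:

```ts
// dir-stats.ts: hypothetical helper; run with: npx tsx dir-stats.ts [baseRef]
// Sums `git diff --numstat` output into a per-directory changes table.
import { spawnSync } from "node:child_process";

const base = process.argv[2] ?? "HEAD~1"; // assumed base ref
const out = spawnSync("git", ["diff", "--numstat", base], { encoding: "utf8" });

const totals = new Map<string, { add: number; del: number }>();
for (const line of (out.stdout ?? "").trim().split("\n")) {
  const [add, del, file] = line.split("\t");
  if (!file) continue;
  const dir = file.split("/").slice(0, 2).join("/"); // e.g. frontend/server
  const t = totals.get(dir) ?? { add: 0, del: 0 };
  t.add += Number(add) || 0; // binary files show "-" and count as 0
  t.del += Number(del) || 0;
  totals.set(dir, t);
}

for (const [dir, t] of totals) console.log(`${dir}: +${t.add} -${t.del}`);
```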
`npx tsc`: Found 17 errors in 6 files
The repo (WIP) I am using this on (I only started applying this pattern ~2 days ago):
https://github.com/Mvgnu/BioLabs
Does it scale?
So far I have yet to find the limit. If your code does not work, you likely just need more loops against the spec. Ironically, this also works in Claude Assistant Chat, which produced the Løp repo code.