A single agent is supposed to be powerful given proper context and sound engineering design decisions - whether UI or backend.
Asking 3 different agents to do the same engineering task reeks of inefficient or ineffective development patterns with agents.
You wouldn't because human labor is too expensive to make it worthwhile. You would if it were too cheap to meter.
We actually do that at the scale of society - that's market competition in a nutshell. Lots of people building variants of the same things, then battling it out on the market. Yes, it's wasteful as hell (something too rarely talked about), but we don't have a better practical alternative at this point, so there's some merit to the general idea.
(Also the same principle applies to all life - both in terms of how it evolves, and how parts of living organisms work internally. Actively maintained equilibria abound.)
Actively maintained equilibria abound, but this is not typically the mechanism. Different species in adjacent niches aren't better or worse versions of the same organism to be evaluated and either selected or discarded. It's more typical for them to adopt a strategy of ecological segmentation so that they can all have their needs met. Every few years moths migrate to my state to reproduce - and they do so before our local moths have woken up for the season, and leave around the time they do, so that they aren't in competition. Birds that feed from the same trees will eat from different parts of the tree and mate at different times, so that their peak energy consumption doesn't line up. What would the benefit be in driving each other to extinction?
Evolution doesn't make value judgments; it doesn't know which species is better or worse, and it doesn't know how future climatic shifts will change the fitness landscape. Segmentation is both easier and a hedge against future climatic shifts.
Engineering works under a very different logic where the goal is optimal performance in a controlled environment for an acceptable service life, not satisfactory performance with extremely high robustness in the face of unknown changes into the perpetual future. When we rank different systems and select the most optimal, we are designing a system that is extremely brittle on geologic timescales. Abandon a structure and it will quickly fall apart. But we don't care because we're not operating at geologic timescales and we expect to be around to redesign systems as their environment changes to make them unsuitable.
Similarly, the reproduction of labor/capacity in markets you described could be viewed as trading efficiency for robustness instead of as waste. Eg, heavily optimized supply chains are great for costs, but can have trouble adapting to global pandemics, wars in inconvenient places, or ships getting stuck in the wrong canal.
both seem valid uses of this synthetic intelligence to me
I haven't productized it though; uzi looks great!
Thanks :)
The latter is so expensive that I still write most code myself, or I'll integrate LLM code into the codebase myself.
I've used parallel Claude Code agents to do chore work for me.
But I'd be curious about examples of tasks that people find best for OP's level of parallelization.
There are a couple of agent personas I go back to over and over (researcher, designer, critic, implementer, summariser). For most tasks I reuse 90%+ of the same prompt, but the implementer has variants: one primed with an llms.txt (see answerdotai) for a given library I want to use, another configured to use Gemini (I prefer its Tailwind capabilities) rather than Claude, etc.
To organise these reusable agents I'm currently test-driving Langroid; each agent contributes via a sub-task.
It's not perfect yet though.
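For the curious, the persona setup boils down to something like this framework-agnostic sketch (all names and fields are illustrative; the actual Langroid wiring is omitted):

    from dataclasses import dataclass, field

    @dataclass
    class Persona:
        name: str
        base_prompt: str                      # the ~90% shared across tasks
        variants: dict[str, str] = field(default_factory=dict)  # extra context per variant
        model: str = "claude"                 # which backend this persona routes to

        def prompt(self, task: str, variant: str | None = None) -> str:
            parts = [self.base_prompt]
            if variant:
                parts.append(self.variants[variant])
            parts.append(task)
            return "\n\n".join(parts)

    implementer = Persona(
        name="implementer",
        base_prompt="You implement the agreed design. Keep diffs small and add tests.",
        variants={
            "with-llms-txt": "<contents of the library's llms.txt pasted here>",
            "tailwind": "Prefer Tailwind utility classes over custom CSS.",
        },
        model="gemini",   # e.g. route Tailwind-heavy work to a different model
    )

    print(implementer.prompt("Build the pricing card component.", variant="tailwind"))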
Since the probability of an LLM succeeding at any given task is sub 100%, you should run multiple instances of the same LLM with the same prompted task in parallel.
For visual UI iteration this seems amazing given the right tooling, as the author states.
I could see it maybe useful for TDD. Let four agents run on a test file and implement until it passes. Restrict to 50 iterations per agent, first one that passes the test terminates other in-progress sessions. Rinse and repeat.
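A rough sketch of that race, with the actual agent call left as a stub (the test command, worktree handling, and iteration cap are all placeholders to adjust):

    import concurrent.futures as cf
    import shutil
    import subprocess
    import tempfile

    MAX_ITERATIONS = 50   # per-agent cap, as above
    N_AGENTS = 4

    def ask_agent_to_edit(workdir: str) -> None:
        # Placeholder: drive whatever agent you use (Claude Code, aider, ...)
        # for one iteration against the code in `workdir`.
        raise NotImplementedError("wire up your agent CLI/SDK here")

    def run_one_attempt(agent_id: int, repo: str, test_cmd: list[str]) -> bool:
        # Each attempt works in its own throwaway copy of the repo.
        workdir = tempfile.mkdtemp(prefix=f"agent-{agent_id}-")
        shutil.copytree(repo, workdir, dirs_exist_ok=True)
        for _ in range(MAX_ITERATIONS):
            ask_agent_to_edit(workdir)
            if subprocess.run(test_cmd, cwd=workdir).returncode == 0:
                return True   # tests pass: this attempt wins
        return False

    with cf.ThreadPoolExecutor(max_workers=N_AGENTS) as pool:
        futures = [pool.submit(run_one_attempt, i, ".", ["pytest", "-q"])
                   for i in range(N_AGENTS)]
        for fut in cf.as_completed(futures):
            if fut.result():
                # First success: cancel attempts that haven't started yet.
                # Attempts already running will finish their current
                # iteration unless you kill them more forcefully.
                for other in futures:
                    other.cancel()
                break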
Helps me plan it well and helps the LLM work a lot better.
I've never gotten results from any LLM when doing anything more than one-shots. I basically have a copy-pastable prompt, and if the first answer is wrong, I update the prompt and begin from scratch. Usually I add in some "macro" magic too to automatically run shell commands and whatnot.
It seems like they lose "touch" with what's important so quickly, and manage to steer themselves further away if anything incorrect ends up anywhere in the context. Which, thinking about how they work, sort of makes sense.
It's not just about zero shotting. You should be able to ping pong back and forth with all of the parallel agents at the same time. Every prompt is a dice roll, so you may as well roll as many as possible.
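To put rough numbers on the dice roll (the 60% per-attempt success rate is purely illustrative, and it assumes the attempts are independent):

    # Chance that at least one of n independent attempts succeeds,
    # for an assumed per-attempt success rate p (both numbers illustrative).
    p, n = 0.6, 4
    print(1 - (1 - p) ** n)   # ≈ 0.974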
Same vibe as the Datacenter Scale xkcd -> https://xkcd.com/1737/
Meanwhile, a huge problem in parallelization is maintaining memory-banks, like https://docs.cline.bot/prompting/cline-memory-bank
For parallelism, I'm finding it more productive to have multiple distinct tasks that I multitask on and guide each to completion. Along the way I improve the repo docs and tools so the AI is more self-sufficient the next time, so my energy goes more to enabling longer runs.
Ex: One worker improving all docs. I can come back, give feedback, and redo all of them. If I'm going to mess with optimizing agent flows, it'd be to make the repo style guide clearer to the AI. In theory I can divide the docs into sections and manually run sections in parallel, or ask for multiple parallel versions of it all for comparison... but that's a lot of overhead. Instead, I can fork the repo and work on another non-docs issue in parallel. An individual task is slow, but I get more tasks done, and with less human effort.
I'd like tools to automate fork/join parallelism for divide-and-conquer plans, and that feels inevitable. For now, they do fairly linear CoT, and it's easier for me to do distinct tasks than to worry about coordinating.
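The fork/join shape is simple enough to hand-roll today, though; a minimal sketch with the agent call stubbed out:

    import concurrent.futures as cf

    def rewrite_section(section: str) -> str:
        # Placeholder: here you'd prompt an agent with the section
        # plus the repo style guide, and return its draft.
        return f"[agent-rewritten draft of '{section}']"

    # Fork: one worker per docs section.
    sections = ["getting-started", "api-reference", "deployment", "faq"]
    with cf.ThreadPoolExecutor(max_workers=len(sections)) as pool:
        drafts = dict(zip(sections, pool.map(rewrite_section, sections)))

    # Join: one place to review and give feedback before anything lands.
    for name, draft in drafts.items():
        print(f"--- {name} ---\n{draft}")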
Your CPU is decoding instructions optimistically (and sometimes even executing them)
Your app is caching results just in case
The edge server has stuff stashed for you (and others..)
The list goes on and on...
My question is: how does the author get things done with $0.10? My simple example with the smaller file is $2 each.
The only downsides I’ve seen:
1. For JS projects at least, you still need to npm install or yarn install for all packages. This can take a bit of time.
2. Potentially breaks if you have bespoke monorepo tooling that your infra team won't accept updates for (why is this example so specific? I don't know, just accept my PR please). But I digress.
(because git clone is much cleaner and feels less risky especially with agents)
I believe that's the problem pnpm is trying to solve (err, not the time part, I can't swear it's wall clock faster, but the "hey hard links are a thing" part <https://pnpm.io/faq#:~:text=pnpm%20creates%20hard%20links%20...> )
So what if the BDD is done?
Read >> Write.
As the final step, you should be cleaning up AI slop, bloated code, useless dependencies, unnecessary tests—or the lack thereof—security gaps, etc. Either way, any fine-tuning should happen in the final step.
This review stage will still eat up any time gains.
Is the AI result good enough? If not, fix it yourself, or feed it back to the AI to fix.
> Read >> Write.
is what a lot of AI agent zealots see as no longer a thing. I have had multiple people tell me that AI will just have to get better at reading sloppy code and that humans should no longer have to be able to do it.
I expect a common case would be: one agent wrote code that does the thing I want. One agent wrote code that isn't unmaintainable garbage. These are not the same agent. So now you have to combine the two solutions, which is quite a lot of work.
juancn•1d ago
arguflow•1d ago
oparin10•1d ago
Popular libraries/frameworks that have been around for years and have hundreds of real engineers contributing, documenting issues, and fixing bugs are pretty much guaranteed to have code that is orders of magnitude better than something that can contain subtle bugs and that they will have to maintain themselves if something breaks.
In this very same post, the user mentions building a component library called Astrobits. Following the link they posted for the library’s website, we find that the goal is to have a "neo-brutalist" pixelated 8-bit look using Astro as the main frontend framework.
This goal would be easily accomplished by just using a library like ShadCN, which also supports Astro[1], and has you install components by vendoring their fully accessibility-optimized components into your own codebase. They could then change the styles to match the desired look.
Even better, they could simply use the existing 8-bit styled ShadCN components[2] that already follow their UI design goal.
[1] - https://ui.shadcn.com/docs/installation/astro [2] - https://www.8bitcn.com/
arguflow•1d ago
Using multiple agents helps when the end goal isn't clear yet, especially if there is no end-state UI design in mind. I've been using a similar method for Shopify Polaris[1]; putting the building blocks together (and combing through docs to find the correct blocks) is still a massive chore.
[1] - https://polaris-react.shopify.com/getting-started
skeptrune•1d ago
My gripes with ShadCN for Astro are minor (lots of deps + the required client:load template directive), but even small friction points are enough that I'm willing to quickly build my own project. AI makes it barely any work, especially when I lower the variance using parallelization.
eikenberry•1d ago
A good starting point fixes the blank page problem. Frameworks or libraries don't address this problem.
vidyootsenthil•1d ago
juancn•1d ago
morkalork•1d ago