
LLM codegen go brrr – Parallelization with Git worktrees and tmux

https://www.skeptrune.com/posts/git-worktrees-agents-and-tmux/
154•skeptrune•1d ago

Comments

juancn•1d ago
Now you can do 4X more code reviews!
arguflow•1d ago
Why review the code? Most of the time all you want is a good starting point.
oparin10•1d ago
If all you need is a good starting point, why not just use a framework or library?

Popular libraries/frameworks that have been around for years and have hundreds of real engineers contributing, documenting issues, and fixing bugs are pretty much guaranteed to have code that is orders of magnitude better than something that can contain subtle bugs and that they will have to maintain themselves if something breaks.

In this very same post, the user mentions building a component library called Astrobits. Following the link they posted for the library’s website, we find that the goal is to have a "neo-brutalist" pixelated 8-bit look using Astro as the main frontend framework.

This goal would be easily accomplished by just using a library like ShadCN, which also supports Astro[1], and has you install components by vendoring their fully accessibility-optimized components into your own codebase. They could then change the styles to match the desired look.

Even better, they could simply use the existing 8-bit styled ShadCN components[2] that already follow their UI design goal.

[1] - https://ui.shadcn.com/docs/installation/astro [2] - https://www.8bitcn.com/

arguflow•1d ago
Frameworks and libraries are useful to keep the code style the same.

Using multiple agents helps when the end goal isn't clear, especially if there is no end-state UI design in mind. I've been using a similar method for shopify polaris[1]; putting the building blocks together (and combing through docs to find the correct blocks) is still a massive chore.

[1] - https://polaris-react.shopify.com/getting-started

skeptrune•1d ago
I think AI makes personal software possible in a way that it wasn't before. Without LLMs, I would have never had the time to build a component library at all and would have probably used 8bitcn (looks awesome btw) and added the neo-brutalist shadows I wanted.

However, my gripes with ShadCN for Astro are minor (lots of deps + the required client:load template directive), but even small friction points are enough that I'm willing to quickly build my own project. AI makes it barely any work, especially when I lower the variance using parallelization.

eikenberry•1d ago
> If all you need is a good starting point, why not just use a framework or library?

A good starting point fixes the blank page problem. Frameworks or libraries don't address this problem.

vidyootsenthil•1d ago
or also 4x productivity!
juancn•1d ago
Coding has never been the bottleneck for me; it's all the other crap that takes time.
morkalork•1d ago
What's the issue, everyone loves doing code review right?
thepablohansen•1d ago
This resonates- my engineering workflow has started shifting from highly focused, long periods of building out a feature to one that has much more context switching, review, and testing.
asadm•1d ago
ooh i was exploring this path, aider is so slow. thanks for validating it.
arguflow•1d ago
Is aider supposed to do worktrees by default?
asadm•1d ago
i dont think so?
asadm•1d ago
i dont see uzi code on github.
skeptrune•1d ago
Hadn't pushed from our remotes. There now![1]

- [1] https://github.com/devflowinc/uzi
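The worktree-per-agent layout this kind of tool automates can be sketched in a few lines of shell. This is a self-contained demo in a throwaway repo (branch names like `agent-1` are hypothetical, and it assumes git 2.5+ for worktree support), not uzi's actual implementation:

```shell
set -eu
tmp=$(mktemp -d)                 # throwaway demo location
cd "$tmp"
git init -q main
cd main
git -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m init
# one worktree per agent; all of them share main's object store
for i in 1 2 3; do
  git worktree add -q -b "agent-$i" "../agent-$i"
done
git worktree list                # the main checkout plus three linked worktrees
```

Each `agent-*` directory can then host its own tmux pane running an agent, while all of them share one object store.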

uludag•1d ago
I completely see the benefit of this strategy. Defaulting to something like this would seem to inflate costs though, as a tradeoff for time. I know certain LLM usages can be pretty pricey. I hope that something like this doesn't become the default though, as I can see parallelization being a dark pattern for those making money off of token usage.
ramoz•1d ago
I don’t think it’s a great representation of the utility of worktrees, or even of efficient, practical use of agents.
vFunct•1d ago
It pretty much is though. This is exactly what you'd do if you had 100 different employees.
ramoz•1d ago
I wouldn't ask for 100 different versions of the same feature from each of them.

1 agent is supposed to be powerful with proper context and engineering design decisions in mind - whether UI or backend.

Asking 3 different agents to do the same engineering task reeks of inefficient or ineffective development patterns with agents.

TeMPOraL•1d ago
> I wouldn't ask for 100 different versions of the same feature from each of them.

You wouldn't because human labor is too expensive to make it worthwhile. You would if it were too cheap to meter.

We actually do that at the scale of society - that's market competition in a nutshell. Lots of people building variants of the same things, then battling it out on the market. Yes, it's wasteful as hell (something too rarely talked about), but we don't have a better practical alternative at this point, so there's some merit to the general idea.

(Also the same principle applies to all life - both in terms of how it evolves, and how parts of living organisms work internally. Actively maintained equilibria abound.)

maxbond•17h ago
> Also the same principle applies to all life

Actively maintained equilibria abound, but this is not typically the mechanism. Different species in adjacent niches aren't better or worse versions of the same organism to be evaluated and either selected or discarded. It's more typical for them to adopt a strategy of ecological segmentation so that they can all have their needs met. Every few years moths migrate to my state to reproduce - and they do so before our local moths have woken up for the season, and leave around the time they do, so that they aren't in competition. Birds that feed from the same trees will eat from different parts of the tree and mate at different times, so that their peak energy consumption doesn't line up. What would the benefit be in driving each other to extinction?

Evolution doesn't make value judgments; it doesn't know which species is better or worse, and it doesn't know how future climatic shifts will change the fitness landscape. Segmentation is both easier and a hedge against future climatic shifts.

Engineering works under a very different logic where the goal is optimal performance in a controlled environment for an acceptable service life, not satisfactory performance with extremely high robustness in the face of unknown changes into the perpetual future. When we rank different systems and select the most optimal, we are designing a system that is extremely brittle on geologic timescales. Abandon a structure and it will quickly fall apart. But we don't care because we're not operating at geologic timescales and we expect to be around to redesign systems as their environment changes to make them unsuitable.

Similarly, the reproduction of labor/capacity in markets you described could be viewed as trading efficiency for robustness instead of as waste. Eg, heavily optimized supply chains are great for costs, but can have trouble adapting to global pandemics, wars in inconvenient places, or ships getting stuck in the wrong canal.

vFunct•1d ago
I actually don’t use them that way. I use 100 different agents on 100 different worktrees to develop 100 different apps for the overall project.
ramoz•23h ago
That’s what I’m advocating for. That’s not what was demonstrated in the blog
tough•7h ago
in frontend work, exploratory random generation might have some value if you don't know what you need.

both seem valid uses of this synthetic intelligence to me

tough•1d ago
what if you have 100 lint errors that you can parallelize fixing to 100 small local 1B llms
ramoz•23h ago
This is exactly what I would do.
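A minimal sketch of that fan-out, assuming the fix step is just a shell command per file (here a plain `echo` stands in for the hypothetical local-model call):

```shell
set -eu
cd "$(mktemp -d)"
printf '%s\n' a.py b.py c.py d.py > lint-errors.txt   # pretend lint report
# fan the files out to 4 parallel workers; `echo` stands in for the
# hypothetical "ask a small local model to fix this file" command
out=$(xargs -n1 -P4 sh -c 'echo "fixed: $1"' _ < lint-errors.txt | sort)
printf '%s\n' "$out"
```

`xargs -P` is the simplest parallelism primitive here; since each lint error is an independent file-local fix, the workers never need to coordinate.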
ukuina•18h ago
Without agent collaboration, you'll need a whole tree of agents just to resolve the merge conflicts.
greymalik•1d ago
This is discussed in TFA. The absolute costs are negligible, particularly in comparison to the time saved.
arguflow•1d ago
I think the most annoying part is when a coding agent takes a particularly long time to produce something and then has bad output. It's such a time sink / sunk cost.
vercantez•1d ago
Very cool! Actually practical use of scaling parallel test time compute. I've been using worktrees + agents to work on separate features but never considered allocating N agents per task.
maximilianroos•1d ago
I posted some notes from a full setup I've built for myself with worktrees: https://github.com/anthropics/claude-code/issues/1052

I haven't productized it though; uzi looks great!

senko•1d ago
TIL worktrees exist! https://git-scm.com/docs/git-worktree

Thanks :)

asselinpaul•19h ago
might be of interest https://9999years.github.io/git-prole/
dangoodmanUT•1d ago
thank you for not writing this in python
crawshaw•1d ago
This is a good strategy we (sketch.dev) experimented with a bit, but in the end we went with containers because it gives the LLM more freedom to, e.g. `apt-get install jq` and other tools.
sureglymop•1d ago
I love how the one non-broken toggle still wasn't great. Now you can save time while wasting your time ;)
skeptrune•1d ago
It was better than starting from scratch though. Imo, getting a functional wireframe for $0.40 is a good deal.
hombre_fatal•1d ago
I find that my bottleneck with LLMs on a real project is reviewing their code, QAing it, and, if it's novel code, integrating it into my own mental model of how the code works so that I can deliberately extend it in a maintainable way.

The latter is so expensive that I still write most code myself, or I'll integrate LLM code into the codebase myself.

I've used parallel Claude Code agents to do chore work for me.

But I'd be curious about examples of tasks that people find best for OP's level of parallelization.

CraigJPerry•1d ago
I'm heading in a different direction on this. Worktrees don't solve the problem for me; this is stuck in 1 agent = 1 task mode. I want a swarm of agents on 1 task.

There's a couple of agent personas i go back to over and over (researcher, designer, critic, implementer, summariser), for most tasks i reuse 90%+ of the same prompt, but implementer has variants, one that's primed with an llms.txt (see answerdotai) for a given library i want to use, another that's configured to use gemini (i prefer its tailwind capabilities) rather than claude etc.

To organise these reusable agents i'm currently test driving langroid, each agent contributes via a sub task.

It's not perfect yet though.

skeptrune•1d ago
I think you misread. The point I make is that it's many agents = 1 task.

Since the probability of an LLM succeeding at any given task is sub-100%, you should run multiple instances of the same LLM with the same prompted task in parallel.

yakbarber•23h ago
I think OP means they should be collaborating. In the poster's proposed solution, each agent is independent. But you could reduce the human attention required by having multiple rounds of evaluation and feedback from other agents before it gets to the human.
danielbln•1d ago
What I don't like about this approach is that it mainly improves the chances of zero-shotting a feature, but I require a ping pong with the LLM to iterate on the code/approach. Not sure how to parallelize that, I'm not gonna keep the mental model of 4+ iterations of code in my head and iterate on all of them.

For visual UI iteration this seems amazing given the right tooling, as the author states.

I could see it maybe useful for TDD. Let four agents run on a test file and implement until it passes. Restrict to 50 iterations per agent, first one that passes the test terminates other in-progress sessions. Rinse and repeat.
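That race can be sketched with plain background jobs. The `sleep` is a stand-in for the hypothetical "agent iterates until the test file passes" step; agent 1 sleeps least, so it always wins this demo:

```shell
cd "$(mktemp -d)"
# sleep stands in for "agent iterates until the test suite passes";
# each winner drops a marker file when its tests go green
run_agent() { sleep "$1"; echo "agent-$1" > "winner-$1"; }
pids=""
for i in 1 2 3; do run_agent "$i" & pids="$pids $!"; done
until ls winner-* >/dev/null 2>&1; do sleep 0.1; done   # wait for first pass
winner=$(cat winner-* | head -n 1)
kill $pids 2>/dev/null || true    # first passing agent terminates the rest
echo "winner: $winner"
```

A real version would also cap each worker at the iteration limit (the 50 mentioned above) and clean up the losing worktrees.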

Flemlo•1d ago
I write docs often, and what works wonders with LLMs is good docs: a readme, an architectural doc, etc.

Helps me to plan it well and the LLM to work a lot better

mooreds•23h ago
Bonus! Future you and other devs working in the system will benefit from docs as well.
diggan•1d ago
> but I require a ping pong with the LLM to iterate on the code/approach

I've never gotten good results from any LLM when doing more than one-shots. I basically have a copy-pastable prompt, and if the first answer is wrong, I update the prompt and begin from scratch. Usually I add in some "macro" magic too, to automatically run shell commands and whatnot.

It seems like they lose "touch" with what's important so quickly, and manage to steer themselves further away if anything incorrect ends up at any place in the context. Which, thinking about how they work, sort of makes sense.

foolswisdom•23h ago
That doesn't take away from the OP's point (and OP didn't specify what ping ponging looks like, could be the same as you're describing), you are still iterating based on the results, and updating the prompt based on issues you see in the result. It grates on a human to switch back and forth between those attempts.
scroogey•15h ago
But if you're "starting from scratch", then what would be the problem? If none of the results match what you want, you reiterate on your prompt and start from scratch. If one of them is suitable you take it. If there's no iterating on the code with the agents, then this really wouldn't add much mental overhead? You just have to glance over more results.
skeptrune•1d ago
>From the post: There is no easy way to send the same prompt to multiple agents at once. For instance, if all agents are stuck on the same misunderstanding of the requirements, I have to copy-paste the clarification into each session.

It's not just about zero shotting. You should be able to ping pong back and forth with all of the parallel agents at the same time. Every prompt is a dice roll, so you may as well roll as many as possible.
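A broadcast along those lines can be scripted against tmux directly. The session name `agents` and the prompt text are assumptions for the sketch; it types one clarification into every pane of that session:

```shell
# session name "agents" is an assumption: one pane per parallel agent
prompt="Clarification: the settings toggle must persist across reloads."
if tmux has-session -t agents 2>/dev/null; then
  for pane in $(tmux list-panes -s -t agents -F '#{pane_id}'); do
    tmux send-keys -t "$pane" "$prompt" Enter   # type the prompt into each pane
  done
  status="broadcast to $(tmux list-panes -s -t agents | wc -l) panes"
else
  status="no tmux session named 'agents'; nothing to broadcast"
fi
echo "$status"
```

`tmux send-keys` is agent-agnostic: it works the same whether the pane is running Claude Code, aider, or anything else with a REPL-style prompt.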

layoric•1d ago
> Every prompt is a dice roll, so you may as well roll as many as possible.

Same vibe as the Datacenter Scale xkcd -> https://xkcd.com/1737/

vFunct•1d ago
Yah it's not really usable for iteration. I don't parallelize this way. I parallelize based on functions. Different agents for different function.

Meanwhile, a huge problem in parallelization is maintaining memory-banks, like https://docs.cline.bot/prompting/cline-memory-bank

babyshake•17h ago
I guess one way it might be able to work is with a manager agent, who delegates to IC agents to try different attempts. The manager reviews their work and understands the differences in what they are doing, and can communicate with you about it and then to the ICs doing the work. So you are like a client who has a point of contact at an engineering org who internally is managing how the project is being completed.
landl0rd•17h ago
I usually see that results are worse after ping-pong. If one-shot doesn't do it, it's better to "re-roll". A context window full of crap poisons its ability to do better and stay on target.
lmeyerov•1d ago
I like to think about maximizing throughput while minimizing attention: both matter, and the proposal here is expensive on my attention. Optimizing per-task latency matters less than enabling longer non-interactive runs.

For parallelism, I'm finding it more productive to have multiple distinct tasks that I multitask on and guide each to completion. Along the way I improve the repo docs and tools so the AI is more self-sufficient the next time, so my energy goes more to enabling longer runs.

Ex: One worker improving all docs. I can come back, give feedback, and redo all of them. If I'm going to mess with optimizing agent flows, it'd be to make the repo style guide clearer to the AI. In theory I can divide docs sections and manually run sections in parallel, or ask for multiple parallel versions of it all for comparison... but that's a lot of overhead. Instead, I can fork the repo and work on another, non-docs issue in parallel. An individual task is slow, but I get more tasks done, and with less human effort.

I'd like tools to automate fork/join parallelism for divide-and-conquer plans, and that feels inevitable. For now, they do fairly linear CoT, and it's easier for me to do distinct tasks than to worry about coordinating.

dgunay•1d ago
This looks like a much more sophisticated version of my setup. I had Aider vibe code me a script that just manages cloning a repo into a subfolder, optionally with some kind of identifying suffix on it, and then wrote a tiny script to automate calling that script, `cd`ing into the directory, and then running codex-cli on it. The resulting workflow: I open a new terminal, type `vibe --suffix=<suffix> <prompt>`, and then I can go do something else.
8200_unit•1d ago
Could you share your scripts?
mrbonner•21h ago
I wonder why we should spend so much effort to do this vs. say using checkpoints in Cline for example. You could restore task and files to a previous state and try a different prompt/plan. And, the bonus is you have all of the previous context available.
gct•20h ago
No wonder software is so slow today when we're this profligate. "let's run four AIs and ignore 3/4 of them!" ugh.
bitpush•20h ago
There are trade-offs at all layers of the stack

Your cpu is decoding instructions optimistically (and sometimes even executing)

Your app is caching results just in case

The edge server has stuff stashed for you (and others..)

The list goes on and on...

sagarpatil•18h ago
I avoid using my own API keys, especially for Sonnet 4 or Opus, because LLMs can rack up unexpected costs. Instead, I use Augment Code’s remote agents and Google’s Jules, which charge per message rather than by usage. This setup is ideal for me since I prefer not to run the model locally while I’m actively working on the codebase.
peterkelly•16h ago
If AI coding agents were actually any good, you could preface your prompt with "attempt the following task four times in parallel" and that would be it.
lerchmo•7h ago
claude code will do this https://www.youtube.com/watch?v=2TIXl2rlA6Q&t=580s
hboon•16h ago
Coincidentally, I just posted this[1] earlier today where I made a simple change against a 210 LOC vs 1379 LOC file, comparing parameters: LOC, filename vs URL path for that webpage, playwright verification.

My question is how does the author get things done with $0.10 ? My simple example with the smaller file is $2 each.

[1] https://x.com/hboon/status/1927939888000946198

hboon•16h ago
I run 2 "developers" and myself. Also with tmux, in 3 windows, but I just clone the repo for the 2 developers and then manually pull into my copy when I think it's done. I see various people/sites mentioning git worktrees. I know what it is, but how is it better?
davely•15h ago
git worktrees optimize how data is shared across multiple directories. So, you’re not cloning and duplicating a bunch of code on your machine and just referencing data from the original .git folder.

The only downsides I’ve seen:

1. For JS projects at least, you still need to npm install or yarn install for all packages. This can take a bit of time.

2. Potentially breaks if you have bespoke monorepo tooling that your infra team won’t accept updates for, why is this example so specific, I don’t know just accept my PR please. But I digress.
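The difference is easy to see in a throwaway repo: a clone gets a full `.git` directory of its own, while a linked worktree gets a one-line `.git` file pointing back at the original's object store. A small self-contained demo (all paths hypothetical):

```shell
set -eu
cd "$(mktemp -d)"
git init -q src
git -C src -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m init
git clone -q src clone-copy           # full copy: its own .git directory
git -C src worktree add -q ../linked  # linked: .git is a pointer file
ls -ld clone-copy/.git linked/.git
```

Besides disk, the shared store means branches and stashes created in one worktree are immediately visible from the others, which a plain clone only gets after a fetch.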

hboon•15h ago
Can I say it's just for disk (and perhaps speed when pulling/pushing) optimisation? But as long as that's not a bottleneck, there is no difference?

(because git clone is much cleaner and feels less risky especially with agents)

mdaniel•7h ago
> 1. For JS projects at least, you still need to npm install or yarn install for all packages. This can take a bit of time.

I believe that's the problem pnpm is trying to solve (err, not the time part, I can't swear it's wall clock faster, but the "hey hard links are a thing" part <https://pnpm.io/faq#:~:text=pnpm%20creates%20hard%20links%20...> )

diogolsq•16h ago
The fact that you consider this “saving time” might show that you are not being diligent with your code.

So what if the BDD is done?

Read >> Write.

As the final step, you should be cleaning up AI slop, bloated code, useless dependencies, unnecessary tests—or the lack thereof—security gaps, etc. Either way, any fine-tuning should happen in the final step.

This review stage will still eat up any time gains.

Is the AI result good enough? fix it, or refeed to AI to fix it.

hbogert•6h ago
I agree with you, however:

> Read >> Write.

is what a lot of AI agent zealots see as no longer a thing. I have had multiple people tell me that AI will just have to get better at reading sloppy code, and that humans should no longer have to be able to do it.

bjackman•11h ago
Hmm this strategy only makes sense if you can trivially evaluate each agent's results, which I haven't found to be the case.

I expect a common case would be: one agent wrote code that does the thing I want. One agent wrote code that isn't unmaintainable garbage. These are not the same agent. So now you have to combine the two solutions which is quite a lot of work.
