A single agent is supposed to be powerful given proper context and sound engineering design decisions - whether UI or backend.
Asking 3 different agents to do the same engineering task reeks of inefficient or ineffective development patterns with agents.
You wouldn't because human labor is too expensive to make it worthwhile. You would if it were too cheap to meter.
We actually do that at the scale of society - that's market competition in a nutshell. Lots of people building variants of the same things, then battling it out on the market. Yes, it's wasteful as hell (something too rarely talked about), but we don't have a better practical alternative at this point, so there's some merit to the general idea.
(Also the same principle applies to all life - both in terms of how it evolves, and how parts of living organisms work internally. Actively maintained equilibria abound.)
Actively maintained equilibria abound, but this is not typically the mechanism. Different species in adjacent niches aren't better or worse versions of the same organism to be evaluated and either selected or discarded. It's more typical for them to adopt a strategy of ecological segmentation so that they can all have their needs met. Every few years moths migrate to my state to reproduce - and they do so before our local moths have woken up for the season, and leave around the time they do, so that they aren't in competition. Birds that feed from the same trees will eat from different parts of the tree and mate at different times, so that their peak energy consumption doesn't line up. What would the benefit be in driving each other to extinction?
Evolution doesn't make value judgments; it doesn't know which species is better or worse, and it doesn't know how future climatic shifts will change the fitness landscape. Segmentation is both easier and a hedge against future climatic shifts.
Engineering works under a very different logic where the goal is optimal performance in a controlled environment for an acceptable service life, not satisfactory performance with extremely high robustness in the face of unknown changes into the perpetual future. When we rank different systems and select the most optimal, we are designing a system that is extremely brittle on geologic timescales. Abandon a structure and it will quickly fall apart. But we don't care because we're not operating at geologic timescales and we expect to be around to redesign systems as their environment changes to make them unsuitable.
Similarly, the reproduction of labor/capacity in markets you described could be viewed as trading efficiency for robustness instead of as waste. Eg, heavily optimized supply chains are great for costs, but can have trouble adapting to global pandemics, wars in inconvenient places, or ships getting stuck in the wrong canal.
both seem valid uses of this synthetic intelligence to me
I haven't productized it though; uzi looks great!
Thanks :)
The latter is so expensive that I still write most code myself, or I'll integrate LLM code into the codebase myself.
I've used parallel Claude Code agents to do chore work for me.
But I'd be curious about examples of tasks that people find best for OP's level of parallelization.
There are a couple of agent personas I go back to over and over (researcher, designer, critic, implementer, summariser). For most tasks I reuse 90%+ of the same prompt, but the implementer has variants: one primed with an llms.txt (see answerdotai) for a given library I want to use, another configured to use Gemini (I prefer its Tailwind capabilities) rather than Claude, etc.
To organise these reusable agents I'm currently test-driving Langroid; each agent contributes via a sub-task.
It's not perfect yet though.
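For the curious, the persona setup boils down to something like this framework-agnostic sketch (all names and fields are illustrative; the actual Langroid wiring is omitted):

    from dataclasses import dataclass, field

    @dataclass
    class Persona:
        name: str
        base_prompt: str                      # the ~90% shared across tasks
        variants: dict[str, str] = field(default_factory=dict)  # extra context per variant
        model: str = "claude"                 # which backend this persona routes to

        def prompt(self, task: str, variant: str | None = None) -> str:
            parts = [self.base_prompt]
            if variant:
                parts.append(self.variants[variant])
            parts.append(task)
            return "\n\n".join(parts)

    implementer = Persona(
        name="implementer",
        base_prompt="You implement the agreed design. Keep diffs small and add tests.",
        variants={
            "with-llms-txt": "<contents of the library's llms.txt pasted here>",
            "tailwind": "Prefer Tailwind utility classes over custom CSS.",
        },
        model="gemini",   # e.g. route Tailwind-heavy work to a different model
    )

    print(implementer.prompt("Build the pricing card component.", variant="tailwind"))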
Since the probability of an LLM succeeding at any given task is sub 100%, you should run multiple instances of the same LLM with the same prompted task in parallel.
For visual UI iteration this seems amazing given the right tooling, as the author states.
I could see it maybe useful for TDD. Let four agents run on a test file and implement until it passes. Restrict to 50 iterations per agent, first one that passes the test terminates other in-progress sessions. Rinse and repeat.
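A rough sketch of that race, with the actual agent call left as a stub (the test command, worktree handling, and iteration cap are all placeholders to adjust):

    import concurrent.futures as cf
    import shutil
    import subprocess
    import tempfile

    MAX_ITERATIONS = 50   # per-agent cap, as above
    N_AGENTS = 4

    def ask_agent_to_edit(workdir: str) -> None:
        # Placeholder: drive whatever agent you use (Claude Code, aider, ...)
        # for one iteration against the code in `workdir`.
        raise NotImplementedError("wire up your agent CLI/SDK here")

    def run_one_attempt(agent_id: int, repo: str, test_cmd: list[str]) -> bool:
        # Each attempt works in its own throwaway copy of the repo.
        workdir = tempfile.mkdtemp(prefix=f"agent-{agent_id}-")
        shutil.copytree(repo, workdir, dirs_exist_ok=True)
        for _ in range(MAX_ITERATIONS):
            ask_agent_to_edit(workdir)
            if subprocess.run(test_cmd, cwd=workdir).returncode == 0:
                return True   # tests pass: this attempt wins
        return False

    with cf.ThreadPoolExecutor(max_workers=N_AGENTS) as pool:
        futures = [pool.submit(run_one_attempt, i, ".", ["pytest", "-q"])
                   for i in range(N_AGENTS)]
        for fut in cf.as_completed(futures):
            if fut.result():
                # First success: cancel attempts that haven't started yet.
                # Attempts already running will finish their current
                # iteration unless you kill them more forcefully.
                for other in futures:
                    other.cancel()
                break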
Helps me plan it well and helps the LLM work a lot better.
I've never gotten results from any LLM when doing anything more than one-shots. I basically have a copy-pastable prompt, and if the first answer is wrong, I update the prompt and begin from scratch. Usually I add in some "macro" magic too to automatically run shell commands and whatnot.
It seems like they lose "touch" with what's important so quickly, and manage to steer themselves further away if anything incorrect ends up anywhere in the context. Which, thinking about how they work, sort of makes sense.
It's not just about zero shotting. You should be able to ping pong back and forth with all of the parallel agents at the same time. Every prompt is a dice roll, so you may as well roll as many as possible.
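To put rough numbers on the dice roll (the 60% per-attempt success rate is purely illustrative, and it assumes the attempts are independent):

    # Chance that at least one of n independent attempts succeeds,
    # for an assumed per-attempt success rate p (both numbers illustrative).
    p, n = 0.6, 4
    print(1 - (1 - p) ** n)   # ≈ 0.974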
Same vibe as the Datacenter Scale xkcd -> https://xkcd.com/1737/
Meanwhile, a huge problem in parallelization is maintaining memory-banks, like https://docs.cline.bot/prompting/cline-memory-bank
For parallelism, I'm finding it more productive to have multiple distinct tasks that I multitask on and guide each to completion. Along the way I improve the repo docs and tools so the AI is more self-sufficient the next time, so my energy goes more to enabling longer runs.
Ex: One worker improving all docs. I can come back, give feedback, and redo all of them. If I'm going to mess with optimizing agent flows, it'd be to make the repo style guide clearer to the AI. In theory I can divide the docs into sections and manually run sections in parallel, or ask for multiple parallel versions of it all for comparison... but that's a lot of overhead. Instead, I can fork the repo and work on another non-docs issue in parallel. An individual task is slow, but I get more tasks done, and with less human effort.
I'd like tools to automate fork/join parallelism for divide-and-conquer plans, and that feels inevitable. For now, they do fairly linear CoT, and it's easier for me to do distinct tasks than to worry about coordinating.
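The fork/join shape is simple enough to hand-roll today, though; a minimal sketch with the agent call stubbed out:

    import concurrent.futures as cf

    def rewrite_section(section: str) -> str:
        # Placeholder: here you'd prompt an agent with the section
        # plus the repo style guide, and return its draft.
        return f"[agent-rewritten draft of '{section}']"

    # Fork: one worker per docs section.
    sections = ["getting-started", "api-reference", "deployment", "faq"]
    with cf.ThreadPoolExecutor(max_workers=len(sections)) as pool:
        drafts = dict(zip(sections, pool.map(rewrite_section, sections)))

    # Join: one place to review and give feedback before anything lands.
    for name, draft in drafts.items():
        print(f"--- {name} ---\n{draft}")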
Your CPU is decoding instructions optimistically (and sometimes even executing them)
Your app is caching results just in case
The edge server has stuff stashed for you (and others..)
The list goes on and on...
My question is: how does the author get things done with $0.10? My simple example with the smaller file is $2 each.
The only downsides I’ve seen:
1. For JS projects at least, you still need to npm install or yarn install for all packages. This can take a bit of time.
2. Potentially breaks if you have bespoke monorepo tooling that your infra team won't accept updates for (why is this example so specific? I don't know, just accept my PR please). But I digress.
(because git clone is much cleaner and feels less risky especially with agents)
I believe that's the problem pnpm is trying to solve (err, not the time part, I can't swear it's wall clock faster, but the "hey hard links are a thing" part <https://pnpm.io/faq#:~:text=pnpm%20creates%20hard%20links%20...> )
So what if the BDD is done?
Read >> Write.
As the final step, you should be cleaning up AI slop, bloated code, useless dependencies, unnecessary tests—or the lack thereof—security gaps, etc. Either way, any fine-tuning should happen in the final step.
This review stage will still eat up any time gains.
Is the AI result good enough? If not, fix it yourself, or feed it back to the AI to fix.
> Read >> Write.
is what a lot of AI agent zealots see as no longer a thing. I have had multiple people tell me that AI will just have to get better at reading sloppy code and that humans should no longer have to be able to do it.
I expect a common case would be: one agent wrote code that does the thing I want. One agent wrote code that isn't unmaintainable garbage. These are not the same agent. So now you have to combine the two solutions, which is quite a lot of work.
juancn•1d ago
arguflow•1d ago
oparin10•1d ago
Popular libraries/frameworks that have been around for years and have hundreds of real engineers contributing, documenting issues, and fixing bugs are pretty much guaranteed to have code that is orders of magnitude better than something that can contain subtle bugs and that they will have to maintain themselves if something breaks.
In this very same post, the user mentions building a component library called Astrobits. Following the link they posted for the library’s website, we find that the goal is to have a "neo-brutalist" pixelated 8-bit look using Astro as the main frontend framework.
This goal would be easily accomplished by just using a library like ShadCN, which also supports Astro[1], and has you install components by vendoring their fully accessibility-optimized components into your own codebase. They could then change the styles to match the desired look.
Even better, they could simply use the existing 8-bit styled ShadCN components[2] that already follow their UI design goal.
[1] - https://ui.shadcn.com/docs/installation/astro [2] - https://www.8bitcn.com/
arguflow•1d ago
Using multiple agents helps when the end goal isn't clear yet, especially if there is no end-state UI design in mind. I've been using a similar method for Shopify Polaris[1]; putting the building blocks together (and combing through docs to find the correct blocks) is still a massive chore.
[1] - https://polaris-react.shopify.com/getting-started
skeptrune•1d ago
My gripes with ShadCN for Astro are minor (lots of deps + the required client:load template directive), but even small friction points are enough that I'm willing to quickly build my own project. AI makes it barely any work, especially when I lower the variance using parallelization.
eikenberry•1d ago
A good starting point fixes the blank page problem. Frameworks or libraries don't address this problem.
vidyootsenthil•1d ago
juancn•1d ago
morkalork•1d ago