Oh boy.
It is the responsibility of the person running the coding agent to make sure the resulting PRs are high quality. Putting that on your team mates, or worse, random open source project maintainers on the internet, is the definition of an extractive contribution.
Put another way, who are they supposed to hire to tell these low quality PRs apart from the high quality ones? Who even knows how to do something like that?!
Ime successful creative execution looks like micro-iterations where each output informs the next creative move.
I can build something incredibly fast from essentially caveman grunt instructions through an LLM harness, iterating as I go.
Optimizing for feeding a huge plan to an agent sounds to me like a net waste of time. And looking over the shoulder of industry peers trying to do this, I don’t see their outputs or throughput some remarkable improvement over what I can produce with minimal fanfare usage.
it works.
this has been an obvious thing to do since at least January (since Geoffrey Huntley published "everything is a ralph loop"), and this is how I've been working: build enough orchestration tooling to be able to automate everything: development container bringup, building it, running the unit tests, doing integration testing, and using the software as eventually an end user. then to iterate set performance goals on an already solid basis so the automated agent ("gym") can go and iterate autonomously, and let you know when it's "done".
I understand this probably does not work if you're on some subscription and not using the API (tokens burn fast), but this has been extremely productive for me.
as usual, the tool isnt really doing whats listed on its label.
however, people are different so this might improve someones capability to deploy LLMs. might even provide better evidence where actual brain power is needed.
OP quoted the correct definition right at the start:
> In systems engineering, backpressure is the mechanism by which a downstream component signals upstream that it can't accept more work
(the "downstream component" being the human reviewer in this case)
But the measures they propose don't actually do that. They are more like fixed throttle elements which would slow down the rate of submissions of an agent and weed out some low-quality submissions before hitting "downstream".
I'm missing the connection to the actual capacity (or will) that the human developers have to review the submissions.
It comes from previous posts I’ve come across, but I haven’t considered exactly what you mentioned. That’s on me.
https://github.com/puraxyz/puraxyz/blob/main/docs/paper/main...
So e.g. I may have 1 agent that I ask and iterate on with directly, and 9 agents that work separately on their own.
I will utilize this 1 agent on features I care most about and want to guide and iterate on in as much detail as possible.
mark_l_watson•43m ago