Poor data: if I make one, it's because I want to either:
a) Merge it (success)
b) Modify it (sometimes success, sometimes not). In one case, Codex made the wrong changes in all the right places, but it was still easier to work from that by hand.
c) Pick ideas from it (partial success)
So simple merge rates don't say much.
The denominator varies wildly based on whether or not the PR is made at all. If Codex produces nonsense, I don't ask it to make a PR.
```
grep 'gh pr ' ~/.claude/local/node_modules/@anthropic-ai/claude-code/cli.js
- Create PR using gh pr create with the format below. Use a HEREDOC to pass the body to ensure correct formatting.
gh pr create --title "the pr title" --body "$(cat <<'EOF'
1. Use \`gh pr view --json number,headRepository\` to get the PR number and repository info
1. If no PR number is provided in the args, use ${O4.name}("gh pr list") to show open PRs
2. If a PR number is provided, use ${O4.name}("gh pr view <number>") to get PR details
3. Use ${O4.name}("gh pr diff <number>") to get the diff
```
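Cleaned up, the PR-creation pattern that prompt describes is just the standard `gh pr create` HEREDOC idiom; a minimal hand-written sketch (the title and body below are placeholders borrowed from the example further down, not the tool's actual text):

```
# Pass the PR body through a quoted HEREDOC so multi-line Markdown,
# backticks, and trailers survive shell quoting intact.
gh pr create --title "Add progress bar for token probability calculation" --body "$(cat <<'EOF'
- Add optional progress_cb parameter to get_token_probs
- Integrate a rich progress bar in the CLI

Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
EOF
)"
```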
However, so far it's been amazing at doing grunt work, especially grunt work I've been avoiding. I've already gotten a handful of things done that I've been pushing off for a year.
One issue I have noticed is that I need to figure out how to toggle the level of thinking more quickly. I've used up some Opus credits on grunt work; it would be nice if the system could just automatically use Sonnet for the easier grunt stuff.
It seems like Claude Code doesn't do that? Some preliminary searching reveals that PRs generated by people using Claude Code are opened under their own user account but may note that they used Claude; for example, https://github.com/anthropics/claude-code/pull/1732:
feat: add progress bar for token probability calculation
- Add optional progress_cb parameter to get_token_probs function
- Integrate `rich` progress bar in CLI showing real-time token processing progress
- Add comprehensive tests for progress callback functionality
- Maintain backward compatibility with optional parameter
Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
https://github.com/search?q=is:pr+is:merged+Co-Authored-By:+...
Instead of looking at the author of the PR, look for that 'Co-Authored-By: Claude' trailer text.
That way I get 753 closed PRs and about 1k PRs in total, which is a pretty good acceptance rate.
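If you want to reproduce that count, the same query works against the public GitHub search API; a rough curl/jq sketch (unauthenticated requests are rate-limited, the search matches PR titles/bodies rather than commit messages, and the exact numbers will drift over time):

```
# Count PRs carrying the Claude co-author trailer, then the merged subset.
Q='"Co-Authored-By: Claude <noreply@anthropic.com>"'
total=$(curl -s -G "https://api.github.com/search/issues" \
  --data-urlencode "q=is:pr $Q" | jq '.total_count')
merged=$(curl -s -G "https://api.github.com/search/issues" \
  --data-urlencode "q=is:pr is:merged $Q" | jq '.total_count')
echo "merged: $merged / total: $total"
```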
Also, of course OpenAI Codex would perform well here, because the tool is heavily tailored to this type of task, whereas Cursor is a more general-purpose (within the programming domain) tool/app.
I just started using Codex casually a few days ago, though, and already have 3 PRs. While different tools for different purposes make sense, Codex's fully async nature is so much nicer. It does simple things, like improving consistency and making small improvements, quite well, which is really nice. Finally we have something that operates more like an appliance for certain classes of problems. Previously it felt more like a teenager with a learner's license.
> get fired due to productivity gains
I found out on Thursday that I have access to Codex with my Plus subscription. I've created and merged about a dozen PRs with it on my OSS projects since then. It's not flawless, but it's pretty good. I've done some tedious work that I had been deferring, got it to complete a few FIXMEs that I hadn't gotten around to fixing, made it write some API documentation, got it to update a README, etc. It's pretty easy to review the PRs.
What I like is that it creates and works on its own branch. I can actually check that branch out, fix a few things myself, push it, and then get it to do PRs against that branch. I had to fix a few small compilation issues; in one case, the fix was just removing a single import that it somehow got wrong, after which everything built and the tests passed. Overall it's pretty impressive. Very usable.
I wonder how it performs on larger code bases. I expect some issues there. I'm going to give that a try next.
Of these "In the loop", seems to be the one that doesn't work that well (yet). The main problem is latency in my opinion.
Building a better autocomplete than the one that already comes with the IDE is actually hard, and most of the AI code-completion approaches I've seen conflict with the built-in autocomplete and don't actually do better. I've tried a few things and usually end up disabling the autocomplete features they offer because they are quite pointless for me. What happens is that I get a lot of suggestions for code I definitely don't want, drowning out the completions I do want and messing up my editing flow. On top of that, I have to constantly read through code that is a combination of not what I'm looking for and probably wrong. It's extra work that I don't need in my life. A bit of an anti-feature as far as I'm concerned.
But I actually have been using ChatGPT quite a bit. It works for me because it connects to the IDE (instead of interfering with it) and allows me to easily ask questions about my code. This is much more useful to me than an AI second-guessing me on every keystroke.
Codex adds to this by being more like a teammate that I can delegate simple things to. It would be nice if it could notify me when it is done or when it needs my input. But otherwise it's nice.
I'm pretty sure the Codex and ChatGPT desktop UIs will merge soon. There's no good reason to have two modalities here other than that they are probably built by two different teams; Conway's law might be an issue here. But I like what OpenAI has done with their desktop client, and they seem to be on top of it.
This data is great, and it is exciting to see the rapid growth of autonomous coding agents across GitHub.
One thing to keep in mind regarding merge rates is that each of these products creates the PR at a different phase of the work. So just tracking PR create to PR merge tells a different story for each product.
In some cases, the work to iterate on the AI generated code (and potentially abandon it if not sufficiently good) is done in private, and only pushed to a GitHub PR once the user decides they are ready to share/merge. This is the case for Codex for example. The merge rates for product experiences like this will look good in the stats presented here, even if many AI generated code changes are being abandoned privately.
For other product experiences, the Draft PR is generated immediately when a task is assigned, and users can iterate on this “in the open” with the coding agent. This creates more transparency into both the success and failure cases (including logs of the agent sessions for both). This is the case for GitHub Copilot coding agent for example. We believe this “learning in the open” is valuable for individuals, teams, and the industry. But it does lead to the merge rates reported here appearing worse - even if logically they are the same as “task assignment to merged PR” success rates for other tools.
We’re looking forward to continuing to evolve the notion of Draft PR to be even more natural for these use cases. And to enabling all of these coding agents to benefit from open collaboration on GitHub.
We are looking into paths where we can support this more personal/private kind of PR, which would provide the foundation within GitHub to support the best of both worlds here.
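For what it's worth, the draft-then-ready lifecycle described above maps onto plain `gh` commands; a generic sketch of the flow (not Copilot's actual implementation; the title and body are placeholders):

```
# Open the PR as a draft as soon as the task starts, iterate in the open,
# then flip it to "ready for review" once it's actually worth merging.
gh pr create --draft --title "WIP: agent task" --body "Task assigned to a coding agent."
# ...agent and humans push follow-up commits to the same branch...
gh pr ready   # mark the current branch's draft PR as ready for review
```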
Current US stance seems to be: https://www.copyright.gov/newsnet/2025/1060.html “It concludes that the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements”.
If an entire commit is generated by AI, then it is obvious what created it: AI. Such a commit might not be covered by the law. Is this something your team has already analysed?
Whether it's committed or not is irrelevant to the conclusion there; the question is what the input was.
If it can be shown that the same prompt, run through the AI several times over perhaps a year, results in the same output, then I will change my mind. Or if the AI achieves personhood.
[0] Allowances for register & loop optimization, etc.
How would that work if it's a patch to a project with a copyleft license like the GPL, which requires all derivative work to be licensed the same?
How is ToS relevant to this thread?
A more interesting question is whether one could remove the GPL restrictions on public code by telling the AI to rewrite the code from scratch, providing only the behavior of the code.
This could be accomplished by making the AI generate a comprehensive test suite first, and then letting it write the code of the app while seeing only the test suite.
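Mechanically, that clean-room flow is easy to script with any headless agent CLI; a hypothetical sketch using Claude Code's `-p` mode (the prompts and paths are invented, and whether this actually strips GPL obligations is exactly the open legal question, not something the script settles):

```
# Step 1: derive a behavioral test suite from a spec, without showing the source.
claude -p "Read SPEC.md and write a comprehensive pytest suite under tests/ that pins down the described behavior. Do not read src/."

# Step 2: in a fresh checkout containing only SPEC.md and tests/, implement from scratch.
claude -p "Implement the application under src/ so that the suite in tests/ passes. You have no access to any prior implementation."
```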
I guess they're mostly selling insurance to BigCos, saying: hey, we have the money to go to court and an interest in winning such a case, so we'll handle it.
Now we have text which is legally not owned by anybody. Is it "public domain" though? It is not possible to verify, so maybe it is, but it still poses legal risks.
This is not the case. The output of a compiler is 100% created by a compiler too. Copyright is based on where the creative aspect comes from.
I have had very little luck having 2025-era AIs manage the creative aspects of coding -- design, architecture, and similar -- and that's doubly true for what appears to be the relatively simplistic model in codex (as far as I can tell, codex trades off model complexity for model time; the model does a massive amount of work for a relatively small change).
However, it is much better than I am at the mechanical aspects. LLMs can fix mechanical bugs almost instantly (the sort of thing with a cut-and-paste fix in some build process from Stack Overflow), and generate massive amounts of code without typos or shallow bugs.
A good analogy is working with power tools versus hand tools: I can do much more in one step, but I'm still in creative control.
The codebase I'm working on is pretty sophisticated, and I might imagine they could implement more cookiecutter things (e.g. a standard oauth workflow) more automatically.
However, even there -- or in discussions with larger models about my existing codebase -- what they do is based in part on human contributions to their training set. I'm not sure how to weigh that. An LLM OAuth workflow might be considered the creative median of a lot of human-written code.
I write a lot of AGPL code, and at least in the 3.5 era, they were clearly trained on my code, and would happily print it out more-or-less verbatim. Indeed, it was to the point where I complained to OpenAI about it at the time, but never got a response. I suspect a lot of generated code will include some fractional contribution from me now (an infinitesimal fraction most of the time, but more substantial for niche code similar to my codebase).
So in generated code, we have a mixture of at least a few different pieces:
- User's contributions, in prompt, review, etc.
- Machine contributions
- Training set contributions
https://open.spotify.com/episode/6o2Ik3w6c4x4DYILXwRSos?si=5...
But you could probably filter this a bit by looking at PR commit counts?
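A quick way to pull that signal for an individual PR is the commit count on the pull-request object (OWNER/REPO and the PR number below are placeholders):

```
# Commit count for a single PR; agent-iterated draft PRs tend to accumulate
# more commits than PRs that are only pushed once the work is "done".
gh api repos/OWNER/REPO/pulls/123 --jq '.commits'
```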
* Cursor agents were just introduced in beta and have privacy limitations that prevent their usage at many organizations.
* Cursor is still focused on hands-on-keyboard agentic flows, which aren't included in these counts.
https://play.clickhouse.com/play?user=play#V0lUSCByZXBvX3N0Y...
I've also added some less popular agents like jetbrains-junie, and added a link to a random pull request for each agent, so we can look at the example PRs.
That "spark bar-chart" column output is one of the neatest things I've seen in a while. What a brilliant feature.
https://x.com/paradite_/status/1931644656762429503
Docs: https://docs.anthropic.com/en/docs/claude-code/github-action...
Glad it’s missing until they fix this.
- It has non-interactive CLI functionality (with the -p "prompt" option) in addition to the default interactive TUI, making it easy to integrate into workflows (see the sketch after this list).
- It has turn-key GitHub integration (https://github.com/anthropics/claude-code-action).
- It has an internal task-tracking system that uses ReadTodo/WriteTodo tools to write JSON task lists to `$HOME/.claude/tasks/`, enabling it to stay on track better than most other tools.
- It has excellent and customisable context compaction.
- And it has a flexible permission system that can be used to turn all permission questions to auto-accept when running in sandboxed environments.
Together those features enable it to be just as autonomous as any GitHub AI bot action hype thing (even though that might not have been its original or primary use).
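As a concrete illustration of the headless mode and the permission flag mentioned in the list (the prompts are invented, and flag behavior may change between versions):

```
# Headless run: -p takes a prompt, prints the result, and exits.
claude -p "Summarize the failing tests in this repo and propose fixes"

# Inside a throwaway sandbox (container/VM), permission prompts can be
# auto-accepted; only do this where the blast radius is contained.
claude -p "Fix the FIXMEs in src/ and run the test suite" --dangerously-skip-permissions
```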
Also, filter conditions that would be interesting: size of PR, language, files affected, distinct organizations, etc. Let me know if these get added, please!
Since all agents are able to use the terminal, I suggest looking up the GitLab CLI and having it use that. That should work locally and in runners.
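For example, with the GitLab CLI (`glab`) installed and authenticated, an agent can drive merge requests from the terminal; a rough sketch (branch name, title, and description are placeholders, and flags may vary by glab version):

```
# One-time interactive authentication.
glab auth login

# Push the working branch and open an MR from it against the default branch.
git push -u origin agent/fix-flaky-tests
glab mr create --title "Fix flaky tests" --description "Generated by a coding agent."

# List open MRs to review what the agent produced.
glab mr list
```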
or something will fix this