Poor data: If I make one, I want to either:
a) Merge it (success)
b) Modify it (sometimes a success, sometimes not). In one case, Codex made the wrong changes in all the right places, but it was still easier to work from that result by hand than to start from scratch.
c) Pick ideas from it (partial success)
So simple merge rates don't say much.
```
grep 'gh pr ' ~/.claude/local/node_modules/@anthropic-ai/claude-code/cli.js

- Create PR using gh pr create with the format below. Use a HEREDOC to pass the body to ensure correct formatting.
  gh pr create --title "the pr title" --body "$(cat <<'EOF'
1. Use \`gh pr view --json number,headRepository\` to get the PR number and repository info
1. If no PR number is provided in the args, use ${O4.name}("gh pr list") to show open PRs
2. If a PR number is provided, use ${O4.name}("gh pr view <number>") to get PR details
3. Use ${O4.name}("gh pr diff <number>") to get the diff
```
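Piecing those fragments together (the `${O4.name}` interpolations look like minified references to a tool name), the pattern the prompt describes is a quoted HEREDOC so the markdown body survives shell quoting. A minimal sketch; the body text here is illustrative, not the actual template from cli.js:

```
# Sketch of the HEREDOC pattern the prompt fragments describe: quoting 'EOF'
# keeps the body literal, so markdown and newlines reach gh unmangled.
gh pr create --title "the pr title" --body "$(cat <<'EOF'
Summary of the change goes here.

Generated with [Claude Code](https://claude.ai/code)
EOF
)"
```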
It seems like Claude Code doesn't do that? Some preliminary searching reveals that PRs generated by people using Claude Code come from their own user accounts, but may note that Claude was used; for example, https://github.com/anthropics/claude-code/pull/1732:
feat: add progress bar for token probability calculation
- Add optional progress_cb parameter to get_token_probs function
- Integrate `rich` progress bar in CLI showing real-time token processing progress
- Add comprehensive tests for progress callback functionality
- Maintain backward compatibility with optional parameter
Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
https://github.com/search?q=is:pr+is:merged+Co-Authored-By:+...
Instead of looking at the author of the PR, search for that 'Co-Authored-By: Claude' text.
That way I get 753 closed PRs out of '1k' PRs in total, which is a pretty good acceptance rate.
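For what it's worth, roughly the same count can be reproduced from the command line with the gh CLI. A sketch, assuming the trailer text matches exactly:

```
# Sketch: count merged PRs crediting Claude as co-author, mirroring the
# web search above. gh search caps results at 1000, so treat this as a floor.
gh search prs 'Co-Authored-By: Claude <noreply@anthropic.com>' --merged \
  --json number --limit 1000 --jq 'length'
```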
Also, of course OpenAI Codex would perform well here: the tool is heavily tailored to this type of task, whereas Cursor is a more general-purpose (within the programming domain) tool/app.
I only started using Codex casually a few days ago, though, and already have 3 PRs. While different tools for different purposes make sense, Codex's fully async nature is so much nicer. It does simple things like improving consistency and making small improvements quite well, which is really nice. Finally we have something that operates more like an appliance for certain classes of problems. Previously it felt more like a teenager with a learner's license.
This data is great, and it is exciting to see the rapid growth of autonomous coding agents across GitHub.
One thing to keep in mind regarding merge rates is that each of these products creates the PR at a different phase of the work. So just tracking PR create to PR merge tells a different story for each product.
In some cases, the work to iterate on the AI generated code (and potentially abandon it if not sufficiently good) is done in private, and only pushed to a GitHub PR once the user decides they are ready to share/merge. This is the case for Codex for example. The merge rates for product experiences like this will look good in the stats presented here, even if many AI generated code changes are being abandoned privately.
For other product experiences, the Draft PR is generated immediately when a task is assigned, and users can iterate on this “in the open” with the coding agent. This creates more transparency into both the success and failure cases (including logs of the agent sessions for both). This is the case for GitHub Copilot coding agent for example. We believe this “learning in the open” is valuable for individuals, teams, and the industry. But it does lead to the merge rates reported here appearing worse - even if logically they are the same as “task assignment to merged PR” success rates for other tools.
We’re looking forward to continuing to evolve the notion of Draft PR to be even more natural for these use cases. And to enabling all of these coding agents to benefit from open collaboration on GitHub.
We are looking into paths where we can support this more personal/private kind of PR, which would provide the foundation within GitHub to support the best of both worlds here.
But you could probably filter this a bit by looking at PR commit counts?
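For a single PR, the gh CLI makes that easy to eyeball; a sketch, reusing the example PR from above:

```
# Sketch: commit count as a rough proxy for how much iteration happened
# in the open after the agent opened the PR (a single commit suggests the
# work was pushed only once it was done).
gh pr view 1732 --repo anthropics/claude-code --json commits \
  --jq '.commits | length'
```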
https://play.clickhouse.com/play?user=play#V0lUSCByZXBvX3N0Y...
I've also added some less popular agents like jetbrains-junie, and added a link to a random pull request for each agent, so we can look at the example PRs.