Show HN: I built a tool to assist AI agents to know when a PR is good to go

https://dsifry.github.io/goodtogo/

45•dsifry•3w ago

I've been using Claude Code heavily, and kept hitting the same issue: the agent would push changes, respond to reviews, wait for CI... but never really know when it was done.

It would poll CI in loops. Miss actionable comments buried among 15 CodeRabbit suggestions. Or declare victory while threads were still unresolved.

The core problem: no deterministic way for an agent to know a PR is ready to merge.

So I built gtg (Good To Go). One command, one answer:

    $ gtg 123
    OK PR #123: READY
    CI: success (5/5 passed)
    Threads: 3/3 resolved

It aggregates CI status, classifies review comments (actionable vs. noise), and tracks thread resolution. Returns JSON for agents or human-readable text.
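A minimal sketch of that aggregation step, not gtg's actual implementation. The `PRSnapshot` fields, the `readiness` function, and every status string other than READY are hypothetical names chosen for illustration; the snapshot is assumed to have been fetched already.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PRSnapshot:
    """Hypothetical, already-fetched view of a pull request."""
    number: int
    ci_passed: int
    ci_total: int
    threads_resolved: int
    threads_total: int
    actionable_comments: int


def readiness(pr: PRSnapshot) -> dict:
    """Combine the three signals into one deterministic verdict."""
    if pr.ci_passed < pr.ci_total:
        status = "CI_FAILING"          # hypothetical status name
    elif pr.actionable_comments > 0:
        status = "ACTION_REQUIRED"     # hypothetical status name
    elif pr.threads_resolved < pr.threads_total:
        status = "UNRESOLVED_THREADS"  # hypothetical status name
    else:
        status = "READY"
    return {"status": status, **asdict(pr)}


if __name__ == "__main__":
    snapshot = PRSnapshot(123, 5, 5, 3, 3, 0)
    # JSON output is what an agent would consume.
    print(json.dumps(readiness(snapshot), indent=2))
```

The checks run in a fixed order, so the same snapshot always produces the same verdict, which is the property an agent needs in order to stop polling.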

The comment classification is the interesting part — it understands CodeRabbit severity markers, Greptile patterns, Claude's blocking/approval language. "Critical: SQL injection" gets flagged; "Nice refactor!" doesn't.
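A rough sketch of rule-based comment classification under stated assumptions: the two regular expressions below are made up for illustration and are not the markers CodeRabbit, Greptile, or Claude actually emit, and `classify` is a hypothetical name. The AMBIGUOUS bucket mirrors the category quoted from the project page further down the thread.

```python
import re
from enum import Enum


class Verdict(Enum):
    ACTIONABLE = "actionable"
    NON_ACTIONABLE = "non_actionable"
    AMBIGUOUS = "ambiguous"


# Hypothetical patterns; real reviewer bots use their own severity markers.
BLOCKING = re.compile(r"^\s*(critical|blocking|must fix)\b", re.IGNORECASE)
PRAISE = re.compile(r"^\s*(nice|great|lgtm|looks good)\b", re.IGNORECASE)


def classify(comment: str) -> Verdict:
    """Map a review comment to a verdict using fixed, ordered rules."""
    if BLOCKING.search(comment):
        return Verdict.ACTIONABLE
    if PRAISE.search(comment):
        return Verdict.NON_ACTIONABLE
    # Anything the rules do not recognize is left for human judgment.
    return Verdict.AMBIGUOUS


if __name__ == "__main__":
    for text in ("Critical: SQL injection", "Nice refactor!", "consider using X"):
        print(f"{text!r}: {classify(text).name}")
```

Because the rules are plain pattern matches, the same comment always gets the same verdict, unlike asking a model to judge each comment.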

MIT licensed, pure Python. I use this daily in a larger agent orchestration system — would love feedback from others building similar workflows.

Comments

mcolley•3w ago

Super interesting, any particular reason you didn't try to solve these prior to pushing with hooks and subagents?

dsifry•3w ago

I did! The issue, however, is having a clear, deterministic method of defining when the code review was 'done'. So the hooks can fire off subagents, but they are non-deterministic and often miss vital code review comments - especially ones that are marked in an inline comment, or are marked as 'Out of PR Scope' or 'Out of range of the file' - which are often the MOST important comments to address!

So gtg builds all of that in and deterministically determines whether or not there are any actionable comments, and thus you can block the agent from moving forward until all actionable comments are thoroughly reviewed, acted upon or acknowledged, at which point it will change state and allow the PR to be merged.

blutoot•3w ago

I thought hooks are always fired if you use it as a PreToolUse event. Wouldn’t that work for the GitHub action tools from the GitHub mcp?

dsifry•3w ago

Sure, but that mcp still missed actionable comments that are marked as Out of Scope or Outside the PR - and this doesn't require having the context window loss of having another mcp instantiated, either. Anyway, give gtg a competitive look against the mcp - you should be able to see the difference

dsifry•3w ago

Just to be clear - the hook is deterministic, but the subagent running with an mcp server loaded is not - and for medium/large PRs, it can run out of context window or just forget what it is trying to do and get lazy and say 'Everything is good, ready to merge!' when in fact tests are failing or there are still unaddressed PR comments.

rootnod3•3w ago

Sorry, so the tool is now even circumventing human review? Is that the goal?

So the agent can now merge shit by itself?

Just let the damn thing push to prod by itself at this point.

ljm•3w ago

Someone’s gonna think about wiring all this up to Linear or Jira, and there’ll be a whole new set of vulnerabilities created from malicious bug reports.

dsifry•3w ago

That's why I intentionally don't have this hooked into an ingest flow - you still get control over what issues/stories you want the agent swarm to work on... Just now, I can know that the code that was written has been reviewed and all comments have been fully addressed!

literalAardvark•3w ago

In some workflows it's helpful for the full loop to be automated so that the agent can test if what's done works.

And you can do a more exhaustive test later, after the agents are done running amok to merge various things.

dsifry•3w ago

Exactly right!

danenania•3w ago

I don’t think “ready to merge” necessarily means the agent actually merges. Just that it’s gone as far as it can automatically. It’s up to you whether to review at that point or merge, depending on the project and the stakes.

If there are CI failures or obvious issues that another AI can identify, why not have the agent keep going until those are resolved? This tool just makes that process more token efficient. Seems pretty useful to me.

dsifry•3w ago

That's EXACTLY right. Ready to merge is an important gate, but it is very stupid to just merge everything without further checks/testing by a human!

forgotpwd16•2w ago

This tool seems agent-oriented for them to merge, rather than merely check readiness. In that regard, the page doesn't mention anything about human reviewers, only AI reviewers. Honestly wouldn't be surprised if the author, someone seemingly running fully agentic workflows, didn't even consider human reviewers. If it's AI start-to-end*, then yes, quite possibly could push directly to master without much difference.

Call me pessimistic, and considering [1][2][3] (and other similar articles/discussions), believe this tool will be most useful to AI PR spammers the moment it is modified to also parse non-AI PR comments.

*Random question: is it start-to-end or end-to-end?

edit: P.S. Agree that it's a useful tool, given its design goals, though.

[1]: https://old.reddit.com/r/opensource/comments/1q3f89b/
[2]: https://devansh.bearblog.dev/ai-slop/
[3]: https://etn.se/index.php/nyheter/72808-curl-removes-bug-boun... (currently trending first page)

baxtr•3w ago

I’m not saying this is, but if I were a malicious state actor, that’s exactly the kind of thing I’d like to see in widespread use.

tayo42•3w ago

No,

The linked page explains how this fits into a development workflow

eg.

> A reviewer wrote “consider using X”… is that blocking or just a thought?

> AMBIGUOUS - Needs human judgment (suggestions, questions)

dsifry•3w ago

Right! It doesn't assume that all comments are actionable, or need to be worked on. However, if you allow anyone to comment on your PRs, it could be a malicious vector. So don't let anyone review PRs on projects that you care about!!!

blutoot•3w ago

At scale, I don't see a net negative of AI merging "shit by itself" if the developer (or the agent) is ensuring sufficient e2e, integration and unit test coverage prior to every merge, if in return I get my team to crank out features at a 10x speed.

The reality is that probably 99.9999% of code bases on this earth (but this might drop soon, who knows) pre-date LLMs and organizing them in a way that coding agents can produce consistent results from sprint to sprint, will need a big plumbing work from all dev teams. And that will include refactoring, documentation improvements, building consensus on architectures and of course reshaping the testing landscape. So SWE's will have a lot of dirty work to do before we reach the aforementioned "scale".

However, a lot of platforms are being built from the ground up today in a post-CC (Claude Code) era. And they should be ready to hit that scale today.

dsifry•3w ago

Yup! Software engineers aren't going to be out of work anytime soon, but I'm acting more like a CTO or VPE with a team of agents now, rather than just a single dev with a smart intern.

dizhn•3w ago

I am not in the tech field anymore and I use exclusively free models and clis. They are mostly of Chinese origin. I call them my little software sweatshop.

tadfisher•3w ago

I hate this paradigm because it pits me against my tools as if we're adversaries. The tools are prone to rewrite or even delete the tests, so we have to write other tools to sandbox agents from each other and check each others' work, and I just don't see a way to get deterministically good results over just building shit myself. It comes down to needing high trust in my tools to feel confident in what we're shipping.

blutoot•3w ago

The key is that at the end of the day productivity is king which is a polite term for cutting head count and/or delivering at a ridiculously higher velocity.

You can deterministically always get good results at your pace. But most likely, you won't achieve that at the speed and scale of a coding agent running in 4-5 worktrees, 24/7 without food or toilet breaks, especially if the latter will mostly help achieve the product/business goals at an "OK" quality (in which case you will perhaps be measured by how well you can steer these agents to elevate that quality from "OK" without sacrificing scale too much).

dsifry•3w ago

No, it just prepares the PR - it doesn't automatically merge. That would be very dangerous, imho!

glemion43•3w ago

Man if you are so frustrated by AI just stop reading articles relevant to it if you don't even take the time to read it properly.

And yes there are plenty of use cases where AI code doesn't hurt anyone even if it gets merged automatically...

See it as an interesting new field of r&d...

squeaky-clean•3w ago

It sounds like the goal is to get the code to human review without it being obviously broken in CI but the agent has no idea that's the case.

dsifry•2w ago

Yeah, it is about making sure that EVERY actionable PR comment gets addressed - whether by fixing, resolving, creating a new issue, commenting that it is a will not fix, or blocking for human review - and then giving you a clear deterministic check you can do to reliably enforce your policy.

joshuanapoli•3w ago

This looks nice! I like the idea of providing more deterministic feedback and more or less forcing the assistant to follow a particular development process. Do you have evidence that gtg improves the overall workflow? I think that there is a trade-off between risk of getting stuck (iteration without reaching gtg-green) versus reaching perfect 100% completion.

dsifry•3w ago

I found that it has improved overall code quality significantly, at the cost of somewhat slower velocity. But it has meant fewer interruptions where the ai is just waiting for me, or saying "Everything is ready!" only to find that ci/cd failed or there were clearly existing comments/issues.

joshribakoff•3w ago

I dislike the idea of coupling my workflow to saas platforms like github or code rabbit. The fact that you still have to create local tools is a selling point for just doing it all “locally”.

philipp-gayret•3w ago

Very interesting! This has a gem in the documentation: Using the tool itself as a CI check. I hadn't considered unresolved comments by say a person, or CodeRabbit or similar tool being a CI status failure. That's an excellent idea for AI driven PR's.

On a personal note; I hate LLM output to advertise a project. If you have something to share have the decency to type it out yourself or at least redact the nonsense from it.

dsifry•3w ago

Lol, I thought it did a reasonably good job, but to each their own - this was the difference between releasing the project so others could use it with decent documentation, or not releasing and just using it internally. :)

furyofantares•3w ago

Then you had the LLM write the blog post as well as your post on HN.

nyc1983•3w ago

I don’t understand how this provides anything above using GitHub status checks and branch protections to require conversations to be resolved before merging. Combined with the GitHub CLI, this gives agents everything they need to achieve the same result. More AI slop on top of AI slop. At this point when seeing these kinds of posts I feel like Edward Norton in front of the copy machine.
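For reference, the thread-resolution part of that approach can be sketched against GitHub's GraphQL API, where each review thread exposes an `isResolved` field. This is a sketch under assumptions: `thread_summary` and the sample response are hypothetical, only the first page of threads is considered, and the response is assumed to have been fetched separately (for example with `gh api graphql`).

```python
# GraphQL query for review-thread state (first page only; pagination omitted).
QUERY = """
query($owner: String!, $name: String!, $number: Int!) {
  repository(owner: $owner, name: $name) {
    pullRequest(number: $number) {
      reviewThreads(first: 100) {
        nodes { isResolved }
      }
    }
  }
}
"""


def thread_summary(response: dict) -> tuple[int, int]:
    """Return (resolved, total) from a decoded GraphQL response."""
    pull_request = response["data"]["repository"]["pullRequest"]
    nodes = pull_request["reviewThreads"]["nodes"]
    resolved = sum(1 for node in nodes if node["isResolved"])
    return resolved, len(nodes)


# Hypothetical response shaped like the query above.
SAMPLE = {
    "data": {
        "repository": {
            "pullRequest": {
                "reviewThreads": {
                    "nodes": [
                        {"isResolved": True},
                        {"isResolved": True},
                        {"isResolved": False},
                    ]
                }
            }
        }
    }
}

if __name__ == "__main__":
    resolved, total = thread_summary(SAMPLE)
    print(f"Threads: {resolved}/{total} resolved")
```

This only counts resolved threads; it does not decide whether a given comment is actionable, a nitpick, or out of scope.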

dsifry•2w ago

Some GitHub comments are marked as actionable, some have threads and suggestions, some are suggestions or are nitpicks. This provides you with a deterministic, reliable red/green approach that you can use to enforce your policy. Give it a try and you will see how it is much more reliable than using a nondeterministic agent, especially for complex reviews!

aaronbrethorst•2w ago

“The problem no one talks about” is a bit of breathless LLM spew, and an even better tell than an emdash

forgotpwd16•2w ago

That repo is quintessentially surreal. AI-written code, published in AI-made PRs, reviewed by multiple AI bots (one of which being same model that wrote code & made the PR, maybe others too just accessible via 3rd vendor), merged by AI (assuming dogfooding).
