Show HN: Haystack – Review the PRs that need human attention

10•akshaysg•1d ago

Hey HN! We're building Haystack (https://haystackeditor.com/) to help teams deal with the explosion in the number of pull requests that need to be reviewed due to the rise of coding agents.

Haystack replaces the GitHub PR review system with a queue that triages each PR before a human has to read any diffs. It looks at the diffs, the codebase, and the coding-agent conversation that produced the PR. Haystack then routes it into one of three buckets:

1. Safe to merge. This means the PR has enough evidence behind it that the team can merge it without another human's review.

Some examples:

-- A small UI copy change that includes a screenshot showing the final state

-- A backend change where the author clearly tested the important paths and ran the changes in a real environment

2. Needs fixes. This means that the PR has bugs or violates a rule in your codebase, and therefore needs to be fixed by the author.

Some examples:

-- The agent was asked to make loading a large table faster by adding pagination, but the PR still loads every result at once and "implements" pagination in the UI

-- The PR silently catches an error instead of logging, surfacing, or handling it. This violates the team's "no silent error swallowing" rule

3. Needs human review. This means that the PR could not be sufficiently verified by the author or is touching a sensitive part of the codebase (determined by user-input guidelines) and thus requires human review.

Some examples:

-- The PR changes a significant amount of logic in billing

-- The PR changes an important user flow like onboarding, but the author only ran unit tests and never opened the app to check the flow end-to-end. That violates the team's rule that high-impact user-facing changes need manual verification.
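The post doesn't describe how Haystack actually decides between the three buckets. Purely as an illustration, here is a minimal Python sketch of rule-based routing under assumed inputs. `PullRequestSignals`, `SENSITIVE_GLOBS`, `STRONG_EVIDENCE`, and `triage` are all hypothetical names, and the signals (rule violations, suspected bugs, verification evidence) are assumed to have already been extracted from the diff, the codebase, and the agent conversation.

```python
from dataclasses import dataclass, field
from enum import Enum
from fnmatch import fnmatch


class Bucket(Enum):
    SAFE_TO_MERGE = "safe to merge"
    NEEDS_FIXES = "needs fixes"
    NEEDS_HUMAN_REVIEW = "needs human review"


@dataclass
class PullRequestSignals:
    # Hypothetical signals, assumed to be extracted upstream from the diff,
    # the codebase, and the coding-agent conversation.
    changed_paths: list[str]
    rule_violations: list[str] = field(default_factory=list)
    suspected_bugs: list[str] = field(default_factory=list)
    verification_evidence: set[str] = field(default_factory=set)


# Hypothetical team guidelines: which paths count as sensitive, and which
# kinds of evidence count as "the author actually checked it".
SENSITIVE_GLOBS = ["billing/*", "onboarding/*"]
STRONG_EVIDENCE = {"manual_e2e", "screenshot", "ran_in_real_env"}


def triage(pr: PullRequestSignals) -> Bucket:
    # Assumed ordering: known bugs or rule violations go back to the author first.
    if pr.rule_violations or pr.suspected_bugs:
        return Bucket.NEEDS_FIXES
    # Sensitive areas always get a human, regardless of evidence.
    touches_sensitive = any(
        fnmatch(path, glob) for path in pr.changed_paths for glob in SENSITIVE_GLOBS
    )
    # Unit tests alone are not treated as strong evidence here.
    well_verified = bool(pr.verification_evidence & STRONG_EVIDENCE)
    if touches_sensitive or not well_verified:
        return Bucket.NEEDS_HUMAN_REVIEW
    return Bucket.SAFE_TO_MERGE


if __name__ == "__main__":
    copy_change = PullRequestSignals(["web/strings.json"], verification_evidence={"screenshot"})
    print(triage(copy_change).value)
```

One simplification worth flagging: the onboarding example above is described as a rule violation, yet it lands in "needs human review". The sketch models that case as missing verification rather than as a fix-it rule, so it routes to review instead of back to the author.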

Instead of starting with line-by-line diffs, Haystack immediately tells the reviewer the goal behind the PR, what design decisions the author made (informed by their coding-agent conversation), and how much the author did to verify that the pull request works (e.g. ran scripts, checked the frontend, etc.).

In this way, review shifts from "what changed?" to "is this the right behavior and is there evidence that it works?".

Here's a quick demo: https://www.tella.tv/video/streamlining-code-reviews-with-ha...

We previously launched Haystack as a tool for understanding large PRs (https://news.ycombinator.com/item?id=45201703). As many of you can probably relate, the release of Opus 4.5 completely shattered our sense of how fast an engineer could put together a PR.

And as coding agents improved even further after Opus 4.5, we realized that the pull request workflow did not scale with our coding velocity. With each member of our team able to pump out more than 20 pull requests a day, code review quickly became cognitively exhausting and less useful.

After talking with other folks, we learned that many feel the same way and currently face a binary choice: either skip review entirely or try to keep up with a fire hose of pull requests.

Haystack is our attempt at a third path. We still believe in code review, but as coding agents produce more code, human reviewer attention becomes more valuable and more expensive.

Haystack helps teams spend that attention on the PRs where a human can meaningfully change the outcome of that PR. And for such PRs, Haystack shows the reviewer what the PR intended to do, whether the author showed that it works, and what design decisions need a second pair of eyes.

We're still quite early and are figuring out whether Haystack truly makes code review better. We would love any and all feedback!

Comments

ramon156•1h ago

I like this idea. To be blunt, would this have more features than hooking up Claude/Gemini/Codex and saying "If - at any point - you're unsure, step back and let a human review"?

akshaysg•57m ago

I think Haystack offers:

1. A centralized review mechanism for a team or org that operates on coding-agent conversations in addition to diffs (and the codebase). It evaluates several variables (e.g. how sensitive the changes are, how much the author did to derisk them, and what the author's coding agent glossed over) and does more to enforce your team's guidelines than an individual's prompt can

2. Adversarial review that runs alongside other AI review agents (e.g. BugBot or Greptile) and filters their comments down to only the things the author cares about. This helps cut down on the "AI reviewer battleground" that shows up in pull requests

3. A review interface that allows human reviewers to quickly understand what the author did to verify their changes and focus on the author's design decisions
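Point 2 doesn't say how the comment filtering works. A minimal sketch, assuming each bot comment has already been labeled with a category (a hypothetical field) and that "the things the author cares about" can be expressed as a set of categories. `ReviewComment` and `filter_comments` are illustrative names, not Haystack's API. It keeps only wanted categories and collapses comments from different bots that flag the same spot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewComment:
    source: str    # which bot left the comment, e.g. "BugBot" or "Greptile"
    path: str
    line: int
    category: str  # hypothetical label such as "bug", "style", or "nit"
    body: str


def filter_comments(
    comments: list[ReviewComment], wanted_categories: set[str]
) -> list[ReviewComment]:
    """Keep comments in wanted categories, and drop later comments that flag
    the same (path, line, category) as one already kept."""
    seen: set[tuple[str, int, str]] = set()
    kept: list[ReviewComment] = []
    for comment in comments:
        if comment.category not in wanted_categories:
            continue
        key = (comment.path, comment.line, comment.category)
        if key in seen:
            continue  # another bot already flagged this spot
        seen.add(key)
        kept.append(comment)
    return kept
```

Keying on (path, line, category) is a deliberately crude notion of "duplicate"; a real system would likely need fuzzier matching, since two bots rarely anchor the same issue to the exact same line.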

We actually jury-rigged all of this together before building Haystack, but found that it doesn't scale to the team level (since every individual has their own ideas/opinions of what constitutes a human review).

We also found that reviewing purely through Claude Code/Codex was slow and difficult because things like author traces are not pre-processed, so you have to get your agent to explicitly explore and understand them.

oersted•1h ago

Man, it's such a shame that you pivoted away from the canvas-based editor concept. It was such a pleasure to use, and so much better than tabs.

https://github.com/haystackeditor/haystack-editor

It's still probably the best tool to navigate, visualize and understand complex codebases, which is particularly important now with AI coded repos. I keep looking for alternatives but they are all notably worse.

About a month ago I spent a few frustrating hours building it from source for my system and making it work, and I've enjoyed using it as my main IDE since.

I wish I had the time to make a fork and bring in a newer version of VSCode. If anyone takes it up I might help at least.

akshaysg•55m ago

Yeah, I agree it's a shame. Unfortunately, coding has changed fast, and I was not confident that the editor was headed in the right direction with AI coding becoming so prevalent.

I think there is a lot of value in "reconnecting" with your codebase, so I do have some plans to bring the core concept of Haystack back in one form or another.

softwaredoug•39m ago

Just to say great idea. But the name "Haystack" is used by several dozen things FWIW :)
