Show HN: Coding agent where a second agent QAs every PR in a real browser

https://www.notesasm.com/
1•kavin_key•35m ago
Hi HN. I've been building this for the last few months, and it's at a point where outside eyes would help more than another week of solo iteration.

It's a kanban board where each ticket runs two agents back to back:

Build agent: runs in a sandboxed temp dir against a shallow clone of the user's repo, makes the change, pushes a branch, and opens a PR. It uses the Claude Agent SDK.

QA agent: waits for the preview deploy to come up, then drives a real browser via Browserbase against the preview and checks the change against the ticket's acceptance criteria. Screenshots and an mp4 recording of the QA session are attached to the PR.
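For anyone wondering what the build agent's workspace setup amounts to, here is a rough sketch using plain `git` commands. It rests on assumptions: the helper names (`clone_command`, `publish_commands`, `run_ticket_in_sandbox`) and the `make_change` hook are hypothetical, the agent's actual editing step is elided, and a temp dir alone is not a full sandbox (process isolation is out of scope here).

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable


def clone_command(repo_url: str, dest: Path, depth: int = 1) -> list[str]:
    """Shallow clone: fetch only the latest commit (--depth also implies --single-branch)."""
    return ["git", "clone", "--depth", str(depth), repo_url, str(dest)]


def publish_commands(branch: str, message: str) -> list[list[str]]:
    """Commit the agent's edits on a fresh branch and push it to origin."""
    return [
        ["git", "checkout", "-b", branch],
        ["git", "add", "-A"],
        ["git", "commit", "-m", message],
        ["git", "push", "-u", "origin", branch],
    ]


def run_ticket_in_sandbox(
    repo_url: str, branch: str, message: str, make_change: Callable[[Path], None]
) -> None:
    # TemporaryDirectory removes the checkout when the block exits, even on error.
    with tempfile.TemporaryDirectory(prefix="build-agent-") as tmp:
        workdir = Path(tmp) / "repo"
        subprocess.run(clone_command(repo_url, workdir), check=True)
        make_change(workdir)  # hypothetical hook: the agent edits files here
        for cmd in publish_commands(branch, message):
            subprocess.run(cmd, cwd=workdir, check=True)
        # Opening the PR (e.g. through the GitHub API) is not shown.
```

Keeping the command construction separate from execution makes the sequence easy to inspect and test without touching a real remote.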

If QA fails, the build agent reruns with the QA report as context, for up to 3 iterations. Before each retry, a classifier reads the failure and decides whether it was a real code bug or environmental (Clerk didn't load, the preview never deployed, the Browserbase session got a 403, etc.). Environmental failures break the loop instead of burning iterations on infra noise. This was the single biggest reliability win.
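The retry policy can be sketched as a small loop. This is a hedged illustration, not the project's code: `build`, `qa`, the `QaReport` shape, and the keyword list are hypothetical stand-ins, and the classifier here is plain keyword matching (the hand-curated approach mentioned further down), not a model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical hand-curated markers of infra noise, matched case-insensitively.
ENVIRONMENTAL_MARKERS = (
    "clerk failed to load",
    "preview deployment not found",
    "403 forbidden",
    "session expired",
)


@dataclass
class QaReport:
    passed: bool
    details: str = ""


def classify_failure(report: QaReport) -> str:
    """Return 'environmental' if the failure text looks like infra noise, else 'code'."""
    text = report.details.lower()
    if any(marker in text for marker in ENVIRONMENTAL_MARKERS):
        return "environmental"
    return "code"


def run_ticket(
    build: Callable[[Optional[QaReport]], None],
    qa: Callable[[], QaReport],
    max_iterations: int = 3,
) -> tuple[str, int]:
    """Build, QA, and retry with the last QA report as context; returns (outcome, iterations)."""
    feedback: Optional[QaReport] = None
    for iteration in range(1, max_iterations + 1):
        build(feedback)  # first pass gets no feedback; retries get the previous report
        report = qa()
        if report.passed:
            return "passed", iteration
        if classify_failure(report) == "environmental":
            # Stop instead of spending another build iteration on infra noise.
            return "environmental_failure", iteration
        feedback = report
    return "failed", max_iterations
```

Returning the outcome as a label rather than raising makes it straightforward to surface "environmental_failure" differently in the UI from a genuine "failed".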

The other half is input. The platform exposes an MCP server, so from Claude Code or any MCP client you can say "make a ticket for X" and it lands in the backlog. The original reason I built any of this was that writing tickets, not writing code, was my bottleneck.

A few implementation notes that might be interesting:

The build agent's system prompt forbids the Task / Agent (subagent) tool. Spawning subagents inside the SDK consistently hung for 4+ minutes. Staying in the main session with Read/Edit/Bash/Glob/Grep is dramatically more reliable.

Postgres schema is applied on startup from a single schema.sql, idempotent with IF NOT EXISTS everywhere. No migrations directory. Adding a column is "edit schema.sql, push, restart." This is the highest-leverage decision I've made on the backend.
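The apply-on-startup pattern is small enough to sketch. This version uses Python's built-in `sqlite3` only so it runs without a database server; the tables and the `apply_schema` helper are hypothetical. One Postgres caveat worth knowing: `CREATE TABLE IF NOT EXISTS` is a no-op when the table already exists, so a new column on an existing table needs its own idempotent statement, e.g. `ALTER TABLE tickets ADD COLUMN IF NOT EXISTS qa_mode text;` (supported since PostgreSQL 9.6; SQLite has no equivalent).

```python
import sqlite3

# Hypothetical schema; every statement is safe to run again on each startup.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS tickets (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'backlog'
);
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY,
    ticket_id INTEGER NOT NULL REFERENCES tickets(id),
    started_at REAL,
    state TEXT NOT NULL DEFAULT 'queued'
);
CREATE INDEX IF NOT EXISTS jobs_state_idx ON jobs(state);
"""


def apply_schema(conn: sqlite3.Connection, schema_sql: str = SCHEMA_SQL) -> None:
    """Run the whole schema file; IF NOT EXISTS turns repeat runs into no-ops."""
    conn.executescript(schema_sql)
    conn.commit()
```

Calling `apply_schema` twice in a row is the quickest check that the file really is idempotent.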

QA has a fast mode (local Chromium for anonymous routes) and a deep mode (Browserbase + residential proxies + stealth, for anything behind auth). The mode is set per ticket because cheap-and-fast QA loses signal once you're past the login wall.
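A per-ticket mode switch could look roughly like this. The `QaMode` enum, the `Ticket` fields, the fallback of "deep only when auth is required", and the returned settings are all hypothetical illustrations of the fast/deep split, not the project's actual configuration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class QaMode(Enum):
    FAST = "fast"  # local Chromium, anonymous routes only
    DEEP = "deep"  # remote browser with proxies and stealth, for authenticated flows


@dataclass
class Ticket:
    title: str
    requires_auth: bool
    qa_mode: Optional[QaMode] = None  # explicit per-ticket override


def resolve_qa_mode(ticket: Ticket) -> QaMode:
    """Honor an explicit mode; otherwise (assumed policy) go deep only behind the login wall."""
    if ticket.qa_mode is not None:
        return ticket.qa_mode
    return QaMode.DEEP if ticket.requires_auth else QaMode.FAST


def browser_settings(mode: QaMode) -> dict:
    # Hypothetical knobs; real values depend on the browser provider.
    if mode is QaMode.FAST:
        return {"remote": False, "proxies": False, "stealth": False}
    return {"remote": True, "proxies": True, "stealth": True}
```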

A background sweeper force-fails any job that has been running for over 60 minutes. The SDK can hang in ways that asyncio.wait_for doesn't always clean up across the subprocess boundary, so the kill switch is a belt-and-suspenders guard.
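Both halves of that guard can be sketched with the standard library. First, a timeout that actually kills the child: `asyncio.wait_for` only cancels the coroutine it is awaiting, so a child process keeps running until it is killed explicitly. Second, a sweeper that force-fails anything past the deadline. Job storage here is an in-memory dict and the names (`run_with_timeout`, `sweep_stale_jobs`, `Job`) are hypothetical; a real sweeper would query Postgres.

```python
import asyncio
import sys
import time
from dataclasses import dataclass
from typing import Optional

MAX_RUNTIME_S = 60 * 60  # force-fail anything running longer than 60 minutes


async def run_with_timeout(argv: list[str], timeout_s: float) -> Optional[int]:
    """Run a subprocess; on timeout, kill and reap it. Returns the exit code, or None on timeout."""
    proc = await asyncio.create_subprocess_exec(*argv)
    try:
        return await asyncio.wait_for(proc.wait(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancelled proc.wait(), but the child is still alive until killed.
        # kill() signals this process only, not any grandchildren it spawned.
        proc.kill()
        await proc.wait()
        return None


@dataclass
class Job:
    state: str  # "running", "succeeded", or "failed"
    started_at: float
    error: str = ""


def sweep_stale_jobs(jobs: dict[str, Job], now: float, max_runtime_s: float = MAX_RUNTIME_S) -> list[str]:
    """Mark running jobs older than max_runtime_s as failed and return their ids."""
    stale = [jid for jid, job in jobs.items()
             if job.state == "running" and now - job.started_at > max_runtime_s]
    for jid in stale:
        jobs[jid].state = "failed"
        jobs[jid].error = "force-failed by sweeper: exceeded max runtime"
    return stale


async def sweeper_loop(jobs: dict[str, Job], interval_s: float = 60.0) -> None:
    # Runs forever next to the app, e.g. started with asyncio.create_task(sweeper_loop(jobs)).
    while True:
        sweep_stale_jobs(jobs, time.time())
        await asyncio.sleep(interval_s)


if __name__ == "__main__":
    # Demo: a child that sleeps for 5 s is killed after 0.5 s.
    code = asyncio.run(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
    print("timed out" if code is None else f"exit {code}")
```

The two layers cover different failure modes: the per-call timeout handles the common case, while the sweeper catches jobs whose state was never updated at all.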

Stack: FastAPI on Railway, Postgres, Claude Agent SDK, Browserbase, Vercel for previews, Clerk for auth, Resend for transactional email, MCP over HTTP. The frontend is one HTML file on Vercel: no build step, no framework, just vanilla JS with Clerk loaded from a CDN.

What's not working well yet: deep-mode QA still occasionally gets stuck on CAPTCHAs in unfamiliar OAuth flows. The classifier's list of environmental-failure keywords is hand-curated, which is fragile. Spend is tracked per job, but I haven't built per-workspace budget caps yet. PR previews on Vercel sometimes take 2-3 minutes to come up, and the QA agent has to wait them out.
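Waiting out a slow preview is essentially a poll with a deadline. A minimal sketch, assuming the QA agent has some readiness probe (for example, an HTTP request to the preview URL that returns 200 once the deploy is live); `wait_for_preview`, the probe, and the timing defaults are hypothetical.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_for_preview(
    is_ready: Callable[[], Awaitable[bool]],
    timeout_s: float = 300.0,
    initial_interval_s: float = 5.0,
    max_interval_s: float = 30.0,
) -> bool:
    """Poll is_ready() with capped exponential backoff; True if ready before the deadline."""
    deadline = time.monotonic() + timeout_s
    interval = initial_interval_s
    while True:
        if await is_ready():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; back off so a multi-minute deploy isn't hammered.
        await asyncio.sleep(min(interval, remaining))
        interval = min(interval * 2, max_interval_s)
```

A `False` result is a natural "preview never deployed" signal, which is exactly the kind of failure worth treating as environmental rather than as a code bug.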

It's in alpha with a waitlist. Free during alpha, paid plans later. The whole platform was built using Claude Code, so this has been dogfooding itself for the entire build.

Site: https://notesasm.com

Would love feedback, especially on: the dual-agent loop design, the classifier approach, what kinds of tickets would actually break this on your repo, and prior art I should be aware of (I know about Devin, OpenHands, SWE-agent; what else?).
