Launch HN: Canary (YC W26) – AI QA that understands your code

8•Visweshyc•1h ago
Hey HN! We're Aakash and Viswesh, and we're building Canary (https://www.runcanary.ai). We build AI agents that read your codebase, figure out what a pull request actually changed, and generate and execute tests for every affected user workflow.

Aakash and I previously built AI coding tools at Windsurf, Cognition, and Google. AI tools were making every team faster at shipping, but nobody was testing real user behavior before merge. PRs got bigger, reviews still happened in file diffs, and changes that looked clean broke checkout, auth, and billing in production. We saw it firsthand. We started Canary to close that gap. Here's how it works:

Canary starts by connecting to your codebase and learning how your app is built: routes, controllers, validation logic. When you push a PR, Canary reads the diff, infers the intent behind the changes, then generates and runs tests against your preview app, checking real user flows end to end. It comments directly on the PR with test results and recordings that show what changed and flag anything that doesn't behave as expected. You can also trigger specific user workflow tests via a PR comment.
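As a rough illustration of the "read the diff, find the affected user workflows" step, here is a minimal sketch. It is not Canary's implementation: the `WORKFLOW_SOURCES` map, the file paths, and both function names are hypothetical, and a real system would presumably derive this mapping from the codebase rather than from hand-written glob patterns.

```python
from fnmatch import fnmatch

# Hypothetical map from user workflows to the source paths they depend on.
# In practice this would be derived from routes, controllers, etc.
WORKFLOW_SOURCES = {
    "checkout": ["app/controllers/cart_*.py", "app/services/payment/*"],
    "login": ["app/controllers/auth_*.py", "app/validators/session.py"],
    "invoicing": ["app/services/billing/*", "app/controllers/invoice_*.py"],
}

# Made-up unified diff used only for this example.
SAMPLE_DIFF = """\
diff --git a/app/services/billing/totals.py b/app/services/billing/totals.py
--- a/app/services/billing/totals.py
+++ b/app/services/billing/totals.py
@@ -1,3 +1,3 @@
-total = subtotal
+total = subtotal + tax
"""


def changed_files(unified_diff: str) -> list[str]:
    """Collect new-side paths from the '+++ b/<path>' headers of a git diff."""
    prefix = "+++ b/"
    return [
        line[len(prefix):].strip()
        for line in unified_diff.splitlines()
        if line.startswith(prefix)
    ]


def affected_workflows(unified_diff: str, workflow_sources=WORKFLOW_SOURCES) -> list[str]:
    """Return the workflows whose source patterns match any changed file."""
    paths = changed_files(unified_diff)
    return sorted(
        name
        for name, patterns in workflow_sources.items()
        if any(fnmatch(path, pattern) for path in paths for pattern in patterns)
    )


print(affected_workflows(SAMPLE_DIFF))
```

A path-matching pass like this only catches direct dependencies. Second-order effects, where a change in one module breaks a flow that never touches the changed file, need deeper analysis than this sketch attempts.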

Beyond PR testing, tests generated from the PR can be moved into regression suites. You can also create tests by just prompting what you want tested in plain English. Canary generates a full test suite from your codebase, schedules it, and runs it continuously. One of our construction tech customers had an invoicing flow where the amount due drifted from the original proposal total by ~$1,600. Canary caught the regression in their invoice flow before release.

This isn't something a single family of foundation models can do on its own. QA spans too many modalities (source code, DOM/ARIA, device emulators, visual verification, screen recording analysis, network/console logs, live browser state, etc.) for any single model to specialize in all of them. You also need custom browser fleets, user sessions, ephemeral environments, on-device farms, and data seeding to run the tests reliably. On top of that, catching second-order effects of code changes requires a specialized harness that tries to break the application in multiple ways across different types of users, in ways a normal happy-path testing flow wouldn't.

To measure how well our purpose-built QA agent works, we published QA-Bench v0, the first benchmark for code verification. Given a real PR, can an AI model identify every affected user workflow and produce relevant tests? We tested our purpose-built QA agent against GPT 5.4, Claude Code (Opus 4.6), and Sonnet 4.6 across 35 real PRs on Grafana, Mattermost, Cal.com, and Apache Superset on three dimensions: Relevance, Coverage, and Coherence. Coverage is where the gap was largest. Canary leads by 11 points over GPT 5.4, 18 over Claude Code, and 26 over Sonnet 4.6. For full methodology and per-repo breakdowns, give our benchmark report a read: https://www.runcanary.ai/blog/qa-bench-v0

You can check out the product demo here: https://youtu.be/NeD9g1do_BU

We'd love feedback from anyone working on code verification or thinking about how to measure this differently.

Comments

warmcat•40m ago
Good work. But what makes this different from just another feature in Gemini Code Assist or GitHub Copilot?
blintz•37m ago
I really want automated QA to work better! It's a great thing to work on.

Some feedback:

- I definitely don't want three long new messages on every PR. Max 1, ideally none? Codex does a great job just using emoji.

- The replay is cool. I don't make a website, so maybe I'm not the target market, but I'd like QA for our backend.

- Honestly, I'd rather just run a massive QA run every day, and then have any failures bisected, rather than per-PR.

- I am worried that there's not a lot of value beyond the intelligence of the foundation models here.

Visweshyc•26m ago
Thanks for the feedback!

- Agreed that the form factor can be condensed with a link to detailed information

- With the codebase understanding, backend is where we are looking to expand and provide value

- The intelligence of the models does lay out the foundation but combining the strength of these models unlocks a system of specialized agents that each reason about the codebase differently to catch the unknown unknowns
Bnjoroge•1m ago
Agree on your last point, and it's going to be a very bitter lesson. In any case, you probably want to shift a lot of the code verification as far left as possible, so doing review at PR time isn't the right strat imo. And Claude/Codex are well positioned to do the local review.
solfox•35m ago
Looks interesting! Looks like perhaps no support for Flutter apps yet?
Visweshyc•8m ago
Yes, we currently support web apps, but we plan to extend the foundation to test mobile applications on device emulators.
solfox•33m ago
Not a direct competitor but another YC company I use and enjoy for PR reviews is cubic.dev. I like your focus on automated tests.
Visweshyc•20m ago
Thanks! We believe executing the scenarios and showing what actually broke closes the loop
Bnjoroge•3m ago
what kinds of tests does it generate and how's this different from the tens of code review startups out there?