Launch HN: Jazzberry (YC X25) – AI agent for finding bugs

30•MarcoDewey•7h ago
Hey HN! We are building Jazzberry (https://jazzberry.ai), an AI bug finder that automatically tests your code when a pull request is opened, to find and flag real bugs before they are merged.

Here’s a demo video: https://www.youtube.com/watch?v=L6ZTu86qK8U#t=7

We are building Jazzberry to help you find bugs in your code base. Here’s how it works:

When a PR is opened, Jazzberry clones the repo into a secure sandbox. The diff from the PR is placed in the AI agent's context window. To interact with the rest of the code base, the agent can execute bash commands within the sandbox, and the output of those commands is fed back to it. This means the agent can read and write files, search, install packages, run interpreters, execute code, and so on. It observes the outcomes and iteratively tests to pinpoint bugs, which are then reported back in the PR as a markdown table.
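
To picture this loop concretely, here is a minimal sketch of a diff-seeded, bash-in-a-sandbox agent loop. It is illustrative only, not Jazzberry's implementation; `Sandbox`, `llm_step`, and the action format are invented:

```python
import subprocess

class Sandbox:
    """Thin wrapper around a repo cloned inside an isolated VM or container."""
    def __init__(self, repo_path: str):
        self.repo_path = repo_path

    def run(self, command: str, timeout: int = 120) -> str:
        # Execute a shell command in the checkout and capture everything it prints.
        proc = subprocess.run(
            command, shell=True, cwd=self.repo_path,
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.stdout + proc.stderr

def find_bugs(diff: str, sandbox: Sandbox, llm_step, max_steps: int = 20) -> str:
    """Seed the context with the PR diff, then loop: the model asks for a command,
    the sandbox runs it, and the output goes back into the transcript."""
    transcript = [{"role": "user", "content": f"PR diff:\n{diff}"}]
    for _ in range(max_steps):
        action = llm_step(transcript)            # e.g. {"bash": "pytest -x"} or {"report": "| Bug | ... |"}
        if "report" in action:                   # agent is done: markdown table of findings
            return action["report"]
        output = sandbox.run(action["bash"])     # run the requested command in the sandbox
        transcript.append({"role": "tool", "content": output})
    return "No report produced within the step budget."
```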

Jazzberry is focused on dynamically testing your code in a sandbox to confirm the presence of real bugs. We are not a general code review tool; our only aim is to provide concrete evidence of what's broken and how.

Here are some real examples of bugs that we have found so far:

“Authentication Bypass (Critical)” - When `AUTH_ENABLED` is `False`, the `get_user` dependency in `home/api/deps.py` always returns the first superuser, bypassing authentication and potentially leading to unauthorized access. Additionally, it defaults to superuser when the authenticated Auth0 user is not present in the database. (A minimal sketch of this pattern appears after these examples.)

“Insecure Header Handling (High)” - The server doesn't validate header names/values, allowing injection of malicious headers, potentially leading to security issues.

“API Key Leakage (High)” - Different error messages in browser console logs revealed whether API keys were valid, allowing attackers to brute force valid credentials by distinguishing between format errors and authorization errors.
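
To make the first example concrete, here is a small, hypothetical sketch of the “Authentication Bypass” anti-pattern, not the code that was actually flagged; `AUTH_ENABLED` and `get_user` come from the report above, everything else is invented:

```python
AUTH_ENABLED = False  # e.g. a misconfigured environment flag

class FakeUserStore:
    """Stand-in for the real database layer (invented for illustration)."""
    def __init__(self):
        self.superuser = {"email": "admin@example.com", "is_superuser": True}
        self.users = {"token-abc": {"email": "alice@example.com", "is_superuser": False}}

    def first_superuser(self):
        return self.superuser

    def lookup(self, token):
        return self.users.get(token)

def get_user(token, db):
    if not AUTH_ENABLED:
        return db.first_superuser()   # bug 1: auth disabled -> every caller becomes the superuser
    user = db.lookup(token)
    if user is None:
        return db.first_superuser()   # bug 2: unknown Auth0 user silently escalates to superuser
    return user

if __name__ == "__main__":
    print(get_user(None, FakeUserStore()))  # an unauthenticated request gets superuser access
```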

Working on this, we've realized just how much the rise of LLM-generated code is amplifying the need for better automated testing. Traditional code coverage metrics and manual code review are already becoming less effective against thousands of lines of LLM-generated code, and we expect this to intensify over time: the complexity of AI-authored systems will ultimately require even more sophisticated AI tooling for effective validation.

Our backgrounds: Mateo has a PhD in reinforcement learning and formal methods with over 20 publications and 350 citations. Marco holds an MSc in software testing, specializing in LLMs for automated test generation.

We are actively building and would love your honest feedback!

Comments

bigyabai•7h ago
> Jazzberry is focused on dynamically testing your code in a sandbox to confirm the presence of real bugs.

That seems like a waste of resources to perform a job that a static linter could do in nanoseconds. Paying to spin up a new VM for every test is going to incur a cost penalty that other competitors can skip entirely.

MarcoDewey•7h ago
You are right that static linters are incredibly fast and efficient for catching certain classes of issues.

Our focus with the dynamic sandbox execution is aimed at finding bugs that are much harder for static analysis to detect. These are bugs like logical flaws in specific execution paths and unexpected interactions between code changes.
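
For illustration, here is an invented example of that bug class: the code below is lint-clean, but a subtle boundary error silently drops data, and only executing it makes that visible:

```python
def chunk(items, size):
    """Split items into consecutive batches of `size` (buggy on purpose)."""
    return [items[i:i + size] for i in range(0, len(items) - 1, size)]  # off-by-one in the stop value

result = chunk([1, 2, 3, 4, 5], 2)
print(result)                              # [[1, 2], [3, 4]] -- the trailing element is silently dropped
assert result == [[1, 2], [3, 4], [5]]     # running the code is what surfaces the failure
```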

winwang•5h ago
Do you guide the LLM to do this specifically? So it doesn't "waste" time on what can be taken care of by static analysis? Would be interesting if you could also integrate traditional analysis tools pre-LLM.
bananapub•7h ago
how did and do you validate that this is of any value at all?

how many test cases do you have? how do you score the responses? how do you ensure random changes by the people who did almost all of the work (training models) don't wreck your product?

winwang•5h ago
Not the OP but -- I would immediately believe that finding bugs would be a valuable problem to solve. Your questions should probably be answered on an FAQ though, since they are pretty good.
decodingchris•7h ago
Cool demo! You mentioned using a microVM, which I think is Firecracker? And if it is, any issues with it?
mp0000•7h ago
Thanks! We are indeed using Firecracker. No issues so far
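
For readers curious what driving Firecracker looks like, here is a rough sketch of configuring and booting a microVM through its API socket. It assumes a firecracker process is already listening on `/tmp/firecracker.sock`; the kernel and rootfs paths are placeholders, and a real sandbox would also need the jailer, networking, and so on:

```python
import http.client, json, socket

class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP client that talks to Firecracker's Unix domain socket."""
    def __init__(self, socket_path: str):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)

def api_put(conn, path, body):
    conn.request("PUT", path, json.dumps(body), {"Content-Type": "application/json"})
    resp = conn.getresponse()
    resp.read()  # drain the response so the connection can be reused
    assert resp.status < 300, f"{path}: HTTP {resp.status}"

conn = UnixHTTPConnection("/tmp/firecracker.sock")
api_put(conn, "/machine-config", {"vcpu_count": 2, "mem_size_mib": 1024})
api_put(conn, "/boot-source", {"kernel_image_path": "vmlinux",
                               "boot_args": "console=ttyS0 reboot=k panic=1"})
api_put(conn, "/drives/rootfs", {"drive_id": "rootfs", "path_on_host": "rootfs.ext4",
                                 "is_root_device": True, "is_read_only": False})
api_put(conn, "/actions", {"action_type": "InstanceStart"})  # boot the microVM
```
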
sublinear•7h ago
I'm kinda curious how this compares to GitLab's similar offering: https://docs.gitlab.com/user/project/merge_requests/duo_in_m...
mp0000•6h ago
We are laser-focused on bug finding and aren't targeting general code review, like comments on code style and variable names. We also run your code as part of bug finding instead of only having an LLM inspect it.
jdefr89•6h ago
Ton of work already being done on this. I am a Vulnerability Researcher @ MIT and I know of a few efforts, just at my lab alone, being worked on. So far nearly everything I have seen seems to do nothing but report false positives. They are missing bugs a fuzzer could have found in minutes. I will be impressed when it finds high severity/exploitable bugs. I think we are a bit too far from that if it's achievable, though. On the flip side, LLMs have been very useful for reverse engineering binaries. Binary Ninja w/ Sidekick (their LLM plugin) can recover and name data structures quite well. It saves a ton of time. Also does a decent job providing high-level overviews of code...
hanlonsrazor•5h ago
Agree with you on that. There is nothing about LLMs that makes them uniquely suited to bug finding. However, they could still excel here by recovering traces, as you say, and, taking it one step further, even recommending fixes.
winwang•5h ago
One possibility is crafting (somewhat-)minimal reproductions. There's some work in the FP community to do this via traditional techniques, but they seem quite limited.
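
A toy illustration of the kind of traditional technique being referred to: greedily shrink a failing input, delta-debugging style, until nothing more can be removed. `fails` stands in for re-running the program on a candidate input:

```python
def shrink(data: list, fails) -> list:
    """Greedily drop elements while the failure still reproduces."""
    changed = True
    while changed:
        changed = False
        for i in range(len(data)):
            candidate = data[:i] + data[i + 1:]   # try removing one element
            if fails(candidate):
                data, changed = candidate, True
                break
    return data

# Example: a "bug" that triggers whenever 3 and 7 are both present.
failing_input = [9, 3, 5, 7, 1, 8]
print(shrink(failing_input, lambda xs: 3 in xs and 7 in xs))   # -> [3, 7]
```
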
MarcoDewey•3h ago
Correct: what is unique about LLMs is their ability to match an existing tool or practice to a novel problem.
MarcoDewey•3h ago
I definitely agree that there's a lot of research happening in this space, and the false positive issue is a significant hurdle. From my own research and experimentation, I have also seen how challenging it is to get LLM-powered tools to consistently find real bugs.

Our approach with Jazzberry is specifically focused on the dynamic execution aspect within the PR context. We're finding that by actually running the code with the specific changes applied, we get a clearer signal about functional errors. We're very aware of the need to demonstrate that we can find the high-severity/exploitable bugs you mentioned, and that's a key metric for us as we continue to develop the product.

Given your background, I'd be really interested to hear if you have any thoughts on what approaches you think might be most promising for moving beyond the false positive problem in AI-driven bug finding. Any insights from your work at MIT would be incredibly valuable.

mp0000•3h ago
We largely agree; we don't think pure LLM-based approaches are sufficient. Having an LLM automatically orchestrate tools, like a software fuzzer, is something we've been thinking about for a while, and we view incorporating code execution as the first step.

We think that LLMs can capture, in a hands-off way, semantic bugs that traditional software testing cannot find, and that ideas from both worlds will be needed for a truly holistic bug finder.
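
One hedged sketch of that "both worlds" idea: rather than only reasoning about a diff in text, an agent could hand a changed function to a traditional tool such as a property-based tester and read back the shrunken counterexample. `normalize_path` is an invented function under test, and Hypothesis is just one example of such a tool:

```python
from hypothesis import given, strategies as st

def normalize_path(path: str) -> str:
    """Collapse duplicate slashes (buggy: it also drops a leading slash)."""
    return "/".join(part for part in path.split("/") if part)

@given(st.lists(st.sampled_from(["a", "b", "c", ""]), min_size=1))
def test_absolute_paths_stay_absolute(parts):
    path = "/" + "/".join(parts)
    # Property: normalizing an absolute path must keep it absolute.
    assert normalize_path(path).startswith("/")

if __name__ == "__main__":
    test_absolute_paths_stay_absolute()   # Hypothesis reports a minimal counterexample such as "/a"
```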

ArnavAgrawal03•2h ago
I've used your product and particularly like that you show bugs in a table instead of littering my entire PR.

Does Jazzberry run on the entire codebase, or does it look at the specific PR? Would also like some more details about the tool - it seems much faster than others I've tried - are you using some kind of smaller fine-tuned model?

mp0000•1h ago
Thank you! This is definitely a design choice we believe strongly in.

Jazzberry starts by viewing the patch associated with the PR, but it can choose to view any file in the repo as it searches for bugs. All the models in use are off-the-shelf for now, but we have been thinking about training a model from the beginning. Stay tuned!
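
A small sketch of that starting point, assuming a GitHub-hosted repo: fetch the PR's unified diff via the REST API, then let the agent open any file from the clone on demand. The owner, repo, PR number, and token are placeholders:

```python
import pathlib
import urllib.request

def fetch_pr_diff(owner: str, repo: str, number: int, token: str) -> str:
    """Return the unified diff for a pull request using GitHub's REST API."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}",
        headers={
            "Accept": "application/vnd.github.v3.diff",   # ask for the raw diff representation
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

def read_repo_file(checkout: str, relative_path: str) -> str:
    """The agent can widen its view beyond the diff to any file in the clone."""
    return (pathlib.Path(checkout) / relative_path).read_text()
```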

AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms

https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
602•Fysi•8h ago•164 comments

Show HN: Muscle-Mem, a behavior cache for AI agents

https://github.com/pig-dot-dev/muscle-mem
111•edunteman•4h ago•28 comments

Migrating to Postgres

https://engineering.usemotion.com/migrating-to-postgres-3c93dff9c65d
38•shenli3514•2h ago•5 comments

What is HDR, anyway?

https://www.lux.camera/what-is-hdr/
480•_kush•10h ago•242 comments

Why agency and cognition are fundamentally not computational

https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2024.1362658/full
7•Fibra•35m ago•0 comments

Show HN: Semantic Calculator (king-man+woman=?)

https://calc.datova.ai
67•nxa•3h ago•91 comments

A server that wasn't meant to exist

https://it-notes.dragas.net/2025/05/13/the_server_that_wasnt_meant_to_exist/
232•jaypatelani•7h ago•56 comments

Git Bug: Distributed, Offline-First Bug Tracker Embedded in Git, with Bridges

https://github.com/git-bug/git-bug
143•stefankuehnel•1d ago•54 comments

Variadic Switch

https://pydong.org/posts/variadic-switch/
17•Tsche•1d ago•1 comment

Our narrative prison

https://aeon.co/essays/why-does-every-film-and-tv-series-seem-to-have-the-same-plot
107•anarbadalov•7h ago•101 comments

StackAI (YC W23) Is Hiring Pydantic and FastAPI Wizard

https://www.ycombinator.com/companies/stackai/jobs/8nYnmlN-backend-engineer
1•baceituno•2h ago

Smalltalk-78 Xerox NoteTaker in-browser emulator

https://smalltalkzoo.thechm.org/users/bert/Smalltalk-78.html
64•todsacerdoti•6h ago•26 comments

Getting Started with Celtic Coins – Crude and Barbarous, or Just Different?

https://collectingancientcoins.co.uk/getting-started-with-celtic-coins-crude-and-barbarous-or-just-different/
17•jstrieb•3d ago•4 comments

Changes since congestion pricing started in New York

https://www.nytimes.com/interactive/2025/05/11/upshot/congestion-pricing.html
157•Vinnl•1d ago•172 comments

The cryptography behind passkeys

https://blog.trailofbits.com/2025/05/14/the-cryptography-behind-passkeys/
151•tatersolid•12h ago•124 comments

Databricks and Neon

https://www.databricks.com/blog/databricks-neon
254•davidgomes•13h ago•179 comments

Hegel 2.0: The imaginary history of ternary computing (2018)

https://www.cabinetmagazine.org/issues/65/weatherby.php
16•Hooke•2d ago•1 comment

Bus stops here: Shanghai lets riders design their own routes

https://www.sixthtone.com/news/1017072
438•anigbrowl•19h ago•311 comments

The AUCTUS A6: the chip enabling inexpensive DMR Radio (2021)

https://jhart99.com/auctus-a6/
8•walterbell•3d ago•4 comments

The recently lost file upload feature in the Nextcloud app for Android

https://nextcloud.com/blog/nextcloud-android-file-upload-issue-google/
362•morsch•18h ago•130 comments

UK's Ancient Tree Inventory

https://ati.woodlandtrust.org.uk/
48•thinkingemote•13h ago•50 comments

How the economics of multitenancy work

https://www.blacksmith.sh/blog/the-economics-of-operating-a-ci-cloud
133•tsaifu•10h ago•27 comments

Perverse incentives of vibe coding

https://fredbenenson.medium.com/the-perverse-incentives-of-vibe-coding-23efbaf75aee
127•laurex•4h ago•125 comments

Updated rate limits for unauthenticated requests

https://github.blog/changelog/2025-05-08-updated-rate-limits-for-unauthenticated-requests/
46•xena•5d ago•57 comments

Launch HN: Jazzberry (YC X25) – AI agent for finding bugs

30•MarcoDewey•7h ago•17 comments

An accessibility update – GTK Development Blog

https://blog.gtk.org/2025/05/12/an-accessibility-update/
55•todsacerdoti•1d ago•12 comments

Interferometer Device Sees Text from a Mile Away

https://physics.aps.org/articles/v18/99
184•bookofjoe•4d ago•49 comments

How to Build a Smartwatch: Picking a Chip

https://ericmigi.com/blog/how-to-build-a-smartwatch-picking-a-chip/
216•rcarmo•16h ago•96 comments

Show HN: Lumier – Run macOS VMs in a Docker

https://github.com/trycua/cua/tree/main/libs/lumier
106•GreenGames•8h ago•36 comments

We Made CUDA Optimization Suck Less

https://www.rightnowai.co/
33•jaberjaber23•1d ago•6 comments