frontpage.

Maybe the default settings are too high

https://www.raptitude.com/2025/12/maybe-the-default-settings-are-too-high/
328•htk•5h ago•100 comments

MiniMax M2.1: Built for Real-World Complex Tasks, Multi-Language Programming

https://www.minimaxi.com/news/minimax-m21
61•110•3h ago•25 comments

WiFi DensePose: WiFi-based dense human pose estimation system through walls

https://github.com/ruvnet/wifi-densepose
21•nateb2022•1h ago•3 comments

Show HN: Gaming Couch – a local multiplayer party game platform for 8 players

https://gamingcouch.com
34•ChaosOp•4d ago•2 comments

Tiled Art

https://tiled.art/en/home/?id=SilverAndGold
46•meander_water•6d ago•0 comments

Python 3.15’s interpreter for Windows x86-64 should hopefully be 15% faster

https://fidget-spinner.github.io/posts/no-longer-sorry.html
342•lumpa•15h ago•111 comments

When a driver challenges the kernel's assumptions

http://miod.online.fr/software/openbsd/stories/udl.html
36•todsacerdoti•3h ago•4 comments

Fahrplan – 39C3

https://fahrplan.events.ccc.de/congress/2025/fahrplan/
162•rurban•9h ago•20 comments

The entire New Yorker archive is now digitized

https://www.newyorker.com/news/press-room/the-entire-new-yorker-archive-is-now-fully-digitized
366•thm•5d ago•51 comments

Paperbacks and TikTok

https://calnewport.com/on-paperbacks-and-tiktok/
91•zdw•3d ago•54 comments

CUDA Tile Open Sourced

https://github.com/NVIDIA/cuda-tile
158•JonChesterfield•6d ago•65 comments

Seven Diabetes Patients Die Due to Undisclosed Bug in Abbott's Glucose Monitors

https://sfconservancy.org/blog/2025/dec/23/seven-abbott-freestyle-libre-cgm-patients-dead/
113•pabs3•3h ago•42 comments

Asahi Linux with Sway on the MacBook Air M2 (2024)

https://daniel.lawrence.lu/blog/2024-12-01-asahi-linux-with-sway-on-the-macbook-air-m2/
198•andsoitis•14h ago•190 comments

Archiving Git branches as tags

https://etc.octavore.com/2025/12/archiving-git-branches-as-tags/
93•octavore•3d ago•24 comments

The Program 2025 annual review: How much money does an audio drama podcast make?

https://programaudioseries.com/the-program-results-7/
51•I-M-S•3d ago•15 comments

I sell onions on the Internet (2019)

https://www.deepsouthventures.com/i-sell-onions-on-the-internet/
397•sogen•12h ago•121 comments

Critical vulnerability in LangChain – CVE-2025-68664

https://cyata.ai/blog/langgrinch-langchain-core-cve-2025-68664/
80•shahartal•10h ago•56 comments

Google is 'gradually rolling out' option to change your gmail.com address

https://9to5google.com/2025/12/24/google-change-gmail-addresses/
154•geox•6h ago•127 comments

Show HN: Lamp Carousel – DIY kinetic sculpture powered by lamp heat (2024)

https://evan.widloski.com/posts/spinners/
68•Evidlo•1d ago•13 comments

Ask HN: What skills do you want to develop or improve in 2026?

25•meridion•12h ago•29 comments

Reinventing the dial-up modem (2019)

https://saket.me/dtmf-tones/
11•todsacerdoti•6d ago•5 comments

Toys with the highest play-time and lowest clean-up-time

https://joannabregan.substack.com/p/toys-with-the-highest-play-time-and
320•surprisetalk•1w ago•169 comments

We invited a man into our home at Christmas and he stayed with us for 45 years

https://www.bbc.co.uk/news/articles/cdxwllqz1l0o
967•rajeshrajappan•17h ago•235 comments

Alzheimer’s disease can be reversed in animal models, study shows

https://case.edu/news/new-study-shows-alzheimers-disease-can-be-reversed-achieve-full-neurologica...
441•thunderbong•13h ago•111 comments

Geometric Algorithms for Translucency Sorting in Minecraft [pdf]

https://douira.dev/assets/document/douira-master-thesis.pdf
40•HeliumHydride•1w ago•8 comments

Clearspace (YC W23) Is Hiring a Founding Network Engineer (VPN and Proxy)

https://www.ycombinator.com/companies/clearspace/jobs/5LtM86I-founding-network-engineer-at-clears...
1•anteloper•11h ago

Fabrice Bellard Releases MicroQuickJS

https://github.com/bellard/mquickjs/blob/main/README.md
1444•Aissen•2d ago•540 comments

Who Watches the Waymos? I do [video]

https://www.youtube.com/watch?v=oYU2hAbx_Fc
274•notgloating•1d ago•104 comments

Ruby 4.0.0

https://www.ruby-lang.org/en/news/2025/12/25/ruby-4-0-0-released/
707•FBISurveillance•1d ago•162 comments

Phoenix: A modern X server written from scratch in Zig

https://git.dec05eba.com/phoenix/about/
634•snvzz•1d ago•386 comments

Launch HN: Jazzberry (YC X25) – AI agent for finding bugs

46•MarcoDewey•7mo ago
Hey HN! We are building Jazzberry (https://jazzberry.ai), an AI bug finder that automatically tests your code when a pull request is opened, to find and flag real bugs before they are merged.

Here’s a demo video: https://www.youtube.com/watch?v=L6ZTu86qK8U#t=7

We are building Jazzberry to help you find bugs in your code base. Here’s how it works:

When a PR is made, Jazzberry clones the repo into a secure sandbox. The diff from the PR is provided to the AI agent in its context window. In order to interact with the rest of the code base, the AI agent has the ability to execute bash commands within the sandbox. The output from those commands is fed back into the agent. This means that the agent can do things like read/write files, search, install packages, run interpreters, execute code, and so on. It observes the outcomes and iteratively tests to pinpoint bugs, which are then reported back in the PR as a markdown table.
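
The execute-observe loop described above can be sketched roughly as follows. This is a minimal illustration, not Jazzberry's actual implementation: `ask_model` is a stubbed stand-in for the LLM call, and the real agent runs commands inside a Firecracker microVM rather than on the host shell.

```python
import subprocess

def ask_model(context: list[str]) -> str:
    """Decide the next bash command from the transcript so far (stubbed here)."""
    return "echo inspecting repo" if len(context) == 1 else "DONE"

def agent_loop(diff: str, max_steps: int = 20) -> list[str]:
    """Iteratively run commands and feed their output back into the context."""
    context = [f"PR diff:\n{diff}"]
    for _ in range(max_steps):
        command = ask_model(context)
        if command == "DONE":
            break
        # Run the command (in the real system, inside the sandbox);
        # stdout/stderr are appended so the agent can observe the outcome.
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=60
        )
        context.append(f"$ {command}\n{result.stdout}{result.stderr}")
    return context
```

The key design point is the feedback edge: each command's output re-enters the context window, which is what lets the agent iterate toward a confirmed bug rather than guess from the diff alone.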

Jazzberry is focused on dynamically testing your code in a sandbox to confirm the presence of real bugs. We are not a general code review tool; our only aim is to provide concrete evidence of what's broken and how.

Here are some real examples of bugs that we have found so far.

“Authentication Bypass (Critical)” - When `AUTH_ENABLED` is `False`, the `get_user` dependency in `home/api/deps.py` always returns the first superuser, bypassing authentication and potentially leading to unauthorized access. Additionally, it defaults to superuser when the authenticated auth0 user is not present in the database.

“Insecure Header Handling (High)” - The server doesn't validate header names/values, allowing injection of malicious headers, potentially leading to security issues.

“API Key Leakage (High)” - Different error messages in browser console logs revealed whether API keys were valid, allowing attackers to brute force valid credentials by distinguishing between format errors and authorization errors.
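
The API-key-leakage class above can be illustrated with a small sketch (hypothetical function names, not any client's actual code): distinguishable error messages act as an oracle that lets an attacker separate well-formed-but-unauthorized keys from malformed ones.

```python
VALID_KEYS = {"sk-live-abc123"}  # illustrative placeholder store

def check_key_leaky(key: str) -> str:
    # BUG: the two failure modes return different messages, so an attacker
    # probing keys can tell which guesses have the right format.
    if not key.startswith("sk-"):
        return "error: malformed API key"
    if key not in VALID_KEYS:
        return "error: key not authorized"
    return "ok"

def check_key_fixed(key: str) -> str:
    # Fix: collapse all failure modes into one generic message.
    if not key.startswith("sk-") or key not in VALID_KEYS:
        return "error: invalid API key"
    return "ok"
```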

Working on this, we've realized just how much the rise of LLM-generated code is amplifying the need for better automated testing solutions. Traditional code coverage metrics and manual code review are already becoming less effective against thousands of lines of LLM-generated code, and we expect this trend to accelerate: the complexity of AI-authored systems will ultimately require even more sophisticated AI tooling for effective validation.

Our backgrounds: Mateo has a PhD in reinforcement learning and formal methods with over 20 publications and 350 citations. Marco holds an MSc in software testing, specializing in LLMs for automated test generation.

We are actively building and would love your honest feedback!

Comments

bigyabai•7mo ago
> Jazzberry is focused on dynamically testing your code in a sandbox to confirm the presence of real bugs.

That seems like a waste of resources to perform a job that a static linter could do in nanoseconds. Paying to spin up a new VM for every test is going to incur a cost penalty that other competitors can skip entirely.

MarcoDewey•7mo ago
You are right that static linters are incredibly fast and efficient for catching certain classes of issues.

Our focus with the dynamic sandbox execution is aimed at finding bugs that are much harder for static analysis to detect. These are bugs like logical flaws in specific execution paths and unexpected interactions between code changes.
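
A contrived example of the distinction: the hypothetical function below is syntactically clean and passes a linter without warnings, but only executing it against an expected value reveals the logic error.

```python
def apply_discount(price: float, percent: float) -> float:
    # Logic bug: divides by 100 twice, so a "10% discount" is really 0.01%.
    # A static linter sees nothing wrong with this line.
    return price * (1 - percent / 100 / 100)

# Dynamic check: a 10% discount on 200.0 should yield 180.0.
expected = 180.0
actual = apply_discount(200.0, 10)
bug_detected = abs(actual - expected) > 1e-9  # running the code exposes the flaw
```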

winwang•7mo ago
Do you guide the LLM to do this specifically? So it doesn't "waste" time on what can be taken care of by static analysis? Would be interesting if you could also integrate traditional analysis tools pre-LLM.
bananapub•7mo ago
how did and do you validate that this is of any value at all?

how many test cases do you have? how do you score the responses? how do you ensure random changes by the people who did almost all of the work (training models) don't wreck your product?

winwang•7mo ago
Not the OP but -- I would immediately believe that finding bugs would be a valuable problem to solve. Your questions should probably be answered on an FAQ though, since they are pretty good.
decodingchris•7mo ago
Cool demo! You mentioned using a microVM, which I think is Firecracker? And if it is, any issues with it?
mp0000•7mo ago
Thanks! We are indeed using Firecracker. No issues so far
sublinear•7mo ago
I'm kinda curious how this compares to GitLab's similar offering: https://docs.gitlab.com/user/project/merge_requests/duo_in_m...
mp0000•7mo ago
We are laser focused on bug finding, and aren't targeting general code review, like comments on code style and variable names. We also run your code as part of bug finding instead of only having an LLM inspect it.
jdefr89•7mo ago
A ton of work is already being done on this. I am a Vulnerability Researcher @ MIT and I know of a few efforts, just at my lab alone, being worked on. So far nearly everything I have seen seems to do nothing but report false positives. They are missing bugs a fuzzer could have found in minutes. I will be impressed when it finds high-severity/exploitable bugs. I think we are a bit far from that, if it's achievable at all. On the flip side, LLMs have been very useful for reverse engineering binaries. Binary Ninja w/ Sidekick (their LLM plugin) can recover and name data structures quite well. It saves a ton of time. It also does a decent job of providing high-level overviews of code...
hanlonsrazor•7mo ago
Agree with you on that. There is nothing about LLMs that makes them uniquely suited for bug finding. However, they could excel re:bugs by recovering traces as you say, and taking it one step further, even recommending fixes.
winwang•7mo ago
One possibility is crafting (somewhat-)minimal reproductions. There's some work in the FP community to do this via traditional techniques, but they seem quite limited.
MarcoDewey•7mo ago
Correct, what is unique about LLMs is their ability to match an existing tool or practice to a unique problem.
MarcoDewey•7mo ago
I definitely agree that there's a lot of research happening in this space, and the false positive issue is a significant hurdle. From my own research and experimentation, I have also seen how challenging it is to get LLM-powered tools to consistently find real bugs.

Our approach with Jazzberry is specifically focused on the dynamic execution aspect within the PR context. I am seeing that by actually running the code with the specific changes, we can get a clearer signal about functional errors. We're very aware of the need to demonstrate our ability to find those high-severity/exploitable bugs you mentioned, and that's a key metric for us as we continue to develop it.

Given your background, I'd be really interested to hear if you have any thoughts on what approaches you think might be most promising for moving beyond the false positive problem in AI-driven bug finding. Any insights from your work at MIT would be incredibly valuable.

mp0000•7mo ago
We largely agree, we don't think pure LLM-based approaches are sufficient. Having an LLM automatically orchestrate tools, like a software fuzzer, is something we've been thinking about for a while and we view incorporating code execution as the first step.

We think that LLMs are able to capture semantic bugs that traditional software testing cannot find in a hands-off way, and ideas from both worlds will be needed for a truly holistic bug finder.

csnate•7mo ago
Solving the false positive problem is like solving the halting problem. I don’t think we get to a world where static analysis tools don’t have them, AI or otherwise.

That said, I have found LLMs can find bugs in binaries. It’s not all false positives, as far as I can tell. I have a side project I’ve been working on that does just this (shameless plug): PwnScan.com. It’s currently free and focused on binaries.

The bad news is that you quickly get into a situation with too many false positives, to the point where it's sometimes not feasible to sort through them all.

ninetyninenine•7mo ago
It's definitely not like solving the halting problem. A solution 100% exists. You are it. If human intelligence can be realized in physical reality by an actual human brain, then it is provably realizable.

Few things in science exist as a north star in such abundance. We KNOW it can be built. Other futuristic things like interstellar travel... we don't actually know.

ToValueFunfetti•7mo ago
I think it maps perfectly onto the halting problem: just say one of the requirements of your program is halting. Humans can decide whether a program halts in a lot of cases, including more-or-less all of the programs we're likely to encounter. But for the overwhelming majority of possible programs, we can't figure it out.

A useful bug detector doesn't need to overcome this because it would be detecting bugs in the kind of code we write, but there is no bug detector which gives the correct answer for all inputs.
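
The classic diagonal argument behind this point can be sketched in a few lines (`paradox` and `troll` are hypothetical names; `halts` stands in for any claimed total halting oracle):

```python
def paradox(halts):
    """Given a claimed halting oracle `halts`, build a program it must misjudge."""
    def troll():
        if halts(troll):
            while True:      # oracle predicted "halts": loop forever
                pass
        return "halted"      # oracle predicted "loops": halt immediately
    return troll

# An oracle that claims troll loops forever is immediately contradicted:
troll = paradox(lambda program: False)
outcome = troll()  # troll halts, refuting the oracle's "loops" claim
```

Whichever fixed answer the oracle gives, `troll` does the opposite, so no total `halts` (or perfect bug detector) can exist, even though humans and tools decide many practical cases just fine.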

ninetyninenine•7mo ago
I don’t think you realize how universal the halting problem is in the universe.

The law governs everything that exists in the universe, so it governs humans as well.

If a human can know that a program halts, it means the program's halting is provable. If a human can't tell whether a program will halt, it likely means the program's halting is not provable.

The halting problem is about whether a general algorithm can decide, for any program, whether it halts.

jollofricepeas•7mo ago
The funny thing is…

You’re stating the problem with that whole sector.

I really wish product owners, researchers and tool creators had more actual real world experience on the remediation side. I think that’s the reason we have so many crappy tools.

- We need a better way of addressing business logic issues and sensitive data leakage which starts at the data model and flows from there.

- Within large organizations we need better risk data about vulns to aid with prioritization and remediation which is always the larger problem (sifting through noise)

- We need automated threat modeling tools that reduce a team's need to start from zero

Fundamentally a tool is a waste of time if it can’t tell you there’s “x% possibility of downtime or sensitive data leakage.”

Addressing the risk equation (r = i × l), where the impact and likelihood variables are baked into every tool and based on real-world data, is where we should be.

Until then, vulnerability scanning and management will continue to suck.

MarcoDewey•7mo ago
I also wish that I had more real world experience. It would help me a ton if I had 25 years of software testing experience.

It sounds like you do have experience, and I would love to learn from you. It would be awesome if you could help us build a tool that is truly useful for you and your work.

ArnavAgrawal03•7mo ago
I've used your product and particularly like that you show bugs in a table instead of littering my entire PR.

Does Jazzberry run on the entire codebase, or does it look at the specific PR? Would also like some more details about the tool - it seems much faster than others I've tried - are you using some kind of smaller fine-tuned model?

mp0000•7mo ago
Thank you! This is definitely a design choice we believe strongly in.

Jazzberry starts by viewing the patch associated with the PR, but can choose to view any file in the repo as it searches for bugs. All the models in use are off-the-shelf for now, but we have been thinking about training a model from the beginning. Stay tuned!

RainyDayTmrw•7mo ago
I wish I could click into the "real bugs" and see full example output.
MarcoDewey•7mo ago
The bugs shown in the "real bugs" section are real output from the tool. Are you referring to looking at the full table of bugs that we return? Sometimes we only find one bug in a PR; sometimes our clients don't want us to share other bugs that could expose their work.
bluelightning2k•7mo ago
Interesting choice to have your only demo video be testing a CLI. Unless that's literally the use-case it's for?
MarcoDewey•7mo ago
That is one of the obvious use cases. There are many others, you are welcome to install the bot and play around with it. I would love to hear your feedback.
AIorNot•7mo ago
Very cool - what’s the scope of its abilities:

Can this be used for UX bugs eg nav bugs or finding runtime errors in a react site? Ie will it check the frontend

MarcoDewey•7mo ago
Yes, we have caught some navigation bugs on the front end.

There are some other really cool tools built explicitly for this, like QualGent and Operative.sh

rylanu•7mo ago
Having an agent explore a sandbox environment, install dependencies, execute tests, etc. sounds slow and resource-intensive. Does Jazzberry scale for large teams with monorepos and dozens of PRs daily?
MarcoDewey•7mo ago
We are working daily on making Jazzberry better for larger teams and repositories. Our goal is to find deep, difficult bugs in large repos.
sorokod•7mo ago
What is your experience running your product on its own code?
lacker•7mo ago
I tried it out but I don't have any pending pull requests on my personal repositories, and I don't want to give a new tool write access to a professional repository where other people are working before trying it out a bit. It would be great if it would scan a repository and tell me if it found any bugs, so that I could see if it worked before messing with real pull requests.
rylanu•7mo ago
Microsoft is having so many issues it would be really helpful for tools like this to be used now.