I used o3 to find a remote zeroday in the Linux SMB implementation

https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-2025-37899-a-remote-zeroday-vulnerability-in-the-linux-kernels-smb-implementation/
182•zielmicha•6h ago

Comments

zison•6h ago
Very interesting. Is the bug it found exploitable in practice? Could this have been found by syzkaller?
mdaniel•4h ago
In case anyone else didn't recognize that word: https://github.com/google/syzkaller
zielmicha•6h ago
(To be clear, I'm not the author of the post, the title just starts with "How I")
mdaniel•4h ago
Notable:

> o3 finds the kerberos authentication vulnerability in 8 of the 100 runs

And I'd guess this only became a blog post because the author already knew about the vuln and was just curious to see if the intern could spot it too, given a curated subset of the codebase
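To put the quoted 8-in-100 figure in perspective, here is a rough Python sketch of how the chance that at least one run spots the bug grows with the number of runs. It assumes every run is an independent trial at the observed rate, which the post does not claim; real runs on the same prompt may well be correlated.

```python
# Rough estimate only: treats each run as an independent trial at the
# observed per-run detection rate. Independence is an assumption.

def p_at_least_one_hit(per_run_rate: float, runs: int) -> float:
    """Probability that at least one of `runs` independent runs finds the bug."""
    if not 0.0 <= per_run_rate <= 1.0:
        raise ValueError("per_run_rate must be in [0, 1]")
    return 1.0 - (1.0 - per_run_rate) ** runs

observed_rate = 8 / 100  # the quoted "8 of the 100 runs"

for n in (1, 10, 30):
    print(f"{n:>2} runs: {p_at_least_one_hit(observed_rate, n):.2f}")
```

Under that assumption, ten runs give roughly even odds of at least one hit. Every extra run also adds to the pile of reports someone has to read.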

moyix•2h ago
He did do exactly what you say – except right after that, while reviewing the outputs, he found that it had also discovered a different 0day.
PunchyHamster•1h ago
Now the question is whether spending the same time analyzing that bit of code, instead of throwing an automated intern at it, would be time better spent.
lyu07282•1h ago
The time they didn't spend reading the 13k LOCs themselves would've been time better spent.

What?

Retr0id•2h ago
The article cites a signal to noise ratio of ~1:50. The author is clearly deeply familiar with this codebase and is thus well-positioned to triage the signal from the noise. Automating this part will be where the real wins are, so I'll be watching this closely.
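Here is a back-of-the-envelope sketch of what a ~1:50 ratio means for the person doing the triage. It reads "1:50" as one real finding per 50 false reports, which is an interpretation. The minutes-per-report figure is a hypothetical parameter, not something from the article.

```python
def triage_cost(signal: int, noise: int, minutes_per_report: float) -> tuple[float, float, float]:
    """Return (precision, reports read per real bug, minutes per real bug).

    `signal` = real findings, `noise` = false reports. `minutes_per_report`
    is a hypothetical average review time supplied by the caller.
    """
    if signal <= 0:
        raise ValueError("need at least one real finding")
    total = signal + noise
    precision = signal / total
    reports_per_real_bug = total / signal
    return precision, reports_per_real_bug, reports_per_real_bug * minutes_per_report

# Assuming ~10 minutes of expert review per report:
precision, reports, minutes = triage_cost(signal=1, noise=50, minutes_per_report=10)
print(f"precision {precision:.3f}, {reports:.0f} reports and {minutes:.0f} min per real bug")
```

The review-time term is why automating the triage step, as suggested above, matters more than raw detection.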
tough•2h ago
I was thinking about this the other day: wouldn't it be feasible to fine-tune a model (or something like that) on every git change, mailing list post, etc. the Linux kernel has ever had?

Wouldn't such an LLM be the closest synthetic version of a person who has worked on a codebase for years and learned all its quirks?

There's only so much you can fit into a long context; some codebases are already 200k tokens for the code alone, so I don't know.

sodality2•2h ago
I'd be willing to bet the sum of all code submitted via patches, ideas discussed via lists, etc doesn't come close to the true amount of knowledge collected by the average kernel developer's tinkering, experimenting, etc that never leaves their computer. I also wonder if that would lead to overfitting: the same bugs being perpetuated because they were in the training data.
andix•2h ago
1:50 is a great detection ratio for finding a needle in a haystack.
quentinp•1h ago
Exactly. Many AI users can’t triage effectively; as a result, open source projects get a lot of spam now: https://arstechnica.com/gadgets/2025/05/open-source-project-...
ianbutler•1h ago
We’ve been working on a system that dramatically increases the signal-to-noise ratio for finding bugs, and at the same time we’ve been thoroughly benchmarking the entire popular software agents space for this.

We’ve found a wide range of results, and we have a conference talk coming up soon where we’ll be releasing everything publicly, so stay tuned for that; it’ll be pretty illuminating on the state of the space.

Edit: confusing wording

sebmellen•1h ago
Interesting. This is for Bismuth? I saw your pilot program link — what does that involve?
ianbutler•11m ago
Yup! We have multiple businesses working with us. For pilots, it's deploying the tool, providing feedback (we're connected over Slack with all our partners for a direct line to us), making sure the uses fit your business's expectations, and working towards a long-term partnership.

We have several deployments in other people's clouds right now, as well as usage of our own cloud version, so we're flexible here.

manmal•1h ago
If the LLM wrote a harness and proof of concept tests for its leads, then it might increase S/N dramatically. It’s just quite expensive to do all that right now.
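The triage filter described here could look roughly like the following Python sketch: keep only the leads whose generated proof-of-concept actually reproduces. Everything in it is hypothetical. The `Lead` structure is made up, and the `poc` callables stand in for model-written tests (for a kernel bug, realistically something run in a VM that checks for a KASAN report).

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Lead:
    """A candidate bug reported by the model (hypothetical structure)."""
    description: str
    # Hypothetical model-generated PoC: returns True if it reproduces the bug.
    poc: Callable[[], bool]

def confirmed_leads(leads: List[Lead]) -> List[Lead]:
    """Keep only the leads whose PoC reproduces the reported behaviour."""
    confirmed = []
    for lead in leads:
        try:
            if lead.poc():
                confirmed.append(lead)
        except Exception:
            # In this sketch a PoC that errors out counts as unconfirmed.
            pass
    return confirmed

leads = [
    Lead("use-after-free in logoff handler", lambda: True),
    Lead("bogus integer overflow", lambda: False),
]
print([lead.description for lead in confirmed_leads(leads)])
```

A human still has to read what survives, but only the reports that come with a working reproducer.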
Hilift•2h ago
Does the vulnerability exist in other implementations of SMB?
p_ing•22m ago
Implementations of SMB (Windows, Samba, macOS, ksmbd) are going to be different (macOS has a terrible implementation, even though AFP is being deprecated). At this level, it's doubtful that the code is shared among all implementations.
logifail•2h ago
My understanding is that ksmbd is a kernel-space SMB server "developed as a lightweight, high-performance alternative" to the traditional (user-space) Samba server...

Q1: Who is using ksmbd in production?

Q2: Why?

pixl97•2h ago
I would assume for the reason of being lightweight and high performance?
foobar10000•2h ago
SMB over 25 Gbit networks: user-space Samba is much worse there.
Henchman21•1h ago
This is interesting to me! I regularly deploy 25G network connections, but I don’t think we’d run SMB over them. I’m super curious about the industry and use case, if you’re willing to share!
hackernudes•1h ago
"SMB Direct" is RDMA based and ksmbd supports it. Samba does not. Disclaimer: I have not used it but was looking it up just yesterday.
donnachangstein•1h ago
1. People that were using the in-kernel SMB server in Solaris or Windows.

2. Samba performance sucks (by comparison) which is why people still regularly deploy Windows for file sharing in 2025.

Anybody know if this supports native Windows-style ACLs for file permissions? That is the last remaining reason to still run Solaris but I think it relies on ZFS to do so.

Samba's reliance on Unix UID/GID and the syncing as part of its security model is still stuck in the 1970s unfortunately.

The caveat is the in-kernel SMB server has been the source of at least one holy-shit-this-is-bad zero-day remote root hole in Windows (not sure about Solaris) so there are tradeoffs.

raverbashing•1h ago
> Samba's reliance on Unix UID/GID and the syncing as part of its security model is still stuck in the 1970s unfortunately.

Sigh. This is why we can't have nice things

Like, yeah, having SMB in the kernel is faster, but honestly it's not fundamentally faster. It just seems the will to make Samba better isn't there.

AshamedCaptain•35m ago
Licensing. Samba is GPLv3, Linux is only GPLv2.
noname120•7m ago
The same reason people use kmod-trelay instead of relayd I guess
iandanforth•2h ago
The most interesting and significant bit of this article for me was that the author ran this search for vulnerabilities 100 times for each of the models. That's significantly more computation than I've historically been willing to expend on most of the problems that I try with large language models, but maybe I should let the models go brrrrr!
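Repeating a prompt many times and counting hits needs very little code. The Python sketch below takes a `run_once` callable standing in for one model call; it is a hypothetical hook, not the author's harness or any specific client library. It counts how many reports mention each marker string. Plain substring matching is a crude assumption: real reports would need a human or a stricter check.

```python
from collections import Counter
from typing import Callable, Iterable

def tally_runs(run_once: Callable[[int], str], n_runs: int, markers: Iterable[str]) -> Counter:
    """Call run_once(i) n_runs times; count reports mentioning each marker."""
    markers = list(markers)
    counts: Counter = Counter()
    for i in range(n_runs):
        report = run_once(i).lower()
        for marker in markers:
            if marker.lower() in report:
                counts[marker] += 1
    return counts

# Fake model for demonstration: "finds" the bug on every 25th run.
def fake_run(i: int) -> str:
    return "Use-After-Free in session logoff" if i % 25 == 0 else "No bug found."

print(tally_runs(fake_run, 100, ["use-after-free"]))
```

Swapping `fake_run` for a real API call is where the cost mentioned above comes in: 100 runs means 100 full-context requests per model.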
roncesvalles•1h ago
A lot of money is all you need~
bbarnett•38m ago
A lot of burned coal, is what.

The "don't blame the victim" trope is valid in many contexts. This one application might be "hackers are attacking vital infrastructure, so we need to find vulnerabilities first". And hackers use AI now, likely hacked into and for free, to discover vulnerabilities. So we must use AI!

Therefore, the hackers are contributing to global warming. We, dear reader, are innocent.

Balooga•16m ago
Between $3k and $30k to solve a single ARC-AGI problem [1]. Not sure if "100 runs" makes this comparable.

[1] https://techcrunch.com/2025/04/02/openais-o3-model-might-be-...

sdoering•11m ago
So basically running a microwave for about 800 seconds, or a bit more than 13 minutes per model?

Oh my god, the world is gonna end. Too bad we panicked over exaggerated energy consumption numbers for using an LLM for individual work.

Yes, when a lot of people do a lot of prompting, those one-tenth of a second to eight seconds of microwave time per prompt add up. But I strongly suggest that we could all cut our energy consumption significantly by other means, instead of blaming the blog post's author for his energy consumption.

The "lot of burned coal" is probably not that much in this blog post's case, given that 1 kWh is about 0.12 kg coal equivalent (and yes, I know that we need to burn more than that for 1 kWh). Still not that much compared to quite a few other human activities.

If you want to read up on it, James O'Donnell and Casey Crownhart try to pull together a detailed account of AI energy usage for MIT Technology Review.[1] I found that quite enlightening.

[1]: https://www.technologyreview.com/2025/05/20/1116327/ai-energ...
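The arithmetic behind "800 seconds of microwave" can be checked with a few lines of Python. The ~1000 W microwave rating is an assumption, not a figure from the thread. The 0.12 kg/kWh factor is the coal-equivalent figure quoted above. As that comment notes, actual coal burned per kWh delivered is higher because of conversion losses, and this sketch ignores that.

```python
# Back-of-the-envelope check of the energy figures in the comment above.
# Assumption (not from the thread): a ~1000 W microwave.

def appliance_energy_kwh(seconds: float, watts: float = 1000.0) -> float:
    """Energy in kWh for running an appliance rated `watts` for `seconds`."""
    joules = watts * seconds
    return joules / 3.6e6  # 1 kWh = 3.6 MJ

KGCE_PER_KWH = 0.12  # kg coal equivalent per kWh, as quoted in the comment

kwh = appliance_energy_kwh(800)  # "about 800 seconds" per model
coal_kg = kwh * KGCE_PER_KWH
print(f"{kwh:.3f} kWh, about {coal_kg * 1000:.0f} g coal equivalent")
```

With those assumptions, one model's 100 runs comes out to a fraction of a kilowatt-hour.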

mezyt•1h ago
Meanwhile, as a maintainer, I've reviewed more than a dozen false-positive slop CVEs in my library, and not a single one found an actual issue. This article is probably going to make my situation worse.
SamuelAdams•59m ago
Maybe, but the author is an experienced vulnerability analyst. Obviously if you get a lot of people who have no experience with this you may get a lot of sloppy, false reports.

But this poster actually understands the AI output and is able to find real issues (in this case, use-after-free). From the article:

> Before I get into the technical details, the main takeaway from this post is this: with o3 LLMs have made a leap forward in their ability to reason about code, and if you work in vulnerability research you should start paying close attention. If you’re an expert-level vulnerability researcher or exploit developer the machines aren’t about to replace you. In fact, it is quite the opposite: they are now at a stage where they can make you significantly more efficient and effective.

jobswithgptcom•1h ago
Wow, interesting. I've been hacking on a tool called https://diffwithgpt.com with a similar angle, but it indexes git changelogs with Qwen and has it flag risks when upgrading k8s etc., such as backward-compatibility issues and security concerns.
empath75•1h ago
Given the value of finding zero-days, pretty much every intelligence agency in the world is going to pour money into this if it can reliably find them with just a few hundred API calls. Especially if you can fine-tune a model with lots of examples, which I don't think OpenAI et al. are going to allow through any public API.
akomtu•1h ago
This made me think that the near future will bring LLMs trained specifically on Linux or another large project. The source code is a small part of the dataset fed to LLMs. The more interesting part is runtime data flow, similar to what we observe in a debugger. Looking at the codebase alone is like trying to understand a waterfall by looking at the equations that describe the water flow.
KTibow•1h ago
> With o3 you get something that feels like a human-written bug report, condensed to just present the findings, whereas with Sonnet 3.7 you get something like a stream of thought, or a work log.

This is likely because the author didn't give Claude a scratchpad or space to think, essentially forcing it to mix its thoughts with its report. I'd be interested to see if using the official thinking mechanism gives it enough space to get differing results.
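Short of the official thinking mechanism, a low-tech way to give a model a scratchpad is to ask for separate sections and keep only the report. Here is a minimal Python sketch of that idea. The `<scratchpad>`/`<report>` tag names are made up for illustration; they are not a Claude or o3 feature, and nothing here was tested against either model.

```python
import re

# Hypothetical prompt convention: reasoning and findings go in separate tags.
SYSTEM_PROMPT = (
    "Think through the code inside <scratchpad>...</scratchpad>. "
    "Then put only the final findings inside <report>...</report>."
)

_REPORT_RE = re.compile(r"<report>(.*?)</report>", re.DOTALL)

def extract_report(response: str) -> str:
    """Return the text inside <report> tags, or the whole response if absent."""
    match = _REPORT_RE.search(response)
    return match.group(1).strip() if match else response.strip()

response = (
    "<scratchpad>Check refcounts on the session object...</scratchpad>\n"
    "<report>Use-after-free in the logoff handler.</report>"
)
print(extract_report(response))
```

Falling back to the whole response keeps the output intact when the model ignores the format, at the cost of mixing the work log back in.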

gizmodo59•9m ago
Having tried both, I’d say o3 is in a league of its own compared to 3.7 or even Gemini 2.5 Pro. The benchmarks may not show much of a gain, but that gain matters a lot when the task is very complex. What’s surprising is that they announced it last November and only released it a month ago. (I’m guessing the safety work took a while, but I have no idea.) Can’t wait for o4!
nxobject•1h ago
A small thing, but I found the author's project-organization practices useful – creating individual .prompt files for system prompt, background information, and auxiliary instructions [1], and then running it through `llm`.

It reveals how good LLM use, like any other engineering tool, requires good engineering thinking – methodical, and oriented around thoughtful specifications that balance design constraints – for best results.

[1] https://github.com/SeanHeelan/o3_finds_cve-2025-37899
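The same file-per-concern organization can be mimicked with a small Python helper that assembles a system prompt and a user prompt from separate `.prompt` files. The file names and contents below are hypothetical, not the ones in the linked repository. The author passes his files to the `llm` tool, and that invocation is not reproduced here.

```python
import tempfile
from pathlib import Path
from typing import Sequence, Tuple

def build_prompt(system_file: Path, context_files: Sequence[Path]) -> Tuple[str, str]:
    """Return (system_prompt, user_prompt) assembled from .prompt files.

    Context files are joined in the given order with blank lines between
    them, so background, code, and instructions can live in separate files.
    """
    system = system_file.read_text(encoding="utf-8").strip()
    parts = [p.read_text(encoding="utf-8").strip() for p in context_files]
    return system, "\n\n".join(parts)

# Demo with throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "system.prompt").write_text("You are auditing C code for use-after-free bugs.")
    (root / "background.prompt").write_text("ksmbd is an in-kernel SMB server.")
    (root / "instructions.prompt").write_text("Report only bugs you can justify.")
    system, user = build_prompt(
        root / "system.prompt",
        [root / "background.prompt", root / "instructions.prompt"],
    )

print(system)
print(user)
```

Keeping each piece in its own file makes it easy to swap one part, such as the background, while holding the rest fixed across runs.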

kweingar•31m ago
How do we benchmark these different methodologies?

It all seems like vibes-based incantations. "You are an expert at finding vulnerabilities." "Please report only real vulnerabilities, not any false positives." Organizing things with made-up HTML tags because the models seem to like that for some reason. Where does engineering come into it?

nindalf•20m ago
The author is up front about the limitations of their prompt. They say

> In fact my entire system prompt is speculative in that I haven’t ran a sufficient number of evaluations to determine if it helps or hinders, so consider it equivalent to me saying a prayer, rather than anything resembling science or engineering. Once I have ran those evaluations I’ll let you know.

dehrmann•59m ago
Are there better tools for finding this? It feels like the sort of thing static analysis should reliably find, but it's in the Linux kernel, so you'd think either coding standards or tooling around these sorts of C bugs would be mature.
grg0•12m ago
I'm not an expert in the area, but "classic static analysis" (for lack of a better term) and concurrency bugs don't really mix. There are specific modeling tools for concurrency, and they are an entirely different beast from static analysis, requiring notation and language support to describe which threads access which data, and when. Catching concurrency bugs statically probably requires a level of context and understanding that an LLM can easily churn through.
yellow_lead•11m ago
Some static analysis tools can detect use after free or memory leaks. But since this one requires reasoning about multiple threads, I think it would've been unlikely to be found by static analysis.
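Why per-function reasoning misses this class of bug can be shown with a small Python analogy of a check-then-use race between two handlers sharing per-session state. It is only loosely modeled on the kind of session-teardown bug the post describes and is not the ksmbd code. Python has no use-after-free, so reading a cleared field stands in for touching freed memory, and the two `Event`s force the unlucky interleaving deterministically.

```python
import threading

class User:
    def __init__(self, name: str):
        self.name = name

class Session:
    def __init__(self, user):
        self.user = user  # per-session state shared by both handlers

def worker(sess, checked, freed, result):
    # Handler A: checks the field, then uses it later without a lock.
    if sess.user is not None:
        checked.set()   # let the logoff handler run now
        freed.wait()    # force the unlucky schedule deterministically
        try:
            result["name"] = sess.user.name
        except AttributeError as exc:  # analogue of touching freed memory
            result["error"] = exc

def logoff(sess, checked, freed):
    # Handler B: tears down the user while A may still be using it.
    checked.wait()
    sess.user = None    # in C, this is where the free would happen
    freed.set()

sess = Session(User("alice"))
checked, freed = threading.Event(), threading.Event()
result = {}
a = threading.Thread(target=worker, args=(sess, checked, freed, result))
b = threading.Thread(target=logoff, args=(sess, checked, freed))
a.start()
b.start()
a.join()
b.join()
print(result)
```

Each function looks fine in isolation. The failure only appears when both handlers run on the same session, which is the cross-thread reasoning a single-function analysis does not do. Holding a lock or a reference across the check and the use would close the window.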
firesteelrain•42m ago
I really hope this is legit and not what keeps happening to curl

[1] https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-f...

ape4•28m ago
Seems we need something like kernel modules but with memory protection
