Kernel code removals driven by LLM-created security reports

47•edward•2h ago

Comments

staticassertion•1h ago

They can't maintain the code so they are no longer going to maintain the code.

sigmoid10•1h ago

Seems like this should have happened anyways and LLMs just finally forced them to admit it.

bastawhiz•30m ago

You're being downvoted but I think you're right in a lot of ways. If you read through the patches for some of the removals, the reasons come down to:

- Nobody is familiar with the code

- Almost all of the recent fixes are from static analysis

- Nobody is even sure if anyone uses the code

This feels a lot like CPython culling stdlib modules and making them PyPI packages. The people who rely on those things have a little bit of extra work if they want a recent kernel version, and everyone else benefits (directly or indirectly) by way of there being less stuff that needs attention.

fluidcruft•1h ago

It's an interesting form of tree shaking.

The overlap of bugs being found, nobody caring enough to bother reading the reports or fixing the code, and nobody caring that the modules are pushed out of main seems good.

traceroute66•1h ago

> They can't maintain the code so they are no longer going to maintain the code.

Yes, I don't see the point of maintaining technical debt just for the sake of it.

The security environment in 2026 is such that unmaintained legacy code is a very real security risk: an obscure zero-day in it can give an attacker a foot in the door.

Reading through the list I don't see it being an issue for the overwhelming majority of Linux users.

Who, for example, still uses ISDN in 2026? Most telcos have stopped all new sales and existing ISDN circuits will be forcibly disconnected within 3–5 years as the telcos complete their FTTP build-outs and the copper network is subsequently decommissioned.

goalieca•31m ago

Maybe attackers would focus on these unused bits for very niche products, but generally no one would waste their time.

In general, drivers make up the largest attack surface in the kernel and many of them are just along for the ride rather than being actively maintained and reviewed by researchers.

rasz•57m ago

Most if not all of the listed stuff could be converted to used mode code.

dbdr•49m ago

*user-mode code.

ferguess_k•51m ago

Are we already in the time, or close to the time, that well-trained LLMs are more efficient in finding security holes than all but the best developers out there, even for OS kernel code? Can someone educate me on this?

olmo23•47m ago

We are there. This is pretty much the reason why Mythos isn't being released publicly.

pocksuppet•8m ago

The reason Mythos isn't being released publicly is to drive up Anthropic's valuation by making big promises.

traceroute66•45m ago

> well-trained LLMs are more efficient in finding security holes than all but the best developers out there, even for OS kernel code?

No.

Like everything else an LLM touches, it is prone to slop and hallucinations.

You still need someone who knows what they are doing to review (and preferably manually validate) the findings.

What all this recent hype carefully glosses over is the volume of false-positives. I guarantee you it is > 0 and most likely a fairly large number.

And like most things LLM, the bigger the codebase the more likely the false-positives due to self-imposed context window constraints.

It's all very well these blog posts saying "LLM found this serious bug in Firefox", well yeah but that's only because the security analyst filtered out all the junk (and knew what to ask the LLM in the prompt in the first place).

stratos123•21m ago

A 0% false-positive rate is not necessary for LLM-powered security review to be a big deal. It was worthless a few months ago, when the models were terrible at actually finding vulnerabilities and so basically all the reports were confabulated, with a false positive rate of >95%. Nowadays things are much better - see e.g. [1] by a kernel maintainer.

Another way to see this is that you mentioned "LLM found this serious bug in Firefox", but the actual number in that Mozilla report [2] was 14 high-severity bugs, and 90 minor ones. However you look at it, it's an impressive result for a security audit, and I doubt that the Anthropic team had to manually filter out hundreds-to-thousands of false-positives to produce it.

They did have to manually write minimal exploits for each bug, because Opus was bad at it[3]. This is a problem that Mythos doesn't have. With access to Mythos, to repeat the same audit, you'd likely just need to make the model itself write all the exploits, which incidentally would also filter out a lot of the false positives. I think the hype is mostly justified.

[1] https://lwn.net/Articles/1065620/

[2] https://blog.mozilla.org/en/firefox/hardening-firefox-anthro...

[3] https://www.anthropic.com/news/mozilla-firefox-security

stratos123•40m ago

In terms of quantity, definitely yes (a single person managing a swarm of Opusi can already find many more real bugs than a security researcher, hence the rise in reports).

In terms of quality ("are there bugs that professional humans can't see at any budget but LLMs can?") - it's not very clear, because Opus is still worse than a human specialist, but Mythos might be comparable. We'll just have to wait and see what results Project Glasswing gets.

Either way, cybersecurity is going to get real weird real soon, because even slightly-dumb models can have a large effect if they are cheap and fast enough.

EDIT: Mozilla thinks "no" to the second question, by the way: "Encouragingly, we also haven’t seen any bugs that couldn’t have been found by an elite human researcher.", when talking about the 271 vulnerabilities recently found by Mythos. https://blog.mozilla.org/en/firefox/ai-security-zero-day-vul...

DanielHB•32m ago

There is also a huge surface area of security problems that can't happen in practice due to how other parts of the code work. A classic example is unsanitized input being used somewhere where untrusted users can't inject any input.

Being flooded with these kinds of reports can make the actual real problems harder to see.

chuckadams•27m ago

> Opusi

The plural of "Opus" is "Opera". Might be a tad confusing tho :)

jcalvinowens•23m ago

My experience with these tools is that they generate absolutely enormous amounts of insidiously wrong false positives, and it actually takes a decent amount of skill to work through the 99% which is garbage with any velocity.

Of course some people don't do that, and send all the reports anyway... and then scream from the hilltops about how incredible LLMs are when by sheer luck one happens to be right. Not only is that blatant p-hacking, it's incredibly antisocial.

It's disingenuous marketing speak to say LLMs are "finding" any security holes at all: they find a thousand hypotheticals of which one or two might be real. A broken clock is right twice a day.
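To put rough numbers on the two positions in this subthread, here is a small arithmetic sketch. The "thousand hypotheticals, one or two real" figure and the ">95% false positive rate" figure are taken from the comments; the batch size of 1000 for the second case and the 15 minutes of reviewer time per report are assumptions made up for illustration, not measurements.

```python
def triage_cost(reports: int, real_bugs: int, minutes_per_report: float):
    """Return (precision, reviewer hours spent per confirmed bug)."""
    precision = real_bugs / reports
    total_hours = reports * minutes_per_report / 60
    return precision, total_hours / real_bugs

# "A thousand hypotheticals of which one or two might be real"
pessimistic = triage_cost(reports=1000, real_bugs=2, minutes_per_report=15)

# A 95% false positive rate on the same (assumed) batch of 1000 reports
optimistic = triage_cost(reports=1000, real_bugs=50, minutes_per_report=15)

print(pessimistic)  # (0.002, 125.0)
print(optimistic)   # (0.05, 5.0)
```

Under these assumptions the gap is 125 reviewer hours per real bug versus 5, so the disagreement is largely about which false positive rate holds in practice.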

yk•7m ago

My theory is that a lot of security bugs are low-hanging fruit for LLMs, in the sense that finding them is a bit tedious but not that hard: it's pattern matching. (Let's see, the free occurs in foo(), so if I trigger bar() after foo() then I have a use after free; that should be possible if I trigger an exception in baz::init().)
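As a toy version of that pattern, the sketch below scans a recorded sequence of alloc/use/free events and flags any use that follows a free of the same object. The event format, the function name, and the sample trace are all hypothetical; real tools reason over code paths rather than a flat trace, so this only illustrates the shape of the check.

```python
from typing import Iterable, List, Tuple

# (operation, object name); this event format is invented for the example
Event = Tuple[str, str]

def find_use_after_free(trace: Iterable[Event]) -> List[Tuple[int, str]]:
    """Return (index, name) for every 'use' that follows a 'free' of that object."""
    freed = set()
    findings = []
    for i, (op, name) in enumerate(trace):
        if op == "alloc":
            freed.discard(name)      # a fresh allocation makes the name valid again
        elif op == "free":
            freed.add(name)
        elif op == "use" and name in freed:
            findings.append((i, name))
    return findings

trace = [
    ("alloc", "buf"),
    ("use", "buf"),
    ("free", "buf"),   # the free happens in foo()
    ("use", "buf"),    # bar() runs afterwards and touches it
]
print(find_use_after_free(trace))  # [(3, 'buf')]
```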

cozzyd•33m ago

Seems like there should be some "level of maintenance" metric for modules and distros can pick which they include by default and which are packaged separately based on what they care about. Arch users will build the world but an EL user who needs an unmaintained module would have to explicitly install kmod-isdn or even build it themselves.
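For what it's worth, the kernel's MAINTAINERS file already carries a per-entry `S:` status field (Supported, Maintained, Odd Fixes, Orphan, Obsolete), which could serve as a starting point for such a metric. A minimal sketch of reading it, where the sample entries are invented and the rule for which statuses need an explicit opt-in is just an assumed policy:

```python
# Invented sample in the style of the kernel MAINTAINERS file
# (the real file separates tag and value with a tab).
SAMPLE = """\
EXAMPLE NETWORK DRIVER
M: Jane Doe <jane@example.org>
S: Maintained
F: drivers/net/example/

EXAMPLE LEGACY BUS
S: Orphan
F: drivers/legacybus/
"""

def status_by_entry(text: str) -> dict:
    """Map each entry title to its S: status ('Unknown' if none is given)."""
    result = {}
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        status = "Unknown"
        for line in lines[1:]:
            if line.startswith("S:"):
                status = line[2:].strip()
        result[lines[0].strip()] = status
    return result

def needs_opt_in(status: str) -> bool:
    """Assumed distro policy: anything without an active maintainer is opt-in."""
    return status in {"Orphan", "Obsolete", "Unknown"}

for name, status in status_by_entry(SAMPLE).items():
    print(name, status, "opt-in" if needs_opt_in(status) else "default")
```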

mmsc•21m ago

Unmaintained code is a security issue in and of itself, so this is of course a net benefit.
