Mozilla says 271 vulnerabilities found by Mythos and "almost no false positives"

https://arstechnica.com/information-technology/2026/05/mozilla-says-271-vulnerabilities-found-by-mythos-have-almost-no-false-positives/

72•epistasis•3h ago

Comments

lschueller•3h ago

Let's see how this will improve daily SOC work. I still don't see what the big difference is between Mythos and Opus, security-wise. I'm confident that this kind of vulnerability detection is a long-term improvement. But does Mythos specifically make such a big difference compared to "normal" models? I would love to see what the actual difference is.

JoshTriplett•3h ago

Among other things, Mythos seems better at "let me find, weaponize, and stack vulnerabilities until I get end-to-end from untrusted content to root", rather than just finding one thing in a specific identified area.

mccr8•1h ago

Quantifying the abilities of an LLM is a hard research problem, so I'm not sure if I can describe it in any great way, but Mythos did seem to be fairly clever about putting together things from different domains to find problems.

For instance, in one of the included bugs (2022034) it figured out that a floating point value being sent over IPC could be modified by an attacker in such a way that it would be interpreted by the JS engine as an arbitrary pointer, due to the way the JS engine uses a clever representation of values called NaN-boxing. This is not beyond the realm of a human researcher to find, but it did nicely combine different domains of security.

As the person responsible for accidentally introducing that security problem (and then fixing it after the Mythos report), while I am aware of NaN-boxing (despite not being a JS engine expert), I was focused more on the other more complex parts of this IPC deserialization code so I hadn't really thought about the potential problems in this context. It is just a floating point value, what could go wrong?
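
As a rough illustration of why an attacker-controlled double can be dangerous under NaN-boxing, here is a toy sketch in Python. The tag value, the bit layout, and all function names (`box_pointer`, `decode`, `canonicalize`) are hypothetical, chosen only for illustration; this is not SpiderMonkey's actual encoding and not the fix applied in the bug mentioned above.

```python
import struct

# Hypothetical toy layout, NOT SpiderMonkey's real encoding:
# any 64-bit pattern whose top 16 bits equal TAG_OBJECT is treated as a
# boxed pointer; every other pattern is treated as a plain double.
TAG_OBJECT = 0xFFFC
PAYLOAD_MASK = (1 << 48) - 1
CANONICAL_NAN_BITS = 0x7FF8000000000000


def double_to_bits(value: float) -> int:
    """Reinterpret a double as its raw 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def bits_to_double(bits: int) -> float:
    """Reinterpret a raw 64-bit pattern as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def box_pointer(address: int) -> int:
    """Build the bit pattern the toy engine uses for a pointer."""
    return (TAG_OBJECT << 48) | (address & PAYLOAD_MASK)


def decode(bits: int):
    """Decode a boxed value the way the toy engine would."""
    if bits >> 48 == TAG_OBJECT:
        return ("pointer", bits & PAYLOAD_MASK)
    return ("double", bits_to_double(bits))


def canonicalize(value: float) -> float:
    """Replace any NaN with one fixed NaN pattern before boxing it."""
    if value != value:  # only NaN compares unequal to itself
        return bits_to_double(CANONICAL_NAN_BITS)
    return value
```

In this toy layout, every pointer pattern is also a NaN when read as a double. A sender who controls the raw bytes of a "double" can therefore pick a NaN whose bits match `box_pointer(some_address)`, and `decode` would then treat it as a pointer unless the receiver runs something like `canonicalize` on untrusted doubles first.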

lschueller•57m ago

Okay, so far it makes sense to me. But the deal with JS and floating point values isn't something super special or super rare. Was it only detected and identified by Mythos, while Opus wouldn't get to this point?

IainIreland•21m ago

There doesn't have to be a huge qualitative discontinuity between Opus and Mythos. It's just that Mythos has reached a threshold where it's finally smart enough that putting it in a loop and asking it to find bugs is suddenly really effective. Especially at the beginning, Mozilla wasn't doing anything particularly clever with prompts. Mythos is just smart enough that the hit rate on obvious prompts is high enough to matter. (Maybe you can get similar performance out of Opus 4.6 with really smart prompts, but AFAICT nobody had managed it until Mythos.)

input_sh•3h ago

Original source: https://news.ycombinator.com/item?id=48051079

It's better because it actually lists a sample of Bugzilla reports that were made public. This topic was discussed previously (36 comments two weeks ago: https://news.ycombinator.com/item?id=47885042), but the part about bug reports being made public is brand new.

ChrisArchitect•2h ago

[dupe] Discussion on source: https://news.ycombinator.com/item?id=48051079

MetaverseClub•2h ago

I'm curious how Mozilla did bug finding before Mythos. Did they use any non-AI bug finding tools?

mccr8•2h ago

The usual sorts of fuzzing and static analyses, using AddressSanitizer and ThreadSanitizer. Also, with a bug bounty program to try to encourage external researchers to report issues. (I work on Firefox security; also I fixed 2 of the bugs linked in the blog post.)

canucker2016•1h ago

Coverity (similar to lint) scans various open source software products for vulnerabilities.

see https://www.blackduck.com/static-analysis-tools-sast/coverit...

and for Firefox-related alleged defects, see https://scan.coverity.com/projects/firefox

You have to create an account to view the actual reported defects.

There are just over 5000 reported defects still outstanding. I don't know how many overlap with the reported 271 Mythos-reported defects.

rockdoe•54m ago

How many of those are false positives though? Probably just over 5000?

You get bug bounties if you report the kind of bugs Mythos identified. There's a reason no-one collected bounties from the "5000 defects" Coverity identified.

The Mythos reports have several examples of chaining a whole bunch of logic in different parts of the program together to exploit something very subtle. The Coverity reports aren't anything like that. These tools aren't remotely in the same league or even universe.

IainIreland•19m ago

Yeah, fuzzing, sanitizers, and bug bounties were our main pre-AI tools for finding bugs.

jerrythegerbil•2h ago

Again, and this is important:

A bug is a bug. A “potential vulnerability” is a bug. A vulnerability is verifiable as having security implications with a proof of concept or other substantial evidence.

Words matter. Bugs matter. It’s important to fix large numbers of bugs, just as it always has been, and as has always been done. Let that be impressive on its own, because it IS impressive.

Mythos didn’t write 271 PoC for vulnerabilities and demonstrate code path reachability with security implications. Mythos found 271 valid bugs. Let that be enough.

epistasis•1h ago

I was a bit confused by your definitions, but here's how Mozilla broke out [1] the 271, um, things:

> As additional context, we apply security severity ratings from critical to low to indicate the urgency of a bug:

> * sec-critical and sec-high are assigned to vulnerabilities that can be triggered with normal user behavior, like browsing to a web page. We make no technical difference between these, but sec-critical bugs are reserved for issues that are publicly disclosed or known to be exploited in the wild.

> * sec-moderate is assigned to vulnerabilities that would otherwise be rated sec-high but require unusual and complex steps from the victim.

> * sec-low is assigned to bugs that are annoying but far from causing user harm (e.g, a safe crash).

> Of the 271 bugs we announced for Firefox 150: 180 were sec-high, 80 were sec-moderate, and 11 were sec-low.

Mozilla uses the term "vulnerability" for even sec-high, even though they say right below that it doesn't mean the same thing as a practical exploit. And on their definitional page, they classify even sec-low as "vulnerabilities" [2].

Words are tools that get their utility from collective meaning. I'd be interested in where you got your semantics from and whether they match up or disagree with Mozilla's.

[1] https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...

[2] https://wiki.mozilla.org/Security_Severity_Ratings/Client

Gregaros•52m ago

> Mozilla uses the term "vulnerability" for even sec-high, even though they say right below that it doesn't mean the same thing as a practical exploit.

That’s not evident in what you pasted at all.

What you pasted says

> sec-critical and sec-high are assigned to vulnerabilities that can be triggered with normal user behavior […] We make no technical difference between these […] sec-critical bugs are reserved for issues that are publicly disclosed or known to be exploited in the wild.

> sec-low is assigned to bugs that are annoying but far from causing user harm (e.g, a safe crash).

From this one infers that the "180 were sec-high" bugs found are actually exploitable but not known to have been exploited in the wild, and are NOT mere annoying bugs.

The difference between 180 and 271 does nothing to deflate the significance, or lack thereof, of the implication re: Mythos.

epistasis•37m ago

Yes, it is not in what I pasted, as I said, "even though they say right below". If you don't believe me then click on either of the links.

throw0101c•50m ago

Presumably there are (implicit?) "sec-none" things, like [a] from the recently released 150.0.2 [b] which makes absolutely zero mention about "Security Impact" or "Severity" in the bug report, unlike [c], which is listed in the Mozilla weblog post [2].

Security things are mentioned in the Release Notes [b] pointing to a completely different document [d].

Perhaps sometimes a bug is 'just' a bug, and not a vulnerability.

[a] https://bugzilla.mozilla.org/show_bug.cgi?id=2034980 ; "Can't highlight image scans in Firefox 150+"

[b] https://www.firefox.com/en-CA/firefox/150.0.2/releasenotes/

[c] https://bugzilla.mozilla.org/show_bug.cgi?id=2024918

[d] https://www.mozilla.org/en-US/security/advisories/mfsa2026-4...

IainIreland•34m ago

I work at Mozilla; I fixed a bunch of these bugs.

In general, I would say that our use of "vulnerability" lines up with what jerrythegerbil calls "potential vulnerability". (In cases with a POC, we would likely use the word "exploit".) Our goal is to keep Firefox secure. Once it's clear that a particular bug might be exploitable, it's usually not worth a lot of engineering effort to investigate further; we just fix it. We spend a little while eyeballing things for the purpose of sorting into sec-high, sec-moderate, etc, and to help triage incoming bugs, but if there's any real question, we assume the worst and move on.

So were all 271 bugs exploitable? Absolutely not. But they were all security bugs according to the normal standards that we've been applying for years.

(Partial exception: there were some bugs that might normally have been opened up, but were kept hidden because Mythos wasn't public information yet. But those bugs would have been marked sec-other, and not included in the count.)

So if you think we're guilty of inflating the number of "real" vulnerabilities found by Mythos, bear in mind that we've also been consistently inflating the baseline. The spike in the Firefox Security Fixes by Month graph is very, very real: https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...

epistasis•30m ago

I'm not a security dev or researcher or anything, but as an outsider my understanding matches how Mozilla uses the terms. Though words used by specialists and the general public can often differ...

paulvnickerson•29m ago

What types of vulnerabilities was it finding? Cross site scripting, privilege escalation, etc? Mostly memory corruption or any Javascript logic bugs?

IainIreland•9m ago

I work on SpiderMonkey, so I mostly looked at the JS bugs. It was a smorgasbord of various things. Broadly speaking I'd say the most impressive bugs were TOCTOU issues, where we checked something and later acted on it, and the testcase found a clever way to invalidate the result of the check in between.

If you look closely at, say, this patch, you might get a sense of what I mean (although the real cleverness is in the testcase, which we have not made public): https://hg-edge.mozilla.org/integration/autoland/rev/c29515d...
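
For readers unfamiliar with the pattern, here is a minimal check-then-use (TOCTOU) sketch in Python. It is not taken from the linked patch or from any Firefox code; `Buffer`, `BoundsError`, `unsafe_write`, and `safe_write` are hypothetical names, and the callback stands in for any step that can run attacker-influenced code between the check and the use.

```python
class BoundsError(Exception):
    """Raised by our own explicit bounds check (hypothetical name)."""


class Buffer:
    """Toy stand-in for an engine-owned, resizable buffer."""

    def __init__(self, size: int):
        self.data = [0] * size


def unsafe_write(buf: Buffer, index: int, value_source) -> None:
    # Time of check: the index is valid right now.
    if index >= len(buf.data):
        raise BoundsError(index)
    # Arbitrary code runs here and may shrink the buffer.
    value = value_source()
    # Time of use: the earlier check may be stale.
    # Python raises IndexError here; in a memory-unsafe language the
    # same pattern would be an out-of-bounds write.
    buf.data[index] = value


def safe_write(buf: Buffer, index: int, value_source) -> None:
    # Run the side-effecting step first, then check immediately before use.
    value = value_source()
    if index >= len(buf.data):
        raise BoundsError(index)
    buf.data[index] = value
```

If `value_source` empties `buf.data`, `unsafe_write` passes its check and only fails at the write itself, while `safe_write` rejects the stale index through its own check. Reordering is only one possible mitigation; re-validating after the callback is another.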

crummy•28m ago

Curious if people think LLMs will lead to more secure or less secure software in five years.

int32_64•24m ago

Both. The skilled will use them to find problems, the unskilled will use them to slopcode insecure software the skilled will have to fix.

stavros•14m ago

That depends on which side has more money.

deferredgrant•23m ago

A vuln finder is useful only if it respects the humans on the other end. Every bogus report taxes the same scarce attention needed for the real bugs.
