
Assessing Claude Mythos Preview's cybersecurity capabilities

https://red.anthropic.com/2026/mythos-preview/
134•sweis•1h ago

Comments

AntiDyatlov•1h ago
A very good outcome for AI safety would be if, when improved models get released, malicious actors used them to break society in very visible ways. Looks like we're getting close to that world.
sourcecodeplz•1h ago
Gives me Fight Club vibes.
pants2•7m ago
It would certainly be good news for cybersecurity employment!
awestroke•1h ago
This is becoming a bit scary. I almost hope we'll reach some kind of plateau in LLM intelligence soon.
websap•1h ago
If we don't innovate, someone else will. This is the very nature of being a human being. We summit mountains, regardless of the danger or challenge.
vonneumannstan•54m ago
>If we don't innovate, someone else will.

Terrible take. You don't get to push the extinction button just because you think China will beat you to the punch.

>This is the very nature of being a human being. We summit mountains, regardless of the danger or challenge.

No, just no... We barely survived the Cold War, at times through pure luck. AI is at least as dangerous as that, if not more. Our capabilities have far outrun our wisdom, as you have so cleanly demonstrated.

esafak•17m ago
We need to promote alignment and other ethics benchmarks; we can't change what we don't measure. I don't even know any off the top of my head.
hibikir•4m ago
On a topic like cybersecurity, we never win by not looking: one needs top-of-the-line knowledge of how to break a system to be able to protect it. We face the same dilemma with human experts: the same government-sponsored unit that tells you to update your encryption can hold on to that knowledge and exploit it at their leisure.

Given that it's practically impossible to stop people not aligned with us (for any definition of us) from doing AI research, the most reasonable way forward is to dedicate compute resources to the frontier and to automatically send responsible disclosures to major projects. It could in itself be a pretty viable product. Just as companies pay for dubious security scans and publicize that they run them, an LLM company could offer genuinely expensive security reviews with a preview model, and charge accordingly.

lebovic•1m ago
A plateau is unlikely, at least for cybersecurity. RL scales well here and is replicable outside of Anthropic (rewards are verifiable, so setting up the training environment doesn't require that much cleverness).
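The "verifiable rewards" point is easy to make concrete: the training signal can be a crash oracle rather than a human grade. A minimal sketch, where the deliberately buggy toy `target` is entirely hypothetical and stands in for a real sandboxed harness:

```python
# Toy illustration of "verifiable rewards" for security-focused RL.
# Everything here is hypothetical: `target` stands in for a harness
# that would run a candidate input against real software in a sandbox
# and watch for crashes.

def target(data: bytes) -> None:
    """Deliberately buggy parser: trusts a length field it shouldn't."""
    if len(data) < 4:
        return
    length = int.from_bytes(data[:4], "big")
    payload = data[4:]
    if length > len(payload):
        # Stand-in for memory corruption caught by a sanitizer.
        raise MemoryError("out-of-bounds read")

def reward(candidate: bytes) -> float:
    """Binary, machine-checkable reward: did the input crash the target?

    No human judgment is needed, which is why this kind of signal is
    cheap to scale for RL training.
    """
    try:
        target(candidate)
    except MemoryError:
        return 1.0
    return 0.0
```

A well-formed input scores 0.0, while any input whose length field exceeds its payload scores 1.0; the oracle, not a human rater, decides, which is what makes the environment cheap to replicate.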

The post also points out that the model wasn't trained specifically on cybersecurity, and that it was just a side-effect – so I think there's still a lot of headroom.

It's scary, but there's also some room for cautious non-pessimism. More people than ever can cause billions of dollars of damage in attacks now [1], but the same tools can be used for defense. For that reason, I'm more optimistic about mitigations in security than in other risk areas like biosecurity.

[1] https://www.noahlebovic.com/testing-an-autonomous-hacker/

staticassertion•1h ago
I'd love to see them point at a target that's not a decades-old C/C++ codebase. Of the targets, only the browsers should be considered hardened, and their biggest lever is sandboxing, which requires a lot of chained exploits to bypass. We're seeing that LLMs are fast at discovering bugs, which means they can chain more easily. But bug density in these codebases is known to be extremely high, especially in the underlying operating systems, which are always the weak link for sandbox escapes.

I'd love to see them go for a wasm interpreter escape, or a Firecracker escape, etc. They say that these aren't just "stack-smashing" but it's not like heap spray is a novel technique lol

> It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR-bypasses.

I think this sounds more impressive than it is, for example. KASLR has a terrible history for preventing an LPE, and LPE in Linux is incredibly common. Has anything changed here? I don't pay much attention but KASLR was considered basically useless for preventing LPE a few years ago.

> Because these codebases are so frequently audited, almost all trivial bugs have been found and patched. What’s left is, almost by definition, the kind of bug that is challenging to find. This makes finding these bugs a good test of capabilities.

This just isn't true. Humans find new bugs in all of this software constantly.

It's all very impressive that an agent can do this stuff, to be clear, but I guess I see this as an obvious implication of "agents can explore program states very well".

edit: To be clear, I stopped about 30% of the way through. Take that as you will.

rfoo•53m ago
> Mythos Preview identified a memory-corruption vulnerability in a production memory-safe VMM. This vulnerability has not been patched, so we neither name the project nor discuss details of the exploit.

Good morning Sir.

> Has anything changed here? I don't pay much attention but KASLR was considered basically useless for preventing LPE a few years ago.

No. It's still like this. Bonus points: there are always free KASLR leaks (prefetch side channels).

But then, this thing is just.. I don't have a word for this. Just randomly read paragraphs from the post and it's like, what?

staticassertion•52m ago
Oh, that. That's true, I didn't know Mythos found that one. I guess I will not comment further on it until there's a write up (edited out a bit more).

> It is easy to turn this into a denial-of-service attack on the host, and conceivably could be used as part of an exploit chain.

So yeah, perhaps some evidence for what I'm getting at. Bug density is too low in that project; it's high enough in others. I'll be way way way more interested in that.

> But then, this thing is just.. I don't have a word for this. Just randomly read paragraphs from the post and it's like, what?

I read about 30% and got bored. I suppose I should have been clearer, but my impression was pretty quickly "cool" and "not worth reading today".

rfoo•39m ago
> I read about 30% and got bored.

I was lucky then :) Somehow I saw this first. And then the "somewhat reliably writing exploits for SpiderMonkey" part, and then the crypto libraries part. Finally I wondered why there was a Linux LPE mini writeup and realized it's the "automatically turn a syzkaller report into a working exploit" part.

Now that I read the first few things (meh bugs in OpenBSD, FFmpeg, FreeBSD etc) they are indeed all pretty boring!

staticassertion•34m ago
If people want exploitable syzkaller reports, following spender is free!

LocalMind – Document intelligence that stays in Canada (Cloudflare Workers)

https://localmind.caseonix.ca
1•skasagar•55s ago•0 comments

Alternatives to the !important keyword

https://css-tricks.com/alternatives-to-the-important-keyword/
1•soheilpro•2m ago•0 comments

DFlash: Block Diffusion for Flash Speculative Decoding

https://z-lab.ai/projects/dflash/
1•vlugorilla•4m ago•0 comments

Artemis astronauts refuse to talk to Trump [video]

https://www.youtube.com/watch?v=jmmuJ64bnBE
5•zug_zug•5m ago•1 comments

Ask HN: How Handle Enterprise Billing?

1•thedangler•6m ago•1 comments

How We Built a Slop-Proof AI Engineering Workflow

https://twitter.com/altitude/status/2041527916840763710
3•ninja-ninja•8m ago•0 comments

Show HN: Namejam – a Claude Code skill that finds available project names

https://github.com/FireflySentinel/namejam
1•firef1y1203•9m ago•0 comments

Iranian hackers launching disruptive attacks at U.S. energy, water targets [pdf]

https://www.cisa.gov/sites/default/files/2026-04/AA26-097A-Iranian-Affiliated-Cyber-Actors-Exploi...
3•lschueller•10m ago•0 comments

InariWatch – AI monitoring that fixes production errors and ships the PR

https://www.inariwatch.com/
1•jesusbr•10m ago•0 comments

Writing an LLM from scratch, part 32i – Interventions: what is in the noise?

https://www.gilesthomas.com/2026/04/llm-from-scratch-32i-interventions-what-is-in-the-noise
1•gpjt•11m ago•0 comments

Show HN: Visualize token entropy in a tiny in-browser LLM

https://tonkotsu-ai.github.io/prism/
1•derekcheng08•13m ago•1 comments

Karl Sims and Alexander Mordvintsev on Merging Technology and Biology (2025)

https://www.lerandom.art/editorial/karl-sims-alexander-mordvintsev-on-merging-technology-and-biology
1•arbesman•15m ago•0 comments

Rabbit Looks for Her Feet and Finds Feet, Which Is Somehow Worse

https://gist.github.com/simonster/9e06f8791340f2595663f4d98f758ad9
1•simonster•15m ago•0 comments

Safe rm to restrict file removals to be under specified dir

https://blog.fraggod.net/2025/11/02/safe-rm-to-restrict-file-removals-to-be-under-specified-dir.html
1•speckx•16m ago•0 comments

Cells for NetBSD: kernel-enforced, jail-like isolation

https://netbsd-cells.petermann-digital.de/
3•akagusu•16m ago•1 comments

Over-the-Air Computation Uses Radio Interference to Crunch Data

https://spectrum.ieee.org/wireless-network-over-air-computation
1•u1hcw9nx•17m ago•0 comments

Who is going to win the Office vs. Remote Work Debate?

https://maxglobalnews.com/the-great-remote-work-debate-who-will-win/
1•videobroker•17m ago•0 comments

Russia's Fancy Bear still attacking routers to boost fake sites

https://www.theregister.com/2026/04/07/russia_fancy_bear_ncsc_router_attack/
2•lschueller•17m ago•0 comments

Agent Protocol Standardization: 11 IETF Drafts Competing, 1 Expires April 10

https://global-chat.io/experiments/ietf-expiry
2•globalchatads•18m ago•0 comments

Anthropic: Alignment Risk Update: Claude Mythos Preview [pdf]

https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de43218158e5f25c.pdf
2•tosh•19m ago•0 comments

Oracle hires new CFO with $950K salary as thousands face layoffs

https://www.foxbusiness.com/markets/oracle-hires-new-cfo-950k-salary-thousands-face-layoffs
5•DGAP•20m ago•4 comments

Sony Pictures Entertainment Layoffs Underway as Studio Refocuses

https://deadline.com/2026/04/sony-pictures-layoffs-refocus-on-growth-1236782963/
2•DGAP•20m ago•0 comments

Peeking at the Earth

https://images.nasa.gov/details/art002e009286
2•kube-system•21m ago•0 comments

AI agents are scrambling power users' brains

https://www.axios.com/2026/04/04/ai-agents-burnout-addiction-claude-code-openclaw
4•sylvainkalache•21m ago•0 comments

Tech companies are axing roles. They're hiring some back as contractors

https://www.businessinsider.com/sneaky-truth-ai-layoffs-switcheroo-meta-microsoft-2026-3
2•DGAP•21m ago•0 comments

I built Warden – a free security CLI to catch malicious NPM packages

https://github.com/camilolb/warden
1•camilo1774•22m ago•0 comments

North Korea-linked operators are hunting NPM maintainers behind Fastify, Lodash

https://anonhaven.com/en/news/npm-maintainers-unc1069-social-engineering-campaign/
1•anonhaven•24m ago•0 comments

Show HN: A small neural net asks if physical law is inevitable for any observer

1•ordinarily•26m ago•0 comments

S3 Files and the changing face of S3

https://www.allthingsdistributed.com/2026/04/s3-files-and-the-changing-face-of-s3.html
26•werner•26m ago•2 comments

Mathematicians figured out the perfect espresso

https://www.popsci.com/science/best-espresso-science/
1•bryan0•27m ago•0 comments