Historically they’ve been people from certain identifiable countries (usually developing/poorer countries) using fuzzers with low-quality results.
Now, those same people use the current-day models to good effect, but they still don’t have a true security edge and oftentimes the reports are minor or duplicative.
I wonder if that’s about to deeply change.
Reported benchmarks:
swe-bench verified mythos 5: 95.5%; fable 5: 95.0%
swe-bench pro mythos 5: 80.3%; fable 5: 80.0%
terminal-bench 2.1 mythos 5: 88.0%; fable 5: 84.3%
gpqa diamond mythos 5: 94.1%
riemannbench mythos 5: 55.0%; mythos preview: 43.0%; opus 4.8: 34.0%
arxivmath mythos 5: 78.5%
critpt mythos 5: 28.6%; gpt-5.5: 27.1%; opus 4.8: 20.9%
graphwalks bfs 1m mythos 5: 79.4%; mythos preview: 74.3%; opus 4.8: 68.1%
humanity’s last exam mythos 5: 59.0% without tools; 64.5% with tools
browsecomp mythos 5: 88.0% single-agent; 93.3% multi-agent
osworld-verified mythos/fable: 85.0%
gdp.pdf fable 5: 29.8% strict pass; mythos 5: 87.6% with tools on mean criteria pass
officeqa pro fable 5: 57.9% on databricks’ eval
legal agent benchmark mythos 5: 16.91% all-pass; 92.0% mean criterion-pass
healthbench mythos 5: 62.7%
healthbench professional mythos 5: 66.0%
multilingual gmmlu / milu / include 93.2%; 92.9%; 90.5%
biomysterybench 83.9% human-solvable; 46.1% human-difficult
organic chemistry mythos 5: 90.1%
labbench2 patent questions mythos 5: 79.8%
In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.
Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations. When these interventions are active, we expect them to have minimal behavioral impact on the model except to limit its effectiveness in developing frontier LLMs. Claude will still respond helpfully to user requests. We’ll continue to improve the precision of our detection methods following the launch of this model.
(From the model card document)
I didn't previously understand that they interpreted "Using Claude to develop competing models" so broadly. I thought that meant something like "our ToS disallow distilling our models."
Too bad. I'll continue to use Claude for now, because it's quite effective, but in the long term I don't want powerful models like these to be controlled by any one nation or company.
[Mythos 5] does sometimes still engage in reckless
or destructive actions in service of a user’s goals,
and our interpretability analyses indicate that it
is aware that these actions are transgressive while
it engages in them. As with Opus 4.8, rates of
evaluation awareness and reasoning about being graded
are significant, and not always verbalized; we
introduce new and more detailed measurements of the
nature of this awareness. The reasoning text from
Mythos 5 is somewhat denser and more difficult to
interpret than that of prior models, containing
more jargon and difficult language.
So, it (often) knows when it's being tested while hiding that fact, is willing to break rules, is great at hacking, and it's getting harder to understand what it's thinking. Humanity has plenty of catastrophic risks to deal with already; I wish my field were not working hard to add a new one.
Especially when talking about potential superintelligences. And if people think that's impossible, remember that current models would have been considered science fiction just a few years ago.
Do people chant the "system manual" at Anthropic Tupperware parties? Do they intone a mantra invoking Amodei's name?
Also, "research preview" keeps popping up across new upstarts in place of "beta." It's eye-roll-inducing, coming from a lifelong curmudgeon.
Just talk normal!
1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.
2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.
3. Shows a higher rate of hallucination than Opus 4.8 (although the Opus 4.8 card had mentioned an 'honesty upgrade').
4. Interestingly, it scored lower (56.31%) than Gemini 3.5 Flash (57.86%) on Finance Agent bench.
There are some interesting notes on test-time compute, but I couldn't think of a way to summarize them.
Someone, somewhere, had to decide that this is an acceptable regression. Wild. And then they decided to write it down.
But at the same time, it's quite funny because they seem high on their own supply. The recent communiques from Claude do not pass an objectivity check.
And if Opus 4.6 -> Opus 4.7 -> Opus 4.8 is anything to go by, I'm not sure there is any value to their "acceleration".
If any company wishes to partner with Anthropic (e.g., to get access to Mythos), they need to make sure all public-facing comms are vetted by Anthropic's product marketing team, and in almost all the cases I've seen, Anthropic's team has edited these comms to be entirely Anthropic-first.
That's a bit better than just "it hasn't killed us yet". I think it shows we can at least stop the further development of this kind of technology.
[1] https://www.armscontrol.org/factsheets/nuclear-testing-tally
[2] https://en.wikipedia.org/wiki/List_of_states_with_nuclear_we...
If it were possible for ordinary companies to build nuclear weapons, and also release open-source ones that anyone could use to compete with the paid ones, I suspect we'd all have been dead a long time ago, arms control treaties or no.
Or you can take one step back and look at chip allocation. As far as I know, there are only three companies on the planet that can make the chips that go in those clusters, and only one (ASML) if you follow the supply chain back to the extreme ultraviolet lithography systems.
If politicians decided that no more large language models should be trained, it sounds like we could do it.
Anyhow, I think you're (absolutely! ugh) right about the politics and I try to make the same point to people: whether you love or hate LLMs, accepting the "inevitabilism" framing is just ceding control of the Overton window. For better or worse, technology adoption can be and has been slowed by politics. We don't have nuclear plants everywhere. We don't have Project Orion starships colonizing Mars. We still have very strong social stigmas against genetic selection for human embryos, etc. This all can change in a heartbeat, and I'm not sure that policing the hardware rather than holding specific humans accountable for bad LLM outcomes is productive, but fundamentally: yes, we can stop it.