Anthropic's Safety Superpower

https://stratechery.com/2026/anthropics-safety-superpower/

51•swolpers•1h ago

Comments

kordlessagain•46m ago

> To that end, I can certainly buy the case that Fable/Mythos is in fact more capable when it comes to identifying and exploiting security issues

This has been covered before: https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag... (https://news.ycombinator.com/item?id=47732020)

> Anthropic’s cautious roll-out was justified. The problem with publicly releasing models, however, is that guardrails can be jailbroken, and apparently that is exactly what happened shortly after the release

The future is unevenly distributed. Anthropic, and Amodei in particular, seem to be of the mind they can control a bit of the unknown using words. They are likely being guided by the very product they built. *AI CAN MAKE MISTAKES

That Project Glasswing bullshit reeks of it. Corporations have taken control of our attention, our Internet, and now our thinking.

I say it's high time to take it back.

chasil•41m ago

(reposted)

As I understand it, ITAR regulations for export controls have just been applied to any form of Mythos. These are overseen by U.S. Departments of State and Commerce, and forbid foreign nationals from access to any form of Mythos, either within or outside the U.S.

Only U.S. citizens and immigrants that are holders of a "green card" may now access Mythos.

It appears that Anthropic does not have internal controls to implement these restrictions in any form, so the only option was to shut Mythos down.

Penalties for ITAR violation can reach ten years in prison and a million dollars per violation. (I can post a link to those details if there is any interest.)

As long as Anthropic is a U.S. company, there is no escaping this.

https://fortune.com/2026/06/14/how-a-warning-from-amazon-led...

khalic•27m ago

This is how the US gov does business now, capricious and vengeful.

Textbook retaliation for not letting them use an abliterated version of Claude in weapons systems.

This effectively renders any US closed model useless for any foreign company. Could happen to OpenAI, Google, etc. Too much of a risk to implement something that can be yanked out because the company didn’t behave the way they want.

Looks like it’s time for Kimi, Z, Deepseek to take the front row. They’ll catch up in a few months anyway. Kimi code 2.6 is crazy good

chasil•6m ago

Consider this quote from the main article...

"When you further combine this realization with the company’s pronouncements about AI’s ability to conduct all economic activity, you realize that Anthropic’s leadership effectively wants to have power over everything and everyone."

This is fearful stuff on all sides, and none of the people involved might realistically be able to navigate the danger.

WithinReason•17m ago

Could Anthropic relocate to a different country?

chasil•5m ago

Individuals can leave, but the company cannot transfer restricted intellectual property.

Europe has extradition treaties, so the U.S. can have anyone in Europe who is found in inappropriate possession of this technology extradited to face criminal indictment.

eloisant•16m ago

I never really understood this "US person" restriction. There are 350M people in the US, mostly citizens and green card holders; surely some of them could be working for a foreign power.

cube2222•41m ago

Relatedly, I think it's worth noting that Anthropic models have consistently been top-scoring in BullshitBench[0], in a league of their own, really.

Not affiliated with the bench in any way, but I think it surfaces important differences between the behavior of the models from different labs.

TLDR: The benchmark is measuring pushback in response to nonsensical requests and questions, as opposed to going with it and hallucinating a nonsensical answer.

[0]: https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

mcintyre1994•11m ago

TBH this is the main thing that made me start trusting Claude enough to actually find it useful, and I'm surprised other models haven't caught up. I assumed they had and I just wasn't aware because I'm not using them in the same way.

smackeyacky•37m ago

Perhaps they should consider leaving the US. Pretty clearly the descent into a corrupt autocracy is having real consequences.

Zealotux•31m ago

Does any other place have the infrastructure Anthropic requires to train their models and run inference?

ramon156•25m ago

No. If we cannot even have an EU CloudFlare, then we definitely do not have the infra for this kind of computing.

The EU options are not even close to what CF can do

eric8bits•21m ago

There are fortunately some initiatives and interesting developments in the European market. Take bunny.net for example. We have to start somewhere in Europe, right? Better late than never.

s_dev•15m ago

>EU CloudFlare

What limitations does bunny.net have?

re-thc•11m ago

> What limitations does bunny.net have?

A huge free tier (technically, none)

thedreammachine•37m ago

The interesting part here is not whether Anthropic is right on safety, but that safety gives them a moral vocab for bold policy changes and platform power.

Peterz_shu•35m ago

This is the part where the USA and allied countries can gain a headstart from using such an overpowered model.

This just shows how strong Mythos/Fable will be once released to the public.

I'm guessing about half a year until it goes public.

ben_w•13m ago

> USA and allied countries

Doesn't this *exclude* allied countries?

botw44•28m ago

The whole thesis falls apart though. You can't be on your way to "power over everything" and get distilled into free Chinese models within months. Pick one.

The bottleneck is compute and data, not the model. That's why they could only gate it for a bit. The ITAR thing proves it: no nationality controls in place, so the only option was killing the whole thing. Not exactly what an all-powerful gatekeeper does.

olmo23•21m ago

> no nationality controls in place

Not for now, but how long before we have KYC regulations concerning LLMs?

thefounder•14m ago

That’s really what Dario wants. Let’s hope he doesn’t get it

keybored•13m ago

> Here’s the thing about these safety justifications: I think they work because, to Anthropic, they aren’t justifications. The company really believes that they are the only ones who believe in super intelligence, and thus are the only ones who are sufficiently concerned about the dangers. That excuses decision after decision, policy after policy, and confrontation after confrontation that, to people on the outside, look like a bizarre combination of cynicism and naiveté.

I really dislike this belief (that has at least been expressed here) by some that X is okay because they-really-believe-it. This has a real Road to Hell stank on it.

It is incredibly convenient when your predictions or supposed beliefs go south. Well, we really believed that we were doing it for the betterment of humankind. And we really believed that X was an existential threat that was inevitable, in which case we had to step up and do it because we were the only good guy ideologues. So sorry but not sorry.

I also don’t care if commenters know rank-and-file on the inside that “really believe it” as well. Not for one second.

swalsh•11m ago

"they by extension think that only they should have final say over AI generally. When you further combine this realization with the company’s pronouncements about AI’s ability to conduct all economic activity, you realize that Anthropic’s leadership effectively wants to have power over everything and everyone."

That might be one of the most important points in the post. Very troubling.
