If Claude Fable stops helping you, you'll never know

https://jonready.com/blog/posts/claude-fable5-is-allowed-to-sabotage-your-app-if-youre-a-competitor.html

202•mips_avatar•1h ago

Comments

mips_avatar•1h ago

I'm really uncomfortable with these changes. Everything Anthropic's doing as "frontier research" today will be regular product engineering in a year.

tuggi•1h ago

It’s very frustrating…

mips_avatar•1h ago

Like if you hired a different services company who decided to sabotage your business that would be fraud.

Guillaume86•52m ago

The EU could/should probably legislate against this, it's bonkers...

varispeed•41m ago

It's probably already illegal, but given that many governments already use Anthropic models, they cannot really take the company to court.

numpad0•1h ago

I don't understand how businesses could trust cloud LLMs going forward with this ongoing "safety" paranoia. Building dependence on them doesn't feel like a sane strategic decision for users.

cubefox•57m ago

It's not paranoia. Cyber attacks have gone up massively in the past few months even with the weaker models we had so far. And Claude Mythos 5 scores even higher than the unreleased Mythos Preview on ExploitBench. If you made this capability publicly available you would see another acceleration of cyber attacks.

extr•54m ago

This isn't even about cyber attacks. This is just LLM development which is increasingly just called software development. And at least for cyber it says "Sorry I can't help with that"!

forshaper•52m ago

Looking better and better for people to go after local solutions.

mcmcmc•38m ago

Tell that to the GPU market

hedora•4m ago

I think it heard. A 128GB strix halo was $1400 at launch. Now they’re $3299.

That's 7 months of Claude -> 16.5 months of Claude.
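The month counts above are consistent with dividing each hardware price by a $200/month subscription. That price is an assumption inferred from the ratios, since the comment does not state it. A minimal sketch of the arithmetic:

```python
# Assumption: a $200/month subscription, inferred from the ratios in the
# comment above (the comment itself does not state the plan price).
MONTHLY_PLAN_USD = 200


def months_of_subscription(hardware_price_usd: float,
                           monthly_usd: float = MONTHLY_PLAN_USD) -> float:
    """How many months of the subscription the hardware price would buy."""
    return hardware_price_usd / monthly_usd


launch = months_of_subscription(1400)   # quoted launch price
current = months_of_subscription(3299)  # quoted current price

print(f"launch: {launch:.1f} months, now: {current:.1f} months")
```

With a different plan price the break-even points shift proportionally; only the ratio between the two hardware prices is independent of that assumption.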

variety8675•1h ago

It is absolutely fine to distill the IP of everyone else, but you'd be violating the TOS to distill ours :)

mips_avatar•1h ago

Fine for me. Not for thee

anematode•57m ago

It's utterly bonkers. Hopefully the model weights get leaked. Then we can claim it's public domain or, at the very least, distill it and then release it for free.

david_shi•38m ago

Is there a technical term for this phenomenon? Ladder pulling?

https://blog.google/innovation-and-ai/technology/safety-secu...

cyanydeez•28m ago

"Capitalism"

ashleyn•26m ago

I believe the term is "hypocrisy."

cute_boi•1h ago

I tried it today and it gave a cybersecurity error on a base64 implementation. It is so nerfed...

mips_avatar•1h ago

At least it gave an error! This whole silent nerfing idea is so wrong

iLoveOncall•1h ago

At this point you're criminally incompetent if you still feed your proprietary data and code to AI labs.

They legally can steal it all and now you can't use the product of this theft to improve your own systems.

thot_experiment•1h ago

It's a SaaS, when in the history of SaaS has it ever been a good idea to trust that the company won't ruin the product under you?

booi•1h ago

I think there's a pretty big difference here. It's not like Github prevents you from building a Github competitor. Or Linear is preventing you from using it to build a Linear competitor.

This is more akin to Windows somehow preventing you from building a new OS.

Or worse yet, sabotaging vs preventing.

semiquaver•58m ago

A surprising number of companies do include “you may not use the service we provide you to compete with us” in their terms of service.

After a quick search the best example is Atlassian. It would (apparently, IANAL) break terms to plan a JIRA competitor using JIRA.

  > Customer must not (and must not permit anyone else to): [...] (d) use the Products to develop a similar or competing product or service

https://www.atlassian.com/legal/atlassian-customer-agreement

Also Salesforce. Their competitors are explicitly disallowed from using any of their services for any reason.

  > SFDC’s direct competitors are prohibited from accessing the Services, except with SFDC’s prior written consent.

https://www.salesforce.com/en-us/wp-content/uploads/sites/4/...

trhaynes•29m ago

Perhaps provide an example or two?

Ifkaluva•1h ago

I guess an uncharitable way to read this might be “the ML engineers/scientists want to automate all of the jobs except their own.”

throwaway89864•53m ago

Insta-job security.

afavour•53m ago

The charitable read is that their restrictions for "safety" (i.e. what's separating Fable from Mythos) make this inevitable. If you could just make your own Mythos it would circumvent the protection.

Which kinda just highlights how weird this situation is.

cyanydeez•17m ago

"Haves" and "Have-nots" is what they should be calling them, innit

pablogancharov•1h ago

“When you realize the goal is the path, the pursuit itself becomes the prize. Stones in the road are not obstacles blocking your path; they are the path”

now I understand distillation is much more important than I thought

CrankyBear•1h ago

"Claude can now be silently nerfed. Anthropic has decided it won't tell users when this happens." W T F!!

noncoml•1h ago

Disillusioned CEOs convincing themselves they have the mandate and right to define morality for everyone else. They get to decide what is right, wrong, permissible, or dangerous from the top, in the name of "safety". This is corporate nannying.

themaninthedark•46m ago

You just have to force behavior...

https://youtube.com/shorts/QmGGUnZNqv4?si=Q4CsGsYMvR02vay8

miroljub•46m ago

It's dangerous when the personal moral and religious beliefs of company leadership leak into the product itself and get force-fed to customers.

maipen•22m ago

careful there cowboy, we are in the golden age of ai, regulation is still catching up.

You don't want to sell guns to people without some sort of background check. The number of exploits found in the last few months has been pretty scary already.

This is just one more layer of caution, because it reveals how little we know about how these LLMs work. They know how to make them, but they seem to be unable to properly restrain them.

__natty__•56m ago

This makes Fable unusable for me. If I cannot tell whether I am paying for the whole service or just a partial one, because their guardrails have silently decided that my work broke their terms of service, then I'd rather go to older models or alternatives.

maxall4•47m ago

As someone who works in bioinformatics, and, as such, does a great deal of machine learning, this makes Fable unusable for me as well.

flexagoon•34m ago

Fable would be unusable for you in a more literal way, since it just directly refuses to answer any query even remotely related to biology

maxall4•27m ago

I’m very aware of this as well.

varispeed•44m ago

I am sure they've been doing that with Opus. I am getting mixed results all the time.

gowld•56m ago

> If Claude gives me poor or incorrect advice while I’m working on an AI component, I have no way of knowing whether the model was confused, whether my problem is unsolvable, or if some invisible policy restriction quietly kicked in. Anthropic has explicitly chosen not to tell users when this is happening.

That's always been the case with corporate LLMs.

chroma_zone•36m ago

Minus the policy restrictions, this has always been true for all LLMs in general.

extr•56m ago

I'm a big fan of Anthropic. Just check my post history. I've been accused of working there. But this is complete bullshit and they need to get real. Silent sandbagging is not acceptable, especially given they've shown with this release their safety filters have HUGE amounts of false positives.

zzleeper•21m ago

It's increasingly obvious that the only safeguard we've got is open models and semi-open ones like those from China. Crazy world.

comboy•55m ago

I'm fairly certain they were doing something similar already, possibly with some quantization, and not for the good of humanity but just to handle the increased usage. Not for API requests though, just subscription CLI usage.

Anvoker•55m ago

This kind of opacity is unacceptably user-hostile. It's not okay to treat some number of developers as acceptable casualties, without them even knowing, in order to help enforce a restriction that only serves Anthropic's interests. And if you want to tell me this is for managing the x-risk factor, I'm frankly unimpressed.

somesortofthing•53m ago

This is a fun peek into the economic implications of RSI/ASI. Because it's so infinitely valuable that it basically destroys all markets, labs will eventually do stuff like stop releasing models completely and skip out on contracted commitments because they'll have the power to just drive their competitors out of business before the legal battle gets expensive.

Cloud providers - at first smaller ones, then the hyperscalers - will follow suit, completely closing sales to anyone but the labs and demanding payment in equity/direct decision-making power rather than cash. There's no particular reason why the inference/training split has to be 80/20, and no amount of willingness to pay can help you in an event that turns your money worthless.

platinumrad•30m ago

Nothing is infinitely valuable.

windexh8er•12m ago

Especially when you can actively choose to not use Anthropic. They think they have a moat from all of the IP they've stolen. Just wait until there's nothing more to steal and the laws eventually turn against them. And let's be honest about these companies. It is very much Dario and Sam and Sundar and Mark and Peter and Elon and... These are the choices they are making and hopefully they are held accountable both legally and within society as a whole.

trilogic•51m ago

https://huggingface.co/Trilogix1/Hugston-Nex-N2-Pro-gguf

darkbatman•50m ago

This is crazy and would be frustrating. I would probably just use another model as the authority and keep Fable as a reviewer only in this case.

derac•50m ago

Is there some consumer protection law around this?

antaviana•49m ago

It seems we now have a new product category, HaaS, Hallucination as a Service.

hmokiguess•48m ago

I'm sure someone is gonna be able to jailbreak, abliterate, or equivalent, on this input moderation attempt they have going on.

torben-friis•48m ago

They have a silent nerfing system for their models and say so openly. The obvious question is how much it is being used already.

Competitor companies being nerfed?

Non Americans getting worse code?

Punishing and rewarding users to maximize engagement, like online games do by affecting victories through matchmaking?

cyanydeez•7m ago

$$$$$$: no nerf $$$$: a little nerf $$$: more nerf $$: are you poor? $: be permanent underclass

notrealyme123•1m ago

This sends chills down my spine. For now I will not use Fable in my research. The risks of being sabotaged by the model are not worth it.

Avicebron•47m ago

Can't you just switch the toggle that says "switch models when a message is flagged"? I turned mine off so that if anything does get flagged, I will know.

For now, I'm really not happy about this limited rollout and then turning off. That's probably the most egregious thing I think Anthropic has done recently

platinumrad•27m ago

This is a separate mechanism. The user is not notified about the flagging and rather than redirecting to a weaker model, the response is intentionally sabotaged.

It's user-hostile to the point of parody.

Avicebron•13m ago

I stand corrected. That sucks. A lot.

CamperBob2•47m ago

> We’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building ... distributed training infrastructure ...)

What an interesting thing to call out as a threat. Hmm.

mystraline•46m ago

I have never ever trusted "corporate ethics".

There's no ethical framework. No axioms. It's a mixture of legal, political, and public-facing 'rules'. And what are the rules? You're not permitted to know.

"We reserve the right to lie about the models we provide, silently downgrade you, and give you blatant misinformation cause you triggered our unstated rules... BUT we'll still use your token budget with lots of thinking and waste your money."

No, folks. Seriously, local LLMs are where it's at. You can run the model YOU want, on your hardware, with no data exfiltration.

And tools like Krasis, which can combine NVIDIA RAM and system RAM as unified-ish memory, make running local LLMs absolutely doable now!

varispeed•46m ago

That's what I observed with Opus. This is probably a lawsuit waiting to happen: you pay for tokens and expect to get the performance you pay for, but instead you never know if the model has suddenly become dumb and your whole session has to be started again.

mrinterweb•45m ago

It kind of sucks, but I get the silent change. If a user was trying to use the model for something untoward, having a rejected prompt would just give signal to train on how to eventually successfully bypass security measures.

m_krebs•39m ago

this is probably overstating their abilities at present - I am experimenting with Fable on a completely benign personal application and I am constantly hitting the "cybersecurity and biology topics" guardrail

BoorishBears•37m ago

"Anthropic says these safeguards only affect 0.03% of developers. Maybe that's true today."

I don't think it's true today. It's like when schools mention "average class size", where that average is dominated by classes with like 2 students instead of classes with 100.

Much more honest would be the percentage of developers who previously used their models for the model development tasks they're targeting, but it actually looks like they're saying 100% of them are affected based on the language around it "always having been prohibited".

So awful.
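The class-size analogy above can be made concrete with a few lines of arithmetic. The numbers below are hypothetical and chosen only for illustration; they are not taken from any school or from Anthropic. The point is that an average taken per class and an average taken per student can differ wildly, in the same way a share measured over all developers can look tiny even when most of the developers doing the targeted work are affected.

```python
from statistics import mean

# Hypothetical numbers chosen only to illustrate the analogy:
# nine tiny classes of 2 students each and one lecture of 100.
class_sizes = [2] * 9 + [100]

# "Average class size" as a school might report it: one vote per class.
per_class_average = mean(class_sizes)

# Average class size as students experience it: one vote per student,
# so each class is weighted by its own size.
per_student_average = sum(s * s for s in class_sizes) / sum(class_sizes)

print(f"per-class average:   {per_class_average:.1f}")
print(f"per-student average: {per_student_average:.1f}")
```

Which denominator is the "honest" one depends on the question being asked, which is the commenter's objection to the 0.03% figure.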

exabrial•36m ago

New frontier in anti-competitive practices.

tempestn•35m ago

You should be able to know if your problem was solvable by using your own expertise and judgement, no? If you're relying on LLMs as a substitute for those, I wouldn't expect great results.

atleastoptimal•34m ago

There is a possibility this may not end at simply nerfing the model. The idea of manipulating the behavior of a model depending on the prompt given to it can extend to

1. Detecting if employees from competing companies are using it and sabotaging their work, even work not related to LLM training

2. Directing users to outcomes that would justify higher compute spend, e.g. deliberately coding a project to 95% completion but leaving out a critical step right before one's weekly rate limit is expended

3. Reducing the quality of writing when a person is writing an essay whose argument is against the interests of the model company, or steering a user who is brainstorming with the model in a direction that causes them to waste time or abandon their train of reasoning

etc. etc. The possibilities are enormous. Many people use AI daily for their job, personal advice, companionship. A model company that steers the behavior of the model towards a deliberate outcome could develop a controlling interest in human behavior and productivity at large; even subtle influence would compound enormously over its millions of users.

mickdarling•32m ago

No, this is their get-out-of-jail-free card. If people start complaining about the model being dumb or forgetful or lying, they can just say, oh well, you must have been doing something that triggered its distillation prevention technique.

And, they can say that for anybody at any time, and you'll never know why, and there's no way to prove it.

Everyone needs a flight data recorder to prove... "here's what I was actually doing and why it was not distillation." And now you're having to prove your innocence instead of them having to prove you're guilty, and really at the end of the day, it's just the model being stupid that they're protecting themselves from.

jkxyz•30m ago

"To effectively contain a civilization’s development and disarm it across such a long span of time, there is only one way: kill its science." - Cixin Liu, The Three-Body Problem

This immediately made me think of the Sophons silently manipulating the sensors of particle accelerators to prevent humanity from developing advanced knowledge of particle physics.

delichon•7m ago

The level of oppression necessary to get software geeks to stop making progress on AI is similar to that necessary to get Ukrainian geeks to stop making progress on drones.

mike-cardwell•22m ago

I spend a lot of time telling Opus 4.8 to search for security bugs in the code it wrote, and it spends a lot of time finding them, and then fixing them. Fable won't let me fix the security issues that Opus 4.8 created.

cayley_graph•18m ago

Intentionally and silently sabotaging work done with Claude whenever Anthropic decides it is appropriate is unacceptable behavior, and comically tone deaf given the state of open models. Why on earth would I ever pay for a malicious product?

hbarka•17m ago

I think this is a bit hyperbolic. Fable will fall back to Opus.

SwellJoe•13m ago

The moat looks deep today but it's going to become more shallow every year.

Training a new model from scratch takes serious resources. Post-training/fine-tuning an existing model, dramatically less. The knowledge for the process was esoteric two years ago, now you can ask a current model (one of several) to walk you through it, while building the tools to do it as you go. Several of my recent weekend projects have been exactly that sort of thing, just so I understand it better. "Let's make a LoRA", "let's generate a corpus of training data for fine-tuning a model for X task", "how can I put my face in a text-to-image model?" stuff like that. All of this is do-able on kinda modest local hardware (a couple of old GPUs or a Strix Halo or DGX Spark or big Mac Studio), or for a few bucks or a few hundred bucks or a few thousand bucks of cloud compute, depending on scale.

Scale that up to corporate or startup scale, with the money that's been flowing into AI for the past couple/few years, and it's obvious there's going to be a lot of competition just as the top model makers need to start ringing the cash register. That's a lot of opportunities for people to look at their ballooning Claude usage costs and find other ways to do the same thing for drastically less money. $100/month or $200/month is a no-brainer for Claude Code with probably the best model for coding, but they're pushing more users to usage-based billing which becomes cost-prohibitive real fast.

So, they desperately need to continue to be among the only ways to solve the hardest problems, and they need the alternatives to cost a similar amount. They can count on OpenAI and Google to ratchet up prices, too. They probably can't count on everybody, especially the vendors in China with different economics, to do it. And, they can't count on companies to look at their own usage and not ask, "Can we train a smaller specialist model that does this one thing we're using the Anthropic API most heavily for?"

I'm hoping they just mean stuff like using Claude for distillation by e.g. Chinese model makers, and not "how do I fine-tune Gemma 4 to write more like me?" or whatever.

hedora•8m ago

What moat? There are multiple companies providing pareto-optimal frontier models, and it takes O(10) people to build one of these things.

The rest is capital intensive, and the price will approach the cost of production over time.

Thinking this is a profitable endeavor is equivalent to claiming coal plants have good margins because boilers are expensive.

andrewchambers•11m ago

So this is what 'alignment' looks like to them.

nharada•9m ago

Imagine if Github said "if we detect you're building a competitor to Github, we will silently degrade the results of your CI actions so that tests sometimes randomly fail"

lelanthran•8m ago

I bet it's more a case of trying to cut down the competition so that there is not a large distillation just before they IPO.

Everything the large LLM providers do now, I view it through the lens of "how does this impact their IPO".

idle_zealot•6m ago

I currently have Fable set on cleaning up the work of smaller models to bring my code up to standards I'd feel comfortable developing on manually. Y'know, for when they decide I don't get to use it anymore.

greatgib•5m ago

Imagine if code editors had been created by greedy **** behaving like Anthropic, and you weren't allowed to create other code editors using an existing code editor. Or even better, you couldn't use Bash, zsh, ... to create another CLI prompt input tool like Claude Code...

Artoooooor•4m ago

It is as if JetBrains said, "You can't use IntelliJ IDEA to develop a frontier IDE. We can introduce slight compilation errors if we detect you doing so."

mips_avatar•3m ago

That's exactly it.

throwawayffffas•1m ago

> we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).

Dig that moat, son. We wouldn't want to automate our own jobs away.
