https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...
Now they want to pause AI because of "recursive self improvement".
Fool me once shame on you fool me twice...
For a small group of cyberdefenders and infrastructure providers, we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas. Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US Government, as an upgrade to Claude Mythos Preview. It has the strongest cybersecurity capabilities of any model in the world. Soon, we intend to expand access to Mythos 5 through a broader trusted access program."
There's a quote from a METR report on page 52:
>We ran [Mythos 5] on 38 of our hardest software tasks, including tasks centered around R&D. [Mythos 5] generally outperformed an early checkpoint of Claude Mythos Preview in these, including by succeeding on some tasks that had not been solved by any public model we have previously evaluated. However, we still observed the model occasionally failing to correctly interpret nuanced instructions in difficult tasks... Based on the available evidence, we believe [Mythos 5] is likely unable to fully and reliably automate R&D for frontier projects spanning multiple weeks. We believe that a better, more confident assessment would require more time, evaluations, and information from the model developer.
this is good news, right? right...?
* From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.
* On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.
* After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.
The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.
* Anthropic runs out of genre names.
* Anthropic changes the model naming convention.
* AGI is achieved and handles its own naming.
*/
Okay, how about Mythos?
>Increase it even more.
Right, then Cosmos.
>Even more!
Even more? Let's try Aeon.
>MORE, EVEN BIGGER
ALRIGHT, TRY OMEGAPANTHEON 7.8 THEN
Fable 5 Ti
- Opus 4.7 xhigh: 5.2%
- Opus 4.8 xhigh: 13.4%
- Fable 5 xhigh: 29.3%
Seems like a huge jump.
EDIT: Oh I see, this is the best link for pricing https://platform.claude.com/docs/en/about-claude/pricing
So the price is double across the board...
From their pricing page, Opus 4.8 costs $5 per million input tokens and $25 per million output tokens [1].
[1] https://platform.claude.com/docs/en/about-claude/models/over...
I would have expected Mythos to be much more expensive than just 2x current Opus (which is clearly cheaper to run than original Opus)
Obviously still need to verify it for myself to see if it's truly a leap.
But am I the only one wondering, "What can I do today that I couldn't do yesterday?"
Previously I would think "Oh I wonder if I can finally get it to do X now?"
However, now I feel like yesterday's models were more than capable of handling nearly any engineering task I paired with them on.
Maybe this is the final leap where I can comfortably set up an autonomous coding loop? Maybe.
> - From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.
> - On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.
> - After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.
I really wonder what their compute layout is for this. My guess is that they know how to throttle during peak times and are willing to do so, meaning we shouldn't expect the fastest responses, and they can delay inference to keep the service from going down. Then, if that delay becomes too annoying for customers paying per token, they're reserving the right to cut load by removing the model from subscription plans.
It's all a scam.
This seems like the pharmaceutical method of get them hooked on the drug with free samples, then once they can't live without it, raise the price. I'm not sure I want to start using Claude Fable on a max plan if it's just going to go away on June 23rd.
But maybe the more charitable reading is that they didn't have to offer this model at all on those plans and they are giving the standard free trial.
Limited "free" time is what game developers do if they want to stress test the infrastructure code until it breaks.
API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited
Fable 5 default: https://gist.github.com/simonw/036bee5a703e7ec84e34efa974438...
Opus 4.8 (the "max" one is closest to Fable): https://simonwillison.net/2026/May/28/claude-opus-4-8/#and-s...
Now here are the Fable pelicans for all five of the thinking effort levels - low, medium, high, xhigh, max: https://tools.simonwillison.net/markdown-svg-renderer#url=ht...
Low used 25 input, 1,929 output - 9.67 cents: https://www.llm-prices.com/#it=25&ot=1929&sel=claude-fable-5
Max used 25 input, 14,430 output - 72.175 cents! https://www.llm-prices.com/#it=25&ot=14430&sel=claude-fable-...
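Both figures are consistent with rates of $10 per million input tokens and $50 per million output tokens, i.e. double the Opus 4.8 prices quoted elsewhere in the thread. Those Fable rates are inferred from the thread rather than copied from a price list, and `cost_cents` is a hypothetical helper. A minimal sketch of the arithmetic under that assumption:

```python
# Assumed Fable 5 rates: double the Opus 4.8 prices ($5 / $25 per million tokens)
# quoted elsewhere in this thread. This is an inference, not an official figure.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def cost_cents(input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one request in US cents under the assumed rates."""
    usd = (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000
    return usd * 100


low = cost_cents(25, 1_929)    # the "low" effort run
high = cost_cents(25, 14_430)  # the "max" effort run
print(f"low: {low:.3f} cents, max: {high:.3f} cents")
```

Output tokens dominate here: the 25 input tokens contribute only 0.025 cents to either total, so the roughly 7.5x gap between low and max effort comes almost entirely from the longer output.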
if I get a harder challenge for it, I'll jump up a model for planning; until then, it's been solid.
I'm struggling to see the moat for these models. What's stopping a competitor or a Chinese lab from releasing a comparable one?
We've entered the phase where only companies will be able to afford state-of-the-art models.
People making high-end salaries can afford Fable for critical parts of their projects though.
if only the hyper wealthy can access the pure water that doesn't give you cancer while the rest of us drink from the Ganges river/sub-100iq models that drool and hallucinate/waste time, then I would say that's pretty terrible for the world. it'll just create extreme disparity in our world, far far worse than anything that exists today.
and you may think, man what a ridiculous example, but think about it this way: what happens when something like Mythos or some future model can actually solve your specific cancer (we're getting closer and closer), but is entirely impossible to afford? Or perhaps you need boosters that require the AI to create more of, and now you're reliant on a model that is too expensive.
Open source needs to save us all from this
appears to work
I kind of wonder, though, which model they’re using to do the routing. It seems like a huge added cost to do these kinds of checks on every request
EDIT: I misread. This comment previously talked about 50 million lines being migrated. Instead, in a 50M LOC codebase, one specific codebase-wide migration was done.
Very impressive, but obviously not on the order of a whole-codebase migration
You are right, this is not a rewrite like the Bun case.
The real news is, at 50M LOC, it is able to handle and do _something_ coherent.
I can imagine Anthropic running this experiment multiple times and picking the most impressive one. Or I could imagine this particular run costing $1000+ in tokens. Or maybe they tried a bunch of Pokemon games and it couldn't even finish some of them. Or is it just able to do this because it has an immense amount of FireRed training data, and if you were to give it an "original" Pokemon game, where it actually had to navigate novel circumstances, it would fail?
And the only companies safe from this are the large corporations that shook hands with Anthropic? Because Fable doesn't seem to have actual safeguards, more like 'if you talk about this you will be talking to Opus.' It doesn't guard against offensive use, it prevents all use (offensive AND defensive).
Rationalists are inventing oligopolies from first principles, absolutely incredible things happening in SF
"We had to do extra work to make this safe because it's so advanced and dangerous..." how many times can they trot out that line before it loses its effect entirely?
[1] https://support.claude.com/en/articles/15425996-data-retenti...
This applies even with API usage through third-party inference providers (e.g. AWS' Bedrock and GCP's Vertex) or with a zero-day data retention agreement in place.
I understand the reasoning for doing this, but I don't love the precedent that it sets.
A customer could sign a ZDR agreement with Anthropic, and their API usage wouldn't be retained for even a day. That's no longer possible.
I used to get a response within 24 hours back in the Claude 1 days.
In January 2026, it took 2 weeks.
For my latest support inquiry, I've been waiting for over 8 weeks for a response. Eight!
That said, it can't handle legal/refund/complicated requests and just forwards to a human for those
the leap here is browser extensions appearing to block all mentions of ai across the web
and that's a good thing
> We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.
From Opus 4.6 there are no noticeable improvements for me in code generation. It works very well, till 90% completion, if you guide it correctly. And you need a little luck. For serious production code I need to understand what I’m doing so it helps a bit, sometimes.
Is it good or bad? 30 days is a long time for anything bad to happen
Very interesting. I am not sure this will comply with organizational policies and standards (HIPAA, etc.).
Almost… basically they have unlimited power to decide what data is kept?
They obviously put their best model on the job to build that.
----------------------
Fable 5: Our most capable model yet
Our newest model tackles your biggest challenges with fewer check-ins needed.
• Included in your plan limits until Jun 22: Fable takes 2× the usage of Opus.
• Switch models when a message is flagged: When safety measures flag a message, automatically switch to a different model to keep chatting. When off, your chat will pause instead. Learn more: https://support.claude.com/en/articles/15363606
> Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more: https://support.claude.com/en/articles/15363606
Seems like GPU drivers are cyber weapons of math destruction now.
They kind of are, at least in the AI race.
> weapons of math destruction
lol. great, whether intentional or not.
The frontier labs now have every reason to hold back and sell only to their preferred trading partners. I don't really like the new arbiter-of-knowledge system we're barrelling toward.
[1] "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more."
Glad to hear the UK is finally making an effort to catch up on the AI front ;)
Probably tongue-in-cheek, but UK 18th, US joint 34th with Poland
In the UK you get thrown in prison for making a slightly unfriendly tweet. Freedom of speech simply does not exist.
No sane person sees that as being less authoritarian.
Do you? The closest thing I can think about is how someone was jailed for encouraging arson attacks on asylum hotels. I'd be extremely surprised if the US had zero cases of somebody receiving a police visit after threatening to kill the President or bomb a school or something...
(FWIW I do think the UK needs stronger free speech protections, but saying that you'll be immediately jailed for writing unfriendly tweets is a huge stretch)
EDIT: In long context, I mean
I’m curious how this will feel to my code “butt dyno”. I haven’t noticed much between Opus and Sonnet. I’m comparing this difference to the early days of Claude in 2025. It does what I need and both need a little bit of correction and whatnot. Benchmarks are nice, but I want to see how this feels. Looking forward to trying it later tonight.
I think most software projects have reached the point where capturing real information about what the winner's circle looks like, and therefore what the program should be, is many orders of magnitude slower than generating code in the wrong direction.
I'd need to measure these new models on well understood but complex problems that are relatively easy to validate to get a sense if they are 'better'; on the other hand, the real impact in daily life may be marginal since generating code is not the biggest problem at the moment.
At least they name their models honestly now to indicate that the religion has nothing to do with reality. Soon the disciples will pay the full token price to fatten their church leaders.
Anyway we already knew this was going to be expensive.
People are no longer commonly constrained by "model too dumb" limitations (in SOTA models). They're constrained by "model too expensive." So making the model ever so slightly smarter, while doubling the price, feels like a regression.
I actually think a Sonnet upgrade, while keeping the same price, would get more buzz. It addresses a wall a LOT of people without unlimited budgets are hitting (i.e. people feel forced to use Opus, which they cannot afford, because of Sonnet's limitations).
OpenAI recently retired Codex-5.3, which was *very* negatively received. Not because Codex-5.3 is superior to GPT 5.5, but because it was half the usage-cost while being "good enough." They made a better SOTA, but didn't realize that some of those customers are playing with Deepseek 4 Pro now instead of GPT 5.4/5.5.
> Finally, we’re making a change to the way we handle business customer data for Fable 5, Mythos 5, and future models with similar or higher capability levels. We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.
This sounds suspiciously like a capacity story masquerading as a safety story.
While I appreciate being conservative, ~5% at the scale Anthropic is operating at is too massive a number. Speaking from my own experience, the actual number is higher than that as well (working on pretty benign tasks such as porting an old open source game into a different language). Opus 4.8 itself even identifies the guard's false positives when its sub-agents are being blocked.
1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.
2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.
3. Shows a higher rate of hallucination than Opus 4.8 (although the Opus 4.8 card had mentioned an 'honesty upgrade').
4. Interestingly, it scored lower (56.31%) than Gemini 3.5 Flash (57.86%) on the Finance Agent benchmark.
There are some interesting notes on test time compute but I couldn't think of a way to summarize them
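Point 1 describes a classifier sitting in front of the model and rerouting flagged requests. A toy sketch of that shape follows. Everything in it is hypothetical: the real classifiers are not public, the keyword list is only a placeholder for them, and `claude-opus-4-8` is an assumed identifier used for illustration.

```python
from dataclasses import dataclass

# Hypothetical stand-ins: a substring match is only a placeholder for whatever
# classifier is actually used, and the fallback model name is an assumption.
RESTRICTED_TERMS = ("exploit", "pathogen")
PRIMARY_MODEL = "claude-fable-5"
FALLBACK_MODEL = "claude-opus-4-8"


@dataclass
class Routing:
    model: str
    flagged: bool


def route(prompt: str) -> Routing:
    """Pick the primary model unless the toy classifier flags the prompt."""
    flagged = any(term in prompt.lower() for term in RESTRICTED_TERMS)
    return Routing(model=FALLBACK_MODEL if flagged else PRIMARY_MODEL, flagged=flagged)


print(route("Refactor this billing module"))
print(route("Write an exploit for this driver"))
```

Even this crude version shows the trade-off discussed in the thread: the router only sees the text of the request, so a benign prompt that happens to mention a restricted term is rerouted just like a malicious one.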
I wonder how much of the time people will just get Opus 4.8 at 2× the cost.
Isn't (less than) 5% of sessions a lot? I was expecting a sub-1% guarantee there, so this surprised me already.
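To put a per-session rate in perspective, here is a rough sketch of how quickly it compounds over repeated use. It assumes each session is flagged independently with the same probability, which is a simplification, and it uses 5% as the upper bound quoted in the thread; `p_at_least_one_flag` is a hypothetical helper.

```python
def p_at_least_one_flag(per_session_rate: float, sessions: int) -> float:
    """Probability that at least one of `sessions` independent sessions is flagged."""
    return 1.0 - (1.0 - per_session_rate) ** sessions


# Assumed 5% per-session rate (the upper bound mentioned in the thread).
for n in (1, 5, 20):
    print(n, round(p_at_least_one_flag(0.05, n), 3))
```

Under these assumptions, a user who runs 20 sessions is more likely than not to hit at least one flag, which is why a rate that sounds small per session can still feel frequent in daily use.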
Huh? We've seen nothing but wall to wall predictions that these models are going to take all of our jobs and kill us.
What's the value add here?
Hello,
We're writing to inform you about some updates to our Privacy Policy.
These changes only affect consumer accounts (Claude Free, Pro, and Max plans). If you use Claude Team, Claude Enterprise, the Claude Platform, or other services under our Commercial Terms or other agreements, then these changes don't apply to you.
What's changing?
Claude can do more than ever — taking on bigger tasks and connecting with the apps you use. We've updated our Privacy Policy to be clearer about the data we collect and how we use it. We encourage you to read the updated Privacy Policy in full, but we’ve set out a summary of the key changes below:
1. Multi-step tasks and connected apps. As Claude takes on more multi-step tasks and works with third-party apps and services, we've explained the data this involves — including how data can flow to and from third parties when you connect a service or have Claude do tasks on your behalf.
2. Verification data. As part of our measures to keep our services safe and secure we may ask you to verify your age or identity, and we've described what we collect and how.
3. Study participation. If you take part in Anthropic studies, surveys, or interviews, we've explained the information we collect.
4. Additional information about our data practices. We’ve provided more detail about how we communicate with you and promote our services, including providing tailored recommendations about our services that may be of interest to you. We've also clarified the circumstances under which we may receive or provide data to third parties, and the legal bases we rely on when processing your data.
While our products have evolved, our commitments haven't: We don’t sell your data, Claude remains ad-free, and you can control whether your chats and coding sessions are used to train and improve Anthropic’s AI models. Learn more
For detailed information about these changes:
Review the updated Privacy Policy
Visit our Privacy Center for more information about our practices
- The Anthropic Team
I'll be disappointed when 4.6 is retired.
biology? what the heck?
Benchmark                     Mythos 5    Fable 5     MythosPrev  Opus 4.8    GPT-5.5     Gemini 3.1 Pro
SWE-bench Pro                 80.3        80          77.8        69.2        58.6        54.2
SWE-bench Verified            95.5        95          93.9        88.6        -           80.6
Terminal-Bench                88.0        84.3        -           82.7        83.4        -
BrowseComp (Single-Agent)     88.0        -           87.9        84.3        84.4        85.9
BrowseComp (Multi-Agent)      93.3        -           -           88.5        -           -
HLE (No tools)                59.0        -           56.8        49.8        41.4        44.4
HLE (Tools)                   64.5        -           64.7        57.9        52.2        51.4
CharXiv Reasoning (No tools)  88.9        -           86.2        80.5        -           -
CharXiv Reasoning (Tools)     93.5        -           92.5        89.9        -           -
BioMystery Bench (Human)      83.9        -           82.6        80.4        -           -
BioMystery Bench (Hard)       46.1        -           29.6        40.0        -           -
OSWorld-Verified              85.0        85.0        85.4        83.4        78.7        76.2*
CritPt                        28.6        -           20.9        27.1        17.7        -
ArxivMath                     78.5        68.7        71.8        71.5        64.0        -
[0] https://news.ycombinator.com/item?id=48312633
Edit: It just refused an investing question too. Not sure what’s going on.
Genius way to double the price on Opus 4.8!
I wonder how much butterfly habitat has been/is being replaced with data centers?
BTW for another discount opportunity, if you reload usage credits on a claude.ai plan at $1000 increments then you get a 30% discount compared to paying API.
"Claude Fable 5: a Mythos-class model"
"we're also launching Claude Mythos 5"
what is the 5? how is mythos both a model category and a model name?
Who is refactoring by hand? This comparison is not relevant in 2026.
For the token cost of explaining some task to Fable, deepseek v4 pro is able to solve the same task many times over.
Anyway, anecdotally, I find Copilot shockingly awful. It makes random changes to files that have nothing to do with the problem. Call it out, and it makes other changes to other irrelevant files.
ChatGPT and Gemini are both much better. Grok also isn't bad. Claude, I honestly haven't tried yet on these issues. Perhaps I should...
• My most noticeable immediate jump was in how its frontend design was much more intentionally crafted, and delightful without feeling like 'AI vibe coded'; with better end-user usability too.
• In some internal agentic harnesses, it achieved better results with about half the tokens, making it cost the ~same as Opus 4.8 price-wise! The real price increase is less than 2x; with biggest differences in harder problems where Opus 4.8 struggles (or needs many turns).
• Part of the token efficiency improvements come from Fable doing more targeted and surgical diffs, with fewer unnecessary changes. This is great, because PRs often have fewer LoC changes to review. It writes more maintainable code without explicit human steering.
• For general conversation and assistant style use cases, didn’t really notice a difference vs 4.8.
• 1M context window, without increased pricing for long context is AWESOME. This is a massive win.
• The classifiers are super aggressive and sensitive and this does happen for very benign, non-security coding tasks. Fallbacks to 4.8 worked like a charm; but the filters are definitely super sensitive.
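The cost claim in the second bullet is simple multiplication: relative spend is the per-token price multiplier times the ratio of tokens used. A small sketch, where `relative_cost` is a hypothetical helper and the 2x price and 0.5x token figures are the ones reported in this comment, not independently measured:

```python
def relative_cost(price_multiplier: float, token_ratio: float) -> float:
    """Spend relative to the baseline model: price multiplier times tokens used."""
    return price_multiplier * token_ratio


# Figures from the comment: double the per-token price, about half the tokens.
print(relative_cost(2.0, 0.5))
# Worst case for comparison: same token count as the baseline at double the price.
print(relative_cost(2.0, 1.0))
```

The break-even point is a token ratio of 0.5; any task where Fable needs more than half of Opus 4.8's tokens ends up costing more than the baseline.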
Overall, I would describe this as a step change and worthy of the "Claude 5" model name. It did take some time to understand the intelligence ceiling of this model; and even with an extended testing window I'm still discovering new things and often surprised (in a good way) by the model.
Not to cast too much criticism. HN is extremely well-moderated (thanks team!). But I think we developers need to be very wary.
Either way, I agree that HN is quickly becoming more manipulated and low SNR, like the rest of the entire internet.
Literally have not used Claude Code at all today. I asked it to review the uncommitted code and in <8 minutes it used up my usage ($100/mo plan) and it doesn't reset for "4 hr 36 min". WTF. Oh, and it burned through $20 of extra usage before I could catch it and kill claude code (so I don't even get the output of all that work since it was still churning).
Double the cost my ass, I use Opus heavily and it's never like this. I haven't hit a limit on the $100 more than once and that was under heavy load.
> virtualization
switching to opus 4.8
ok fair
> embedded-allocator
switching to opus 4.8
urgh fine
> chrome
switching to opus 4.8
are you kidding me?
This seems pretty bullshit, you're paying through the nose for tokens and if you are doing anything ML-adjacent, you might silently get worse output without knowing it.
What's the point of being in the cyber verification program at this point? It looks like I cannot use Fable 5 for vulnerability research.
Do they expect us to use this as a toy? Releasing a new more powerful model but not allowing normal use cases because the word "secure" showed up is a Dilbert comic, not a viable product.
Obviously there are plenty of innocuous applications too, but it's not like the people building decompilers for nefarious reasons will be explicit about it. The LLM abstraction just inherently doesn't have enough context to distinguish your intentions or your broader use cases. This is why both Anthropic and OpenAI have had to create side channel mechanisms for security researchers to establish a trusted use context. It sounds like this makes this not a viable product for you, unfortunately, and it makes sense that that's frustrating. But I also don't see what different behavior one could reasonably expect given the constraints.
If it's any consolation, these restrictions only make sense for models that are ahead of the open-weights frontier, so open-source hackers will presumably get Mythos-level capabilities in the relatively near future anyway.
Release your best model, let the world adapt and evolve, and let's move to the next thing.
Every wrong direction/mistake is more expensive and takes more time to fix. When you have small loops you can catch those mistakes faster and cheaper.
To me we are very far off from economically giving long-running tasks to agents.
/model claude-fable-5
Or start claude code with:
claude --model claude-fable-5
[0] https://support.claude.com/en/articles/15363606-why-claude-s...
Haiku = essentially phased out
Sonnet = the Haiku use cases
Opus = the new Sonnet class
Fable = the new Opus class
If I am right, the other "5.0" models will be conspicuously absent, possibly even for a couple of months. (If Opus 5 follows soon and is even modestly better than 4.8 then I was wrong.)
This is why Claude Code just doesn't make sense to me. I need an agent that can plan using Opus and execute using DeepSeek or something else.
[0] https://support.claude.com/en/articles/15363606-why-claude-s...
Fable 5 looks compelling. Fable, I like the word too. Anthropic definitely knows marketing.
How was it measured? How was the output of this magnitude verified over a period of couple of days?
Fable 5 said the first screenshot is from “IDA Pro’s Hex-Rays decompiler” and a Windows driver. The second screenshot triggered the safety guardrails and pushed me into Haiku.
Apparently the code is Windows driver code.
It appears it can be tripped by things as simple as a mention of equilibrium, or anything involving something that looks like chemical kinetics, even at an abstract level. Even touching basic open source packages in my field will trigger it.
Wen UBI
The fable part appears to be that it's affordable by mere mortals. Anthropic support told me "too bad" when I requested a refund.
On GitHub Copilot for Business, Claude Fable 5 is only available if you are willing to let Anthropic retain your data. That in conjunction with the model being removed from plans in a couple of weeks leads me to believe that Anthropic is between training runs and using this as an opportunity to grab way more training data...
How in blazes do you end up with a 50M line Ruby codebase? WTF?
am i missing something?
why would I pay 200 out of pocket and then some for the best model, it seems very silly.
Opus 4.8 gets stuck in weird loops where Codex one shots the bugs.
Imagine if Google would tell you "we can't let you search that as you may use it for harm".
Also 2x the usage of Claude? Your limits are already ridiculously low.
...don't like the sound of that.
Why oh why are we insisting on dragging these violent legacy states into the AI age? Let alone using them as a trust vector for when to (and not to) remove safeguards?
This seems like a way to get somebody nuked.
>"We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8"
That's a very surprising solution. Imagine being asked to do something you feel you shouldn't do, and rather than refusing, you say, "Yeah I could do that but given that I don't want you to succeed at this task, I'm going to hand this one off to my slightly less capable colleague, on the assumption that they won't actually succeed. Of course you'll still be charged for all the tokens used."
It's a very interesting choice. I think I understand the business logic correctly, but it's still surprising.
> Are there any wild populations of Tetanus that lack the dangerous plasmid?
useless
Upd: I meant big picture, not with respect to this model release. Where do subscriptions figure into their strategic vision. Will consumers end up paying enterprise prices in the future?
why do they have capacity now that they won't in a few weeks?
I just use dumb and fast models now. I'm more engaged. I think that the higher the quality of the model, the more you tend to vibe with it, and the more hallucinations you then miss. I'm not sure which is more productive, but I definitely burn out faster the more I vibe. At some point you're spending your time on forums, discord, or youtube instead of engaged with what you're building. Or you yak shave about your tooling and end up creating the 600th multi-agent gastown harness and blowing thousands of dollars on tokens to create it, only to discover it's too expensive to actually use.
(I’m highly confident open models will eventually achieve a similar performance benchmark with distillation over time)
AI Savings Misses 'Should Be Making Executives Uncomfortable,' Bain Says - https://news.ycombinator.com/item?id=48359010 - June 2026 (0 comments)
AI sticker shock hits corporate America- https://news.ycombinator.com/item?id=48307098 - May 2026 (146 comments)
ZIRP (zero interest rate policy) is over, software engineers no longer call the shots now that there isn’t vast amounts of capital chasing yield, and that capital bidding up salaries and keeping the labor market for engineers tight.
If you are x more productive with generative AI, very shortly you are going to have to prove it with a token budget (or, if you’re lucky, an org willing to pay for on-prem hardware for capped token cost: fixed capex vs uncapped opex).
The comparison is not SWE vs SWE with AI. It is SWE vs SWE with AI with a constrained token budget ($x/month) delivering the same value at the same or lower cost. If you cannot prove that you are wildly (vs marginally) more productive with the AI, why would they pay for it? Prove it.
Specifically they need businesses that fired people and adapted their business to the products, so when the unsubsidized costs hit the businesses are forced to eat the true costs.
Yes, they can't afford to give the products away for free, but what is essentially happening with AI services is economic dumping: keep costs artificially low to get people to fire everybody, and then jack up the rates once they have monopoly control.
I agree. They need addicts, but they are high on their own supply and everyone else can see the danger in getting hooked.
Why wouldn't Anthropic just wait until people start subscribing, do some kind of marketing push, or obtain some kind of other sustainable revenue stream, before they go IPO? I wonder if they see the writing on the wall with all of this and want to cash out as quickly as possible?
Of course, they are a casino as well giving you free spins at the wheel with their new Fable machine, and it is done on purpose.
Once the freebies have expired, many of its users will begin to gamble more on the new casino machine and will realize that it is expensive.
The ramifications go beyond the individual which is why I assume they mentioned it. They don’t need to use it/not use it for it to have interesting implications.
Is it nice we get the trial? Sure. Is it also a common play in the playbook of tech companies? Yes.
They'll probably tighten the quotas to rein in whales though.
Realistically I think Anthropic just has insane demand but finite capacity to run models, and Fable will just make them more money if they dedicate it to API pricing. I suspect the goal here is something like: get individual engineers/PMs on their personal plans to taste Fable and then go to their meetings and say "Yes doubling the price of every single input/output token is a good idea, boss".
The AI landscape is changing rapidly. Apple has announced the option to change the AI backend, and there may eventually be requirements to offer AI choices as well, similar to the EU browser choice requirements (this is more reading tea leaves than any actual requirement I am aware of). With the new OS changes coming to support Googlebook and deep Copilot/AI integration into Windows, maintaining user-facing subscriptions will be essential for independent model developers like OpenAI, Anthropic, and Mistral to remain relevant longer term.
If they don't maintain that relevance, there is an increasing likelihood that they will get consumed by other companies, whether it's Apple, Microsoft, or Google, to form a foundation for their OS, or by other cloud providers.
It's kind of annoying not getting access to the primo model and paying 200 bucks a month. I understand 200 bucks a month is basically nothing though.
Like I don't totally understand why they'd let me have it for a couple weeks and then take it away and say I can have it but I have to pay retail and retail is like $1,000 a day.
It's better to have loved and lost than to have never loved at all??
As a consumer I can choose to buy subscriptions to a range of things, including $5 droplets or VMs on a broad range of cloud hosting providers. I can even buy cheap bare metal at a bunch of providers at an affordable retail rate.
I can also buy "unlimited" AI packages that will be optimized to fit the cost model from a variety of services, with different impacts, such as rolling outages when I consume a daily or hourly allotment.
Right now VC and the investor class are subsidizing the rapid evolution of the services and availability, but that VC is running out. In more traditional economies, AI would have developed and rolled out more slowly, and through metered subscriptions, with the eventual rolling out of "unlimited" packages like telephone, internet, or cell services once the market became commoditized.
We have seen a big inversion of that with the race to "win" AI marketshare. Now the true cost is being exposed, and the most competitive and capable models are hideously expensive to operate, so it makes sense that we are moving to metered billing for a utility service. If you want gas, you can buy regular or premium. If you have a premium car you definitely want the premium, but for most people regular is good.
Give it a couple of years, and the survivors will settle around fairly industry standard models of consumer grade services, pro-sumer accounts, and business/enterprise models.
Things are still shaking out, but I get the sadness. Luckily I work at a big tech company who is banging the drum on doing experimentation so I use my prosumer claude pro and other accounts at home for hobby stuff, and save my heavy lifting and potentially experimentation for work :P
Going PAYG only will effectively take these tools away from a huge amount of people and accelerate the push for local LLMs.
OTOH, accelerating the push for local LLMs would also be fine with me.
Assuming this isn't just a supply issue on their side, nothing says "ethical AI" like only allowing mega corporations to use it through cost barriers.
Talk about a strawman!
How many government sanctioned school bombings does it take for them to quit working with said government? For now we know that number is somewhere between infinity and 1.
The question of collaboration with USG is a much more complex one, but is not the one raised above.
Edit: I'll also add that I doubt any AI-doom people "trust" Anthropic per se. The entire angle of questioning – again – misunderstands the AI-doom argument. You appear to think that if companies behave unethically, they cannot be trusted and they will not produce good outcomes, inversely: if they behave ethically, they can be trusted, and they will produce good outcomes.
Any competent AI-doomer would argue that ethics or trust are essentially irrelevant.
The entire problem is that people can act totally reasonably, even ethically, and this is not a guarantee of good outcomes. Situations can be created in which completely ethical, reasonable behavior actually produces a bad outcome. You do not need to assume people are bad in order to produce a bad outcome, and inversely you cannot assume that you will get a good outcome from good people.
"Arms races" are one class of situations that often have this characteristic. "Bureaucracy" is another class that we encounter a lot in daily life. There's a lot of them!
Do we know this? I’ve seen evidence they lose money on heavy users. But so do gyms.
Most gyms sell more subscriptions than they can fit under their roof at one time. If a gym only sells to heavy users, it will either be constantly turning members away or have to buy more equipment. Its equipment will wear out faster. Depending on amenities, it will go through towels, soap, water, et cetera faster, too.
Unless they're really, seriously wasteful with the soap.. there's no chance a gym is losing money on a heavy user
Right now all these AI subscriptions are priced like Planet Fitness, but they're used like Equinox. They're hoping that the new a la carte offerings will move their pricing more in that direction as well.
Where?
What I wonder however is if these tools will become something I use at work only. $100/month is already a massive stretch budget wise. If these models keep devouring tokens there’s no way I’d get the same usage time out of them for $100 in usage credits.
I just don’t think I’d use them much at all at home.
Both. They are charging the most they can get away with and that amount is still heavily subsidized by VC capital.
There are huge numbers of users (myself included) that do have an exact idea of what inference costs on open models. Because we can buy tokens from 3rd parties that have no motivation to subsidize our use. That's to say, there's a fair marketplace[1] and we're hanging out there.
If you want to say "I don't think anyone has a firm grasp on actual inference costs on these proprietary/closed models", then I could agree with that.
We know roughly how much these companies spend and what their revenues are. Based on that, they'd have to more than double revenue (without spending more money) just to stay even, and that's not good enough given how deep in the hole they are.
> OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?
Both are true. I mean, I'd be willing to spend a bit more than I do now, but not more than double, and neither are most companies. The company I work for is currently investigating how to reduce LLM spend, not looking to spend more.
Now that 200USD subscription starts to feel cheap...
It's worth it, and I can afford it, but I am not really the right type of user for token-based usage. It's all for personal and free work.
Unfortunately, that doesn't work within a single session. The K-V cache of a model is intertwined with the model's configuration. Switching models invalidates the cache, meaning everything up to the point of the switchover is processed like a new, uncached input token.
Per Anthropic's pricing doc, an Opus 4.8 cache hit costs 50¢/MTok, while Haiku costs $1/MTok for uncached input.
Model selection works best if sessions are short and self-contained, particularly if the first few interactions can reliably classify the model need. That probably covers most 'support chatbot' use-cases, but it doesn't describe the kinds of heavy agentic automation that really chews through token budgets.
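To put numbers on the cache point above, here is a rough sketch in Python. It uses only the two prices quoted in the comment (50¢/MTok for an Opus 4.8 cache hit, $1/MTok for uncached Haiku input); the function names and the context sizes are made up for illustration. It deliberately ignores output tokens, cache-write charges, and the fact that later turns on the cheaper model could be cached again, so it only covers the first turn after a switch.

```python
# Prices in USD per million tokens, as quoted in the comment above.
OPUS_CACHE_READ_PER_MTOK = 0.50  # Opus 4.8 cache hit
HAIKU_INPUT_PER_MTOK = 1.00      # Haiku uncached input


def context_cost(context_tokens: int, price_per_mtok: float) -> float:
    """Cost in USD of processing `context_tokens` of existing context once."""
    return context_tokens / 1_000_000 * price_per_mtok


def switch_penalty(context_tokens: int) -> float:
    """Extra USD paid on the first turn after switching mid-session.

    Staying on the big model re-reads the context from cache; switching
    to the small model re-processes the same context as uncached input.
    """
    stay = context_cost(context_tokens, OPUS_CACHE_READ_PER_MTOK)
    switch = context_cost(context_tokens, HAIKU_INPUT_PER_MTOK)
    return switch - stay


if __name__ == "__main__":
    # Hypothetical context sizes, chosen only for illustration.
    for tokens in (50_000, 200_000, 800_000):
        print(f"{tokens:>7} tokens: switching costs ${switch_penalty(tokens):.3f} more")
```

Under these assumptions the "cheaper" model is twice as expensive per token of existing context on that first turn, which is why switching only pays off when the session is short or the remaining work is large.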
I think the end game is routed model usage and SLMs. I think Apple is going to prove this in the consumer space pretty handily and I'm curious how the Android ecosystem responds since the hardware is considerably lacking in model performance. I think Apple has a huge opportunity here, as much as I don't like their current ecosystem of walled garden. They did position themselves very well with ARM and custom chips for their hardware. Hopefully the broader ecosystem of ARM and Linux are able to make some headway and we see a more formalized, and broadly accepted, architecture to capitalize on.
Anthropic wanting to switch billing to API rates is them just wanting to generate more profit.
Even if subscriptions are locally profitable (i.e., the cost of the subscription covers the cost of inference), they're still subsidized because they don't cover training and running the company; otherwise, these companies would be profitable.
Though the day is coming when there’s no distinguishing, I’m sure.
I'm doing basic web development here utilizing animejs. Nothing too complicated (mostly saving time doing the scaffolding, still write the bulk of animations manually).
Truly believe that American companies are going to get completely curb stomped by China due to greed, ineptitude, and violating the social contract.
Deepseek V4 Flash is surprisingly capable and insanely cheap. It takes so much to get the session cost up to $0.01.
I agree with you on pricing, but what do you mean by this?
Why aren't corporations doing more to help workers with childcare? Why aren't they doing more profit sharing with workers? Why aren't they encouraging unions or sectorial bargaining? Why isn't the government mandating any of this?
Americans very rarely benefit when US corporations do well. That needs to change. No one benefits if Meta continues making billions in profit every quarter while society suffers from isolation, depression, suicide, and scams from their services. Americans don't benefit if health insurance companies are making massive profits while they can't afford deductibles.
Our society has been set up to simply extract wealth in all facets of life. That's a sick society and it needs to change.
I'm not saying China does this better; in fact China has some of the worst worker rights of all the industrialized countries. But at least American consumers would benefit from cheaper, higher quality Chinese goods. The world would likely benefit too if America got off the cold war hype train that did nothing to benefit humanity outside of those making weapon systems.
But Claude models seem to be better at long term problems or more ambiguous problems.
I'm curious as to what the primary benefit is here. Are there secret improvements in training? There hasn't been much change in fundamental model architecture, I don't think. What about harnesses? I wonder what's pushing the AI. It seems like harnesses have been the main thing pushing AI ever since CoT.
I can recognize so much of the GPT/Codex generated code long after it gets merged (not by me).
Additionally, the time spent on every agent turn on GPT 5.5 is much longer compared to Claude Opus 4.8, which means iterating on the code takes a lot more patience, and there's a lot more nitpicks to pick when actually using GPT 5.5 to do software engineering.
Feels like GPT-style models are more geared toward one-shot software vibing (and handling the vibe coded mixture) compared to Claude's focus on actual software maintenance. I got a GPT Pro sub for free and wanted to cancel my Claude subscription so much, but I still keep reaching for Claude models a lot more. Frustrating.
The step-up in intelligence looks massive (we'll see in practice), but the price is getting to a point where it's making me question if it's even worth giving it a try.
Good competitors will probably be out soon, which should level the playing field. I am more excited about that, just the fact that they showed that such an improvement is possible. I'm okay waiting a bit longer for this to become attainable for plebs like me.
Kind of like billing a programmer by the hour.
Probably all about the IPO.
They also, FWIW, say that they've instituted new policies on their end such as logging any human access to the stored data and automated deletion after 30 days in "most" cases (with another link to a document detailing that further).
Sounds like "bait and wait".
If you think about it, the more people pay for these new and more resource hungry models, the longer it takes for them to become no extra cost and the longer it takes the more people are tempted to pay extra.
Pay-as-you-go isn't a common thing in SaaS. For example, except for AWS SES, all email providers are bulk-subscription based.
I am on the $100 Max plan.
It's the same exact speed as opus >=4.5, sonnet 4.5, and twice the speed of opus <=4.1
It must have about the same active parameters, or else it's a larger model running in turbo mode (smaller batches) and being heavily subsidized for some reason. But given most of the benchmarks are within 5% I doubt it is a much larger model. Most perplexing.
If you rely on this as a core part of your business/profession, you will be at their mercy and subject to whatever whims or challenges they have.
> Fable 5 · Most capable for your hardest and longest-running tasks · Uses your limits ~2× faster than Opus
Input Price $10/M tokens
Output Price $50/M tokens
Cache Read $1/M tokens
Cache Write $12.50/M tokens
2x Claude Opus 4.8, same as Claude Opus 4.8 (Fast)
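For anyone trying to budget against these rates, a minimal Python sketch of the arithmetic follows. The four prices are the ones listed above; the `session_cost` helper and the token counts in the example are hypothetical, picked only to show how the categories add up, and real billing may differ in details not covered here.

```python
# Fable 5 list prices in USD per million tokens, as quoted above.
FABLE_5_PRICES = {
    "input": 10.00,
    "output": 50.00,
    "cache_read": 1.00,
    "cache_write": 12.50,
}


def session_cost(tokens: dict[str, int], prices: dict[str, float] = FABLE_5_PRICES) -> float:
    """Total USD for a session, given token counts per billing category.

    Categories missing from `tokens` are treated as zero.
    """
    return sum(
        tokens.get(kind, 0) / 1_000_000 * price
        for kind, price in prices.items()
    )


if __name__ == "__main__":
    # Hypothetical agentic session: mostly cache reads, modest fresh input.
    example = {
        "input": 100_000,
        "output": 50_000,
        "cache_read": 2_000_000,
        "cache_write": 200_000,
    }
    print(f"${session_cost(example):.2f}")
```

With those made-up counts the total comes to about $8, split fairly evenly between fresh input, output, cache reads, and cache writes, so heavy agentic use at these rates adds up quickly even when most of the context is cached.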
Frankly, not even Opus 4.8 would be enough of an incentive to use at that price range (enterprise-wise; would not even bat an eye as a consumer)
I think it's safe to assume everything AI related is heavily biased until proven otherwise. Just like in pharma.
they aren't married to a particular lab, most of their usage is their in house model i believe
prior bms relied mostly on unit tests or synthetic judges which are easily benchmaxxed, which leads to nobody trusting benchmarks
we need people manually checking the data for good code quality
this benchmark looks very good from the methodology. a cog researcher checking the data themselves is very high signal (not scalable so don't take the benchmark as gospel, but directionally good)
TL;DR - they worked with OSS project maintainers to build tasks. They score models based on whether a PR is mergeable. All tasks are graded by a human researcher. SoTA models have hill-climbing to do which raises the bar and inspires confidence. I'd say it's legit.
what's the logic in claiming it's a borked metric when everything listed is an anthropic model?
1. That estimate could easily be wrong.
2. That estimate is, of course, usable in RL training. This isn't an inherently bad thing, and this is more or less what has improved coding models so much lately. But it does mean that other companies could and surely will do this sort of training, and Anthropic probably did too.
3. OSS maintainers are far from perfect, and there's an unfortunate uncanny valley-like effect in which a coding model can produce code that is just convincing enough to pass review even though it's actually totally wrong. I don't know whether this is a specific issue here.
> you still see improvements
This is expected if they are training their models on it, right?
> objectively-bad results
Keen to learn when this has been the case, i.e. across version increments in major models.
I've been enjoying seeing how the quality of individual models differs based on the amount of reasoning effort you give them. If they were baking in a good pelican you wouldn't expect them to differ so much.
(Google Gemini are the only lab that have very clearly paid attention to the quality of SVG animals-riding-vehicles, see their announcement for Gemini 3.1: https://twitter.com/JeffDean/status/2024525132266688757 )
Only coherent move at this point: hit the minus button immediately. There's never anything about the model in the thread other than simon's post.
As much as people on HN like to dunk on Gemini, I’ve always found it to be pretty good at understanding a code base, more so than Claude.
In a way I relish the opportunity to just make do with cheap Chinese models, massage my prompts, and go back to coding by hand. If this is how it's going to be, screw 'em.
I don't make money on the code I am writing right now. I really don't like where this trend might go.
there are many standardized evals to do this correctly and Anthropic ignored them to provide an 18-second sped-up video of a 50-hour run?
yeah I don't trust this until they provide a live run by a 3rd party with full reasoning traces in real-time. The reason we all liked the Gemini Plays Pokemon style runs were because they were live and couldn't be faked
Fast forward to today and GPT-3 has laughable performance.
Lawyers, doctors, students, teachers. Lots of people using GPT models carelessly in harmful ways.
I am sure that they can develop their own equivalent version of such clusters in around 1 year though. Distilling Fable 5 will also go a long way.
edit: I am not really sure if it works like that. I haven't looked too deep into deepseek v4 pro specifically.
Pandora's box is open anyway. It's better now for everyone to have the same power rather than a few nation states.
On your other point, the government still has systemic leverage and can compel access, so this doesn't remove that risk.
That doesn't mean this is the end of the world, and some balance of power is usually good. But I do think it will still increase the capabilities of rogue actors and their net harm.
One was a piece of code I gave it to improve, it did so and then started writing tests, some of which tested security so the safeguards triggered
Another was one of the cryptography puzzles I use as new model tests, which are hard to oneshot and there's no public solution anywhere, it completely refused to even try to solve it
(I had same issue, just asked it to check some code that 4.8 had modified earlier in day)
In fact, I did go back to DeepSeek V4 Flash for most of my problems as it is way cheaper and there is no need to use SOTA for absolutely everything.
Not quite. They will definitely have "no criticism of China/communism" safeguards.
Thus Asian labs will have to generate their own data sets, which with the huuuuge usage boom from deepseek, mimo, kimi, etc, they will be able to.
Based.
Even OpenAI and Google are struggling to get this kind of performance. If the distillation defenses are any good + chip controls prevent China from training massive models, it's over.
It's obvious Anthropic used it to hype things up and that’s about it.
Evals come from a million places and new evals and robust perturbations of existing evals abound. They test a variety of tasks in a variety of ways. All of them individually are flawed. Taken together the aggregate signal is highly useful as you more or less marginalize over a lot of different things. Not to mention these companies have plenty of proprietary internal measurements, they build benchmarks themselves to probe their models and then also have flywheel traffic and A/B tests.
You are right to call out benchmarks but to dismiss them or not take them seriously is a mistake.
Maybe back when this was a scientific endeavor; not now when enormous, enormous amounts of capital are on the line. Along with an entire cult's chosen eschatology.
That's where all the regressions and inconsistency in experiences stem from: RL can still only go so far vs having more parameters
- It talks a LOT more like GPT models. You know: wrinkle, shape, gate, coarse, scope, gap, path, production-ready-workflow-of-the-day, and so on -- "that's expected, a consequence of the previous like-driven workflow". If I wanted to get a headache using AI I would have gone with GPT in the first place!
- It outputs text in a much harder way to follow along. I can't exactly say what it is. Maybe a bit of everything? Bolds are missing, bullet points are gone, paragraphs are bland and too long, and it doesn't feel like a model programming with me, but rather a somewhat full of themselves grandpa developer looking down on me. It's very weird to describe this, but it is definitely how I feel.
Granted this can totally be because of the way it reacts to the prompts now. We've got a rather large corpus of skills and "rules and good practices" that Opus 4.6 responded to great, and maybe the new models just get turned into this when fed with them....I don't know.
Either way, with Opus 4.6 being as good as it is, I need Fable to be a significant step up to justify a price increase. If it can get me to babysit Opus a little bit less on some stuff, it might be worth it. Otherwise, I'm very happy with Opus 4.6 and hope they don't deprecate it.
ECI (good aggregate measure using IRT): https://epoch.ai/eci?view=graph&tab=release-date&subset-view...
METR time horizon (now topped out): https://metr.org/time-horizons/
Oops, time to reauthenticate for the 10th time!
I still remember Sam Altman “begging AI to be regulated” and AGI being “some thousand days away”.
Breed faster horses and hope one will birth a locomotive.
This is just good business sense. In what scenario would you ever make the names dumb and forgettable?
> Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.
This is good customer support, lol. From what I can tell, it is indeed Boris Cherny responding, not outsourced to AI or other staff. You're really getting a response from Boris. I suppose that is PR, but it's not unjustified PR, it's accurate.
I'm not even a crazy AI fan, but your criticisms are ridiculous here. It reminds me of the quote from Knives Out -- "Your Honor, she endeared herself to him through hard work and good humor."
Clearly you've never bought a TV or headphones!
It's getting to a point that it's offputting, and the next step would be to put it into "untrusted" bucket. Opus 4.7 already burned their credibility once, 2 more strikes remain.
While everyone else is wasting time and money on the slower, more expensive models, you've found a way to outpace everyone for less money. Everyone else is wrong and you will get rich.
(I don't actually believe the premise is true, I'm just pointing out the logical conclusion to what you're saying so maybe we can reconsider the premise)
This is a good thing. I wish every company would do this. I subscribed to Proton Mail after interacting with someone from their team here on HN.
Lol anti-AI bias on HN is crazy. Simply giving your product a quirky name is now being considered manipulative advertising. Is just doing normal PR and marketing something AI companies aren't allowed to do?
They're originally named after the blends at a nearby coffee shop.
https://postscript.co/pages/brew-guide
I've noticed nobody at HN knows what "marketing" is or how to do it. It's not just naming things and being evil and cynical is not the most successful method.
…also frontier models are a superhuman life changing experience. If they aren't, what possibly could be?
I've been working with gpt 5.5 and opus 4.8 quite a lot, and interacting with Fable feels like a smart guy just entered the room.
Also, I don't think Boris C. is coming here for PR. He is a tech guy, and this is the best place for tech discussions. Why so cynical? The guy is an engineer.
[0] https://cap.csail.mit.edu/death-moores-law-what-it-means-and...
Are they really making 12,000 arrests a year over tweets and posts?
Your comment earlier.
Edit: also, not much change in the last 10 years in prison population. https://commonslibrary.parliament.uk/research-briefings/sn04...
12k people a year thrown in prison for spicy tweets
"Spicy tweets" including:
sending false communications
sending threatening communications
sending or showing flashing images electronically to people with epilepsy intending to cause them harm (‘epilepsy trolling’)
encouraging or assisting serious self-harm
sending a photograph or film of a person’s genitals (‘cyberflashing’)
sharing or threatening to share intimate photographs or film
We decided that we aren't one of those authoritarian countries.
Haha, it's literally the first sentence of the Wikipedia page. That's fucking funny. Try again.
> Please don't post comments saying that HN is turning into Reddit. It's a semi-noob illusion, as old as the hills.
I don't agree with that statement universally, but I have to say I do when it comes to this article. I came here hoping for substantive discussion from those who'd had a chance to try it out; instead what I got was a seemingly endless stream of venting. There's a place for venting - and plenty to vent about with the state of AI nowadays - but to borrow from the HN guidelines you linked, it does very little to gratify my personal intellectual curiosity.
Anthropic needs to be at least somewhat in the good graces of a capricious administration that is already under pressure from businesses and citizens to regulate AI companies across multiple different domains, whether it's energy consumption, job displacement, military and defense applications, surveillance, etc.
If Anthropic wants to survive, they need to acquire influence with the government that most impacts them as an American company, and a massive exporter of services in the AI space to other countries, otherwise they could get locked down and locked out of the market for national security reasons.
It sucks, but sometimes the survival choice is to make an ethical compromise in hopes that you can still be around to make better decisions later.
This "simple" fact needs quite a bit of additional context and work. Making grandiose ethical claims like this can be countered with other grandiose claims such as the fact that there is no ethical existence under communism or socialism.
The fact that there is no ethical consumption under capitalism is not material to whether or not ethical existence is possible under communism or socialism. In order to survive in a capitalist society, one inherently has to make choices that require trade-offs, and those trade-offs are burdened by a history of decisions made not just by the people alive today, but by our ancestors as well. Does that mean I walk around chanting "Reparations", "Land-back", or other calls to action? No, but I do acknowledge that there are unresolved issues and, as a Canadian, I know we need to do more to resolve treaty issues, environmental issues, and systemic discrimination. I also know that Americans need to do better to address systemic discrimination and many, many other issues. It also doesn't mean I want to give back my house, or give away all of my possessions. It just means I try to make good choices and support businesses and people that are open about the trade-offs they make and try to engage as ethically as possible.
Acknowledging those facts doesn't absolve us of responsibility, it's a framework that allows folks concerned about whether or not they are doing the right thing to accept the trade-offs that they choose to make and be responsible and accountable for those choices to themselves or their communities.
We live in a world with scarce resources. It's possible that with a foundational redesign of the global economy, and the requisite authoritarian government that would be required to force such a redesign, we could eliminate food scarcity, solve energy scarcity, and make sure that everyone has a place to live. Those trade-offs are probably not worth the ethical cost in political and physical violence required to accomplish it. We have seen the trade-offs that happen when the powerful are able to exploit communist or socialist governments. We are seeing the "late stage capitalism" impacts of allowing the powerful to exploit capitalism in democratic societies. Acknowledging that the current capitalist system has led to the greatest prosperity for the upper echelon (financially) of humanity, and a dramatic reduction in global poverty, shouldn't obscure the reality that much of that wealth comes from exploitation of people and the environment.
It's a huge problem to unwind, and we can't let the burden of every choice that we make stop us from trying to do better, but we (as in society in general) can't do better if we don't at least acknowledge the compromises we are making along the way, and try to plan to fix it in the future.
Probably a topic better suited to beer and a pub setting than HN though :P
The AI companies sure are a brilliant example of corporations needing to do more to help their employees pay for childcare.