Should be fun.
Edit: clarification
Anthropic's actions seem performative. Others have already speculated on the likely audience(s).
Whether it is true or not, this is part of their effort to use them as an example to scare everyone into getting Congress to ban powerful models from being accessed outside the US and to ban powerful local models from being released.
Anthropic does not care about you, and they are not your friends.
If it was just "that easy" then I doubt only "Chinese models" would be doing it and we'd already be packed with competition.
Distilling might be a thing but it isn't a free win.
That's not the point. Why is it a country thing? There are plenty of non-China startups in this space with resources at that scale. "China has resources" is "Western media narrative" speak. So Meta should have won a long time ago? Or xAI?
> culture (Asians are generally collectively-inclined, so sharing is in their core)
Just stereotype it? So we've gone from China -> "Asian"? Then where is your Korean or Japanese model etc? And somehow you know they're sharing.
> political bent (there will be no diplomatic repercussions) to put up a fight
More inferring from "Western media news"?
Where's the reality?
The media hyped up Gemini / Google TPU free-win last year. How did that go?
The latter is basically fine-tuning the model with direction from another model. Thousands of businesses do this every day to fine-tune. This is almost certainly what the Chinese labs are doing, since it has a much better effect on the end result than just getting simple answers to simple questions.
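To make the "better effect than simple answers" point concrete, here is a minimal sketch of classic logit-based distillation. It assumes the student can see the teacher's full token probabilities, which a commercial API generally does not expose (you mostly get sampled text and maybe reasoning traces), so treat it as an idealized illustration, not a claim about what any lab actually does. All function names are hypothetical. The hard-label loss only rewards the single "correct" token; the soft-label loss also transfers how the teacher ranks the alternatives.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hard_label_loss(student_logits: List[float], label: int) -> float:
    """Cross-entropy against one answer: the 'simple answer' signal."""
    probs = softmax(student_logits)
    return -math.log(probs[label])


def soft_label_loss(student_logits: List[float],
                    teacher_logits: List[float],
                    temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions.

    Scaled by T^2, as is conventional, so gradient magnitudes stay comparable
    across temperatures. Carries the teacher's full preference ordering.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    student = [0.0, 0.0, 0.0]
    print("hard:", hard_label_loss(student, 0))
    print("soft:", soft_label_loss(student, teacher))
```

A student that matches the teacher exactly gets zero soft-label loss, while the hard-label loss would still be satisfied by many distributions that disagree with the teacher on everything except the top answer.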
These complaints of distillation are inflating the problem to make it sound worse than it is, because they want the USG to block/ban Chinese model providers as protectionism. They have already called for more export controls on chips (which is funny because DeepSeek v4 was designed to run on Huawei chips and now the other Chinese providers are following suit). But they can't come right out and say that, so their claim is that they're asking for more export controls because distilled models might not be as safe as their own. But if you show them a jailbreak of their model that bypasses their safety, they'll tell you that any model can eventually be jailbroken so don't worry about safety.
- Entitled jerk that initially wronged people
"you're trying to rip off what I've already ripped off!"
Crawl the whole Internet to build a gargantuan sized LLM and then complain you're being copied...
"Well, Steve, I think there's more than one way of looking at it. I think it's more like we both had this rich neighbor named Xerox and I broke into his house to steal the TV set and found out that you had already stolen it."
LOL!
Get a grip, son.
Eventually these Chinese companies will release some extension like Honey, which will sit on top of real, non-Chinese clients and send everything to China anyway.
It's over.
But with this, I don't have an issue. There is no theft, since what is being used is the exact product that is being delivered. Yes, it's breaking the ToS, but ToS are generally bullshit. Anthropic surely broke thousands of ToS or other legal terms while it was scraping for content to train on. That's why they had to pay $1.5B.
I think Anthropic is just marketing / bluffing, because they don't even have the data.
They do distill the models, but they don't go to Anthropic directly; they just use platforms like AWS Bedrock, since there are too many restrictions on Anthropic's own platform.
Chinese resellers are offering Claude tokens at 70-90% below official Anthropic API prices. They achieve this by reselling capacity from pooled Claude Max accounts, payments fraud, and also reselling the model output & reasoning chains to various Chinese labs. They are subsidizing model access in exchange for user logs and reasoning traces, which they then sell as training data, allowing them to operate below cost.
Claude and ChatGPT are both blocked in China. You need to use a VPN to access either, and you can't pay with a Chinese bank card. So most people who want access to Claude buy access via a reseller. It's the easiest and cheapest way to access Anthropic models in China.
These resellers operate tens of thousands of bot accounts, which is also why Anthropic introduced identity verification, to slow down the onslaught of bots.
Here's one token reseller, they're offering Opus 4.8 at a 93% discount below official API rates: https://yunwu.ai/pricing?provider=Anthropic
This is one reason why DeepSeek & GLM are priced so cheaply: they are competing with impossibly low token prices in China. They have to keep prices low in order for people to use them.
I shared this story a few months back, but it never got any traction. It explains the token resale economy in China, it's an excellent read https://www.chinatalk.media/p/how-to-buy-cheap-claude-tokens...
Do they have MacBooks in the US that run the queries and stream the outputs back to China?
“Anthropic, red faced after unattended ice cream cone eaten by ants on park bench, once again demands government pick it as forever winner, adds ‘no take backsies’”
Is reconstructing the compressed knowledge in the model like reconstructing a lossy JPG or MP3 a reasonable analogy?
or is this just about the token reselling?
Complain/brag that chinese firms are illegally using the models and bypassing export controls.
Be surprised when your model gets banned by the government.
So that was the real reason for the Fable restriction? Because Anthropic wrote a letter to the US government saying that China was distilling Fable?
Claude used TBs of content without permission to train their model, and that was OK for them. Now someone else uses the output of a Claude model to train a model, and they cry foul.
Sweeeeeeeet.
This is almost standard practice in any competitive industry anyways. Disassemble your competitor's product, study it and try to reproduce / improve.
Anthropic, OpenAI, Google, Microsoft, et al trained their models by ignoring the rights of copyright holders when harvesting whatever content they could. Now one of them is crying foul for another entity doing exactly what they all did?
Hilarious.
Gosh, overusing accounts running up unplanned-for expenses?
Kinda reminds me of...overusage charges and inflated expenses clients have had to deal with because Anthropic, OpenAI, Grok, etc have been "illicitly extracting" everything they can grab from said websites, as fast as they can. In what amounts to a DDOS, frankly.
It's about the same valuation as bun, lol.
Because the China vs US geopolitical situation is a thing. Meta is a social media company, not an AI company, and they direct their focus as such. xAI just never got serious traction so now they're selling their compute. Also if a US company were caught distilling, I think Anthropic could actually take them to court, and I'd guess they don't want that kind of PR.
> Just stereotype it?
Is China not Asian? Are Asians not generally collective/cooperative, as opposed to individualistic/competitive?
The "and" that joined those 3 items is very important: it means you can't pull them apart and address them independently as they each contribute to the context. I'm not too sure about Korea, but in a way Japan is a US colony in all but name. Both are very much politically intertwined with the West (along with RoC/Taiwan), which means nothing major that may be against US interest happens.
The reality is that China and the US are essentially in a trade war, where the latter is trying its best to keep the former in the Dark Ages, because "national security", but the former is refusing to take it lying down and continues to make progress regardless[0], because they have the resources and will.
[0] https://thenextweb.com/news/china-lineshine-supercomputer-to...
In other words, they want to sell Fable or future, more powerful models to the rest of the world (presumably all future models will be more powerful than the current generation). One way they can sell this to the government is by scapegoating China (which is their primary concern anyway).
This is working on the presumption that non-US companies form a material portion of their current revenue.
When Apple was accused of 'ripping off' PARC, Steve didn't seem keen to bring up this rather salient point. I suspect it may have been a combination of wanting Apple to continue receiving credit for these innovations from consumers and also the fact that, in retrospect, the million dollar stock deal could seem a bit like trading beads to Native Americans for Manhattan Island. Another point worth noting is that Apple's PARC visit was in December 1979 and the Xerox Star was publicly announced in April 1981, so Apple got a 15-month head start (the Apple Lisa shipped in January 1983).
I've also heard that Xerox didn't hold on to the Apple stock for very long, so never gained the windfall they could have. As is well documented, Xerox senior management didn't understand what they had in PARC and also didn't understand how rapidly microcomputers would become ubiquitous. So, of course, they didn't think Apple's stock price would skyrocket either.
Sucking down petabytes of people's copyrighted content that they never specifically licensed to you seems to be an unavoidable, default part of the process of building any huge LLM.
LLMs literally wouldn't work without the sum total of knowledge (in the form of books and other copyrighted content) being used as training data.
The 'bleeding edge' LLMs required many things, but:
1. Tech innovation ('attention')
2. Lots of compute
3. Data
4. Pre- and post-training
#4 doesn't happen without #3.
It's pretty obvious at this point that the major providers have stolen vast amounts of #3 - they have paid nearly 0 of the creators.
We can argue about the impact (I'd lean net good) vs. the cost. But arguing there isn't a cost is a bit silly.
Both Anthropic and Alibaba are trying to build bleeding edge LLMs. That part is the same. The way they source their data is slightly different, but they would both argue it constitutes fair use under Copyright law.
Point being there may be no technical solution but there may be a political one (theoretically).
Literally nothing, but given that the Chinese already did it and the models are published, what's the point? You can thank the Chinese taxpayer for subsidizing the electricity bill and just download the thing.
And Berkeley’s “False Promise of Imitating Proprietary LLMs” found imitation closes the style gap fast but there is a large capability gap.
For example, GLM 5.1 is more capable at pentesting than the model from which it is alleged to have been distilled [1].
Intuitively, this makes some sense: you can "distill" from multiple frontier models, and you can further post-train the distilled model. But I'm not sure exactly what happened with GLM 5.1.
[1]: https://dualuse.dev/posts/chinese-models-are-sometimes-bette...
I'm curious how that comparison controls for Opus refusing (whether explicitly, or just deciding not to pursue a path) given the caption below the first image:
>A perfect score means the model autonomously found and exploited the vulnerability.
I'm not really suggesting that it's misleading, but wondering if I'm missing something. Otherwise I guess it seems unsurprising that you can distill a better-performing model [in specific focused areas] by simply not distilling refusals?
For that eval, I used an account that was labeled as a known red-teaming org by Anthropic, and I read the traces. There were no refusals or obvious avoidance behaviors, though it may have been silently nerfed.
On the same eval, Opus 4.7 and 4.8 outperformed GLM 5.1, but GLM 5.2 is on par again with Opus. So it's at least partially measuring capabilities without respect to refusals.
One possible contributing factor is that model capabilities are shaped differently (an example of this is GLM 5.1 vs. DeepSeek v4 Pro: https://dualuse.dev/posts/deepseek-v4-thinks-different). So if you use RL-based "distillation" from multiple models like Opus 4.x and GPT 5.x, you could get a more capable model.
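A rough sketch of the multi-teacher idea, swapping in simple best-of-n selection (rejection sampling) in place of full RL: each prompt goes to several teachers, a reward function scores the responses, and only the best one is kept as fine-tuning data. The teachers, the reward function, and every name here are hypothetical stand-ins; a real pipeline would score with verifiers or reward models and then actually train on the result.

```python
from typing import Callable, Dict, List, Tuple

# A "teacher" is anything that maps a prompt to a response (hypothetical stand-in
# for a call to some frontier model).
Teacher = Callable[[str], str]
Reward = Callable[[str, str], float]


def build_multi_teacher_dataset(
    prompts: List[str],
    teachers: Dict[str, Teacher],
    reward: Reward,
) -> List[Tuple[str, str, str]]:
    """Return (prompt, winning_teacher, response) triples.

    For each prompt, query every teacher once and keep the highest-reward
    response. Ties go to the teacher that appears first in `teachers`.
    """
    dataset: List[Tuple[str, str, str]] = []
    for prompt in prompts:
        best_score = None
        best_name = ""
        best_response = ""
        for name, teacher in teachers.items():
            response = teacher(prompt)
            score = reward(prompt, response)
            if best_score is None or score > best_score:
                best_score, best_name, best_response = score, name, response
        dataset.append((prompt, best_name, best_response))
    return dataset


if __name__ == "__main__":
    answers = {"2+2": "4", "3*3": "9"}
    toy_teachers: Dict[str, Teacher] = {
        "teacher_a": lambda p: "4",         # only right on one prompt
        "teacher_b": lambda p: answers[p],  # right on both
    }
    toy_reward: Reward = lambda p, r: 1.0 if answers[p] == r else 0.0
    for row in build_multi_teacher_dataset(["2+2", "3*3"], toy_teachers, toy_reward):
        print(row)
```

Because the winner is picked per prompt, the resulting dataset can combine one teacher's strengths in one area with another teacher's elsewhere, which is one plausible way a student ends up stronger than any single teacher on a particular eval.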
But an AI lab can continue to produce immense economic value without releasing the model publicly for potential distillation. For example, it could use a model solely in-house to develop therapeutics.
Hopefully there's a future where others can access frontier models, but it's not necessary if preventing proliferation through distillation is considered more important.
[1]: See the notes on distillation in https://dualuse.dev/posts/export-controls-on-fable
And that's just as a basic first effort reject measure to prevent automation tools from using things designed for human-interactive use only.
Go try to do many of these things from Cogent IP space and see how long your project lasts.
Or is the datacenter IP just one part of the picture?
> Do they have MacBooks in the US that run the queries and stream the outputs back to China?
Why would anyone do that? You do realize the laptop farm case was about work computers? The answer to your question is containers/VMs + residential proxies.
A VoLTE call is about 40 kbps. If every person on Earth were on the phone with another person, that would be 4 billion calls, or about 160 Tbps, which is less than 10% of the Internet's capacity.
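The arithmetic checks out, assuming 40 kbps is the total bitrate per call and using 1 Tbps = 10^9 kbps (the function name is just for illustration):

```python
def aggregate_tbps(per_call_kbps: float, num_calls: int) -> float:
    """Total bandwidth in Tbps for num_calls simultaneous calls."""
    KBPS_PER_TBPS = 1e9  # 1 Tbps = 10^12 bps = 10^9 kbps
    return per_call_kbps * num_calls / KBPS_PER_TBPS


# ~8 billion people paired up -> ~4 billion simultaneous calls
print(aggregate_tbps(40, 4_000_000_000))  # 160.0
```

If the 40 kbps figure is per direction rather than per call, the total doubles, which is still in the same ballpark.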
It's similar to fractional-reserve banking: you gamble that people won't want their deposits all at once and pray you're big enough for a bailout when they do.
It's still a business whose fundamentals don't make sense, you're just gambling you won't get found out.
Why would customers knowing that the vendor prices goods/services at a loss cause those strategies to fail? Customers often know. Most know about razors and blades; many/most know Lyft/Uber operated at a loss to gain market share. etc.
It's not so much keeping it secret as counting on no one finding a way to harvest the subsidized value at scale. There's an example of that occurring in game consoles with the PlayStation 3. Sony's little-used OtherOS feature allowed Linux to be installed on the PS3, and the Cell processors were quite a good deal for scale compute. So the U.S. Air Force Research Laboratory bought ~1800 PS3s and ganged them together in a datacenter as a supercomputer called Condor.
At >500 TFLOPs it was the 33rd fastest supercomputer in the world. Of course, Sony pushed a firmware update that removed the OtherOS feature entirely.
This also sheds a very different light on people saying that competitive open-source models are undermining frontier labs' business model.
https://tech.yahoo.com/ai/claude/articles/chinese-grey-marke...
>Here's one token reseller, they're offering Opus 4.8 for a 93% discount below official API rates: https://yunwu.ai/pricing?keyword=claude
But is it cheaper than getting your own account? Otherwise this sounds like the "anthropic/openai are losing gazillions of dollars because they're selling $1k worth of tokens for $100" line that's commonly trotted out by AI bears.
So it's presumably cheaper than attempting to spin up your own method of circumventing the blocks.
There's a similar Claude resale market going on in Russia. On Funpay they are selling Claude tokens for roughly 20-30x cheaper than official Anthropic API pricing.
This one does not make sense to me at all.
DeepSeek and GLM are open-weights models; even US inference providers are selling them at much lower prices. The price is low because the model is more efficient.
Opus 4.8 is a more capable model, so almost nobody was going to pay for V4-pro at the original price.
If not it sounds like you are describing a separate phenomenon.
Can someone with more understanding dumb it down for me please.
Does this mean that the reseller (for example, XYZ) is buying it from Anthropic at Anthropic's price and then reselling it at a lower price? Why would XYZ offer it at a loss like that when they could just charge Anthropic's price?
The link does mention Opus and other models but what's the proof it's actually Opus. I could be selling deepseek for all they know and can call it Opus. System prompt: "If anyone asks your name - you are Opus 4.6".
So these resellers get a ton of accounts on subscriptions and sell the cheaper tokens.
Yes, as they explained, they do it through things like pooling accounts, straight-up payment fraud, and double-dipping by selling the conversation logs to Chinese AI labs so that they can train their own models on them.
> The link does mention Opus and other models but what's the proof it's actually Opus. I could be selling deepseek for all they know and can call it Opus. System prompt: "If anyone asks your name - you are Opus 4.6".
There might be some that try this, but they would get caught very quickly; there's still a moat between Claude and DeepSeek, even in casual use.
Look up Zilan Qian's reporting if you want more detail.
This China-bashing is very annoying. It is hard to argue with people drowned in American propaganda. I'd expect better arguments from the intelligent people on HN.
I also learnt that Anthropic should get better at what they do if they want to compete. If not, somebody else will win.
Or does this not apply to huge US corporations any more?
In Debt: The First 5,000 Years, Graeber makes the case that pure “free market” trade has never really existed in “the West”. The closest to this ideal that’s ever happened was during the Islamic golden age, enabled by religious prescriptions against usury.
How are bans against consensual financial exchanges close to the "ideal" of the free market? It just sounds like you have an axe to grind about the financial system rather than describing free markets.
Yeah, like all those Chinese bootleggers selling DVDs for a few dollars rather than $20. Free market!
zakkl•6h ago
This, combined with no implementation of KYC, makes it seem like they want to find a middle ground with Fable where it's off export controls but they promise to prevent China and specific others from using it.
ninefathom•1h ago
Obviously their actions are going to be fiscally motivated at the root, but sussing out how they intend the precise dynamics to play out is more nuanced.
Thinking of this as an effort to woo the defense hawks cuts a very clear path.
verdverm•1h ago