>DeepSeek claimed to have built its base model for about 5% of the estimated cost of GPT-4
[edit] also they seem to be saying r1 is a base model, which it is not. Very sloppy.
Meta was caught pirating over 80TB of books to train their AI, and they are claiming not only that training AI on other people's stuff is legal, but that piracy is legal too (well, at least piracy done by US tech giants is legal)
They published GRPO (the key algorithm behind R1) a full year before[1] they scaled it for R1. Given the research they do in the open, it's not far-fetched to think they had the talent and technical know-how to achieve R1 on their own.
https://www.servethehome.com/dude-dell-hpe-ami-american-mega...
But yeah, saying the chips are everywhere is BS.
They doubled down on it. They did a follow-up claiming that a cyber security researcher from a US-based firm had been called in to investigate suspicious traffic at a US telecom. The investigators claimed to have logs and a bunch of other evidence. The investigators also claimed that Bloomberg was misleading people by focusing on SuperMicro, as they'd reportedly seen it with other manufacturers too.
Discovered by our security team where I work. It's the reason our VMS doesn't have support for Hikvision cameras.
The Federal government and some banks hire companies to do supply chain integrity inspection and management. They find bad parts all of the time, especially in the channel.
There’s a pretty obvious reason why they wouldn’t want to talk about a detected case of foreign espionage embedded in servers after publishing.
If you believe that Deepseek was released to undercut US AI value (duh) it makes no sense to take the official line as the absolute truth.
Typical models are now trained on clusters of roughly 20K GPUs. Even if you get a volume discount you still need cabling, switches, etc…
The minimum entry price to play in the game at this level is about 200-500 million dollars.
Meta spent something like $10B on their AI compute, for comparison.
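As a rough sanity check on the numbers above, the quoted cluster size and entry price imply an all-in budget per GPU (hardware plus cabling, switches, etc.). This is only a sketch of that division; the variable and function names are made up for illustration, and the inputs are the ballpark figures from the comment, not measured data.

```python
# Ballpark figures taken from the comment above (not measured data).
CLUSTER_GPUS = 20_000                  # "clusters of roughly 20K GPUs"
ENTRY_COST_RANGE_USD = (200e6, 500e6)  # "about 200-500 million dollars"


def implied_cost_per_gpu(total_usd, n_gpus):
    """All-in cost per GPU implied by a total cluster budget."""
    return total_usd / n_gpus


low, high = (implied_cost_per_gpu(c, CLUSTER_GPUS) for c in ENTRY_COST_RANGE_USD)
print(f"${low:,.0f} to ${high:,.0f} per GPU, all-in")
```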
Just trust them bro. Unironically.
"DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data." https://arxiv.org/pdf/2412.19437
They don't even claim to have spent $5M (since they own their GPUs instead of renting them by the hour), it's a purely notional figure suitable for an academic paper. But when R1 got released and started generating hype, it was the only dollar figure anyone had to go on, so it got interpreted as more significant than it is.
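The arithmetic behind the quoted figure is just GPU hours times an assumed rental rate, which is why it is notional rather than an actual spend. A minimal sketch, using only the two numbers from the quoted passage (the function name is hypothetical):

```python
# Figures quoted from the DeepSeek-V3 technical report.
GPU_HOURS = 2.788e6           # H800 GPU hours for the full training run
RENTAL_USD_PER_GPU_HOUR = 2   # rental price the report assumes, not a real invoice


def notional_training_cost(gpu_hours, usd_per_gpu_hour):
    """Notional cost: what the run would cost if every GPU hour were rented."""
    return gpu_hours * usd_per_gpu_hour


cost = notional_training_cost(GPU_HOURS, RENTAL_USD_PER_GPU_HOUR)
print(f"${cost / 1e6:.3f}M")
```

The result scales linearly with the assumed hourly rate, and by the paper's own wording it leaves out prior research and ablation experiments.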
That said, Deepseek is a decent model and was the forcing function needed to give a reality check to a number of AI Startups (and has had the positive effect of making it easier for startups I've helped incubate make the case for their own domain specific foundation model strategy). Its impact shouldn't be understated.
The field is really moving too quickly to talk too certainly about "dominance" or "ahead". My observation is projects I care about on GitHub come with a Chinese README and many interesting speakers at conferences have strong Chinese accents. But I know a good researcher personally and it isn't so apparent to me if these are Chinese Chinese people or Americans of recent Chinese descent.
So the Chinese gov't will need to also invest in hardware production - and surely they are furiously doing so (and getting limited success, but success nonetheless).
The American chip sanctions are, in my view, an own-goal. In the short term, they might cause some pain, but in the medium to long term, they are the kick that the Chinese market would need to adapt. Necessity is the mother of invention after all. It might take 10 years, but I have no doubts that China can reach a level equal to that of TSMC.
If the US administrations (both current and previous) had any brains, they should've seen this. They should've put subsidies into chips so that chinese production will not be competitive, and chinese firms will lose money if they go domestic. And the export of such hardware would balance the trade deficits.
You might find this paper from the Hoover Institution interesting, it goes into some depth analyzing the implications of DeepSeek on US innovation: https://www.hoover.org/sites/default/files/research/docs/Zeg...
Similarly, withholding funding for research, meddling in how universities are supposed to conduct their affairs, the reduced appeal of studying in the US for foreign students, putting wrestling promoter Linda McMahon in charge of dept of Education… these are all going to impact America’s research and innovation abilities.
It’s been 6 months? Gemini’s big upgrade was 2 months ago and o3 even more recent.
It’s just funny that US companies just barely got ahead the last couple months and already it’s a “drawn out narrative” that they aren’t ahead.
For all we know R2 drops tomorrow? If it’s ahead, or even, how are we supposed to think about the narrative?
IMO it’s not really that much of a stretch to say they’re fairly close together. I’d want to wait 6 more months where the US stayed significantly ahead before I’d be complaining about narratives. I know things move fast but that’s all the more reason to wait and see.
I hope that R2 releases tomorrow and you enjoy some presumed clairvoyance for a minute.
It makes for interesting television.
For that reason, it probably won't stop anytime soon.
(But I use it for actual work, not for chatting with imaginary friends. Maybe you really do need a "frontier model" if you want to monetize imaginary friends. I wouldn't know or care.)
However, on cost, R1 beats the Western models by miles.
Also, if China keeps using this type of tech to imprison their own population even more effectively, that’s also good for the US, because no one wants to flee to an even better dystopia.
I see no downside here. Force US to innovate beyond “it costs a lot of money and we conveniently had that upfront” while also undercutting the law makers and people trying to enforce regulatory capture on a new thing like they’ve done on all the old things.
As a person interested in tech and tools and America, I have no issues with Deepseek and Hunyuan and Wan being effectively CCP funded. Keep it up. Accelerate. Push.
like this:
DeepSeek’s founder is threatening US dominance in AI race
The fact that this guy could see that massive data analysis was a winning investment strategy and then outcompete others with way more experience in financial markets is impressive.
I’d be curious about the markets he initially invested in. Was this a market inefficiency specifically in China in the late 2000s?
I’ve always assumed that quantitative analysis requires PhD level knowledge of markets and mathematics but maybe I’m being way too conservative?
It would mean some harsh years at first, but it’s a good time to hit the market.
I remember being told I’d never be successful, or make as much money as my parents.
I only wish I hadn’t listened to those people so long.