DeepSeek’s founder is threatening US dominance in AI race

https://www.bloomberg.com/news/features/2025-05-13/deepseek-races-after-chatgpt-as-china-s-ai-industry-soars

89•blumpy22•9mo ago

Comments

blumpy22•9mo ago

https://archive.ph/HIDaS

htrp•9mo ago

Still parroting the same uninformed takes from January

>DeepSeek claimed to have built its base model for about 5% of the estimated cost of GPT-4

juujian•9mo ago

What do we know now that we did not know in Jan? Is there some information on this that I have missed?

fzzzy•9mo ago

They didn't include the costs for developing v3, the base model.

[edit] also they seem to be saying r1 is a base model, which it is not. Very sloppy.

xnx•9mo ago

Didn't they also train off of ChatGPT API output?

ipsum2•9mo ago

This is a rumor that has not been confirmed by OpenAI nor DeepSeek.

nextaccountic•9mo ago

I don't see the issue here. OpenAI trained ChatGPT off my own comments, and your comments, and the comments from the person you replied to. I didn't authorize it and you probably didn't either.

Meta was caught pirating over 80TB of books to train their AI, and they are claiming not only that training AI on other people's stuff is legal, but that piracy is also legal (well, at least piracy done by US tech giants is legal).

xnx•9mo ago

For sure. I was just pointing out that DeepSeek is not going to "beat" ChatGPT if DeepSeek relies on it.

natrys•9mo ago

You could maybe make that accusation about V3 (to the extent that it's a bad thing and not fair use, especially considering the amoral origin of OpenAI's models in the first place), but I don't think the claim makes sense for R1, since OpenAI's o1 did not expose its CoT traces even in the API.

They published about GRPO (key algorithm behind R1) a full year before[1] they scaled it for R1. Given the research they do in open, it's not far-fetched to think they had the talent and technical know-how to achieve R1 on their own.

[1] https://arxiv.org/abs/2402.03300

irjustin•9mo ago

Yeah, apparently they parked the cost of hardware, 50k GPUs and model development underneath another entity, High-Flyer, because it was a "shared resource".

AustinCarrBW•8mo ago

You're misreading. The article is referring to V3 when it cites the base model behind R1. It does not say R1 is the base model.

Analemma_•9mo ago

Bloomberg still has not retracted (or even really commented on) the Supermicro spy chip story, preferring to hope people just forget about it if they maintain total silence. They're fine if you need to look up where the Nasdaq closed yesterday, but don't expect serious tech reporting from them.

mannyv•9mo ago

It's interesting because servethehome.com found some oddly labeled chips at one point:

https://www.servethehome.com/dude-dell-hpe-ami-american-mega...

But yeah, saying the chips are everywhere is BS.

protimewaster•9mo ago

> not retracted (or even really commented on) the Supermicro spy chip story

They doubled down on it. They did a follow-up claiming that a cyber security researcher from a US-based firm had been called in to investigate suspicious traffic at a US telecom. The investigators claimed to have logs and a bunch of other evidence. The investigators also claimed that Bloomberg was misleading people by focusing on SuperMicro, as they'd reportedly seen it with other manufacturers too.

NicoJuicy•9mo ago

Hikvision literally sends hidden packets from their cameras to China.

Discovered by our security team where I work. It's the reason our VMS doesn't have support for Hikvision cameras.

ta20240528•8mo ago

"Discovered by our security team where I work"

Published anywhere?

NicoJuicy•8mo ago

Internal audit at the company for customers (a security company).

A couple of years later, our US customer reached the same conclusion. That one was probably published (US government).

ta20240528•8mo ago

Anecdotes are not data.

Really - they aren't.

NicoJuicy•8mo ago

We literally have a hardware device that blocks any "spyware" from communicating externally because of it.

People in the security industry (cameras) may know it.

Spooky23•9mo ago

You’re really naive. Bloomberg is one of the better news outlets, and I would put no weight on Supermicro’s denials, as they have a pattern of lying about financials and have a sweepingly broad supply chain vulnerable to sabotage and counterfeiting.

The Federal government and some banks hire companies to do supply chain integrity inspection and management. They find bad parts all of the time, especially in the channel.

There’s a pretty obvious reason why they wouldn’t want to talk about a detected case of foreign espionage embedded in servers after publishing.

zeroq•9mo ago

I'll put my tinfoil hat on and say it plays to the current US vs China "propaganda" tune: that the US is winning on all fronts, but the ice is thin and we have to support local tech behemoths to the full extent to secure our position in this world-defining struggle.

cookiemonsieur•8mo ago

It's not US vs China. It increasingly looks like China VS a conglomerate of multi national companies with foreign born billionaire CEOs whose HQs happen to be located in the USA.

cedws•8mo ago

China vs Chinese migrants in the US

tempeler•9mo ago

The accuracy of this comparison is highly speculative. One should not ignore the possibility that dominant firms in the market might be inflating their cost figures to block new entrants and extract more capital from investors through such narratives. When you compare electricity prices in China with those in the U.S., such a large gap would require a truly extraordinary breakthrough to be justified. After all, these are privately held commercial firms, and they are not obligated to disclose their financials accurately.

SV_BubbleTime•9mo ago

Agreed. For all anyone really knows, it was 5% for OpenAI compared to Deepseek.

If you believe that Deepseek was released to undercut US AI value (duh) it makes no sense to take the official line as the absolute truth.

jiggawatts•9mo ago

Electricity is the cheapest input by far! People and large networked GPU clusters are far more expensive.

Typical models are now trained on clusters of roughly 20K GPUs. Even if you get a volume discount you still need cabling, switches, etc…

The minimum entry price to play in the game at this level is about 200-500 million dollars.

Meta spent something like $10B on their AI compute, for comparison.

charleshn•9mo ago

Indeed, see https://semianalysis.com/2025/01/31/deepseek-debates/ for a well researched article about the actual cost.

oefrha•9mo ago

The provided source for every concrete figure on DS in that article is "we are confident that", "we believe" or something equivalent. How is it any better researched than any other article with a conflicting set of beliefs?

Der_Einzige•9mo ago

There are times when "just trust me bro" is okay. Semianalysis articles are one of those times. You are free to pull the contrarian "source please" shit, but the reality is that they are far more accurate at most types of GPU, Cloud, or AI analysis than almost anyone on this website or anywhere else.

Just trust them bro. Unironically.

oefrha•9mo ago

We're talking about a narrative that's affecting tens of billions of investment dollars, possibly more. I'm not gonna "trust me bro" anyone on this.

yorwba•9mo ago

Then you definitely shouldn't trust the unsourced, false claim that "DeepSeek claimed to have built its base model for about 5% of the estimated cost of GPT-4." DeepSeek never claimed this.

oefrha•9mo ago

Yes, I don’t trust that either.

Der_Einzige•8mo ago

Don't act like the ambiguity of their language wasn't intentional. It was. They worded it in the paper on purpose so that investors freaked out en masse about the idea of GPT-4 for a few million while AI researchers laughed as they read the gigantic "this is only the final training run" asterisk.

saagarjha•8mo ago

And you expect a random Hacker News commenter to tell you how they're allocating their billions?

oefrha•8mo ago

Don’t know what you’re talking about. I read an HN comment saying here’s a well researched article, followed the link, and found a trust-me-bro article. Where did I ask for advice from an HN commenter?

saagarjha•8mo ago

You aren't going to get non-"trust me bro" insight except by asking Sam Altman himself. I assume he has more interesting things to do with his time at the moment. Barring that, you're going to get the guy who wines and dines GPU suppliers and sees if he can connect any crumbs of information that drop.

AustinCarrBW•8mo ago

This is in the Businessweek story

yorwba•9mo ago

What DeepSeek actually claimed:

"DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data." https://arxiv.org/pdf/2412.19437

They don't even claim to have spent $5M (since they own their GPUs instead of renting them by the hour), it's a purely notional figure suitable for an academic paper. But when R1 got released and started generating hype, it was the only dollar figure anyone had to go on, so it got interpreted as more significant than it is.
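For what it's worth, the arithmetic in that quote is easy to reproduce. A minimal sketch in Python, using only the two numbers from the paper; the function and variable names are made up for illustration, and the $2 rate is the paper's own assumption, not a measured cost:

```python
# Figures quoted from the DeepSeek-V3 technical report (arXiv 2412.19437).
GPU_HOURS = 2_788_000            # "2.788M GPU hours for its full training"
RENTAL_USD_PER_GPU_HOUR = 2.0    # assumed H800 rental price stated in the paper


def notional_training_cost(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Hypothetical helper: GPU hours times an assumed rental price."""
    return gpu_hours * usd_per_gpu_hour


cost = notional_training_cost(GPU_HOURS, RENTAL_USD_PER_GPU_HOUR)
print(f"${cost / 1e6:.3f}M")  # prints $5.576M
```

The whole figure is one multiplication, so it moves linearly with whatever rental price you plug in, and it says nothing about hardware purchases, salaries, or the earlier experiments the paper explicitly excludes.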

AustinCarrBW•8mo ago

The article specifically says "it's likely this sum referred only to the final training run—a data-refinement process that transforms a model’s previous prototypes into a complete product—but many people perceived it as an insanely low budget for the entire project." The article also delves into the SemiAnalysis report, as well as denials from ex-DeepSeek employees.

swordfish69•9mo ago

Can we stop this drawn-out narrative that Deepseek is at the level of Gemini or o3? It’s brilliant in its own way but for some reason a lot of journalists think it’s still at par with American frontier models.

alephnerd•9mo ago

Journalists give what their readers want, and what they want is a discussion about a US-China race or "AI". There is also an equity ownership aspect, because tech stocks in China tend to be the primary market in the green within the larger SSE and Hang Seng, and a DeepSeek/AI story makes China-oriented emerging market ETFs much more enticing. Same reason you see much more financial reporting in American business news about India now that Indian equities are available in emerging market ETFs.

That said, Deepseek is a decent model and was the forcing function needed to give a reality check to a number of AI startups (and has had the positive effect of making it easier for startups I've helped incubate to make the case for their own domain-specific foundation model strategy). Its impact shouldn't be understated.

roenxi•9mo ago

I'd reference something like https://llm-stats.com/ which suggests that the story is ... muddled. On the one hand, Deepseek is clearly not leading. On the other hand, they aren't really "behind" in any sense I care about. They'd have world-leading performance with their models this time last year.

The field is really moving too quickly to talk too certainly about "dominance" or "ahead". My observation is projects I care about on GitHub come with a Chinese README and many interesting talkers at conferences have strong Chinese accents. But I know a good researcher personally and it isn't so apparent to me if these are Chinese Chinese people or Americans of recent Chinese descent.

lwo32k•9mo ago

Americans can raise more cash. They are still pretty unbeatable on that front. So until that changes they will always be ahead no matter what happens on the tech front.

jrvarela56•9mo ago

Not if the Chinese govt is interested.

chii•9mo ago

at the level of chinese gov't, the "cash" is going to pay for hardware. And it's american hardware currently leading the frontier, and the sanctions on it have made it hard to officially procure large amounts of compute from nvidia.

So the chinese gov't will need to also invest in hardware production - and surely they are furiously doing so (and getting limited success, but success none the less).

The american chip sanctions are, in my view, an own-goal. In the short term, they might cause some pain, but in the medium to long term, they are the kick that the chinese market would need to adapt. Necessity is the mother of all inventions after all. It might take 10 years, but i have no doubts that china can reach a level equal to that of TSMC.

If the US administrations (both current and previous) had any brains, they should've seen this. They should've put subsidies into chips so that chinese production will not be competitive, and chinese firms will lose money if they go domestic. And the export of such hardware would balance the trade deficits.

peterlada•8mo ago

The "American" chip sanctions are outsourced. They rest on a single Dutch company (ASML) in the EU whose machines are installed in Taiwan. The EU which Humpty-Trumpy is working very hard to completely alienate...

jrvarela56•8mo ago

The assumption is that AGI could be near so all you need is a 5 year lead. In that sense, short term blocking despite giving them long term incentive to build manufacturing capacity is worth it.

JSR_FDED•9mo ago

I think that’s too simple.

You might find this paper from the Hoover Institution interesting; it goes into some depth analyzing the implications of DeepSeek on US innovation: https://www.hoover.org/sites/default/files/research/docs/Zeg...

Similarly, withholding funding for research, meddling in how universities are supposed to conduct their affairs, the reduced appeal of studying in the US for foreign students, putting wrestling promoter Linda McMahon in charge of dept of Education… these are all going to impact America’s research and innovation abilities.

coliveira•8mo ago

The big resource in technology is not cash. It is human effort in engineering and science. Putting more cash into finite resources can only result in inflation, which makes additional cash useless.

nwienert•9mo ago

It’s funny, R1 came out and matched 4o/o1 at the time, you could claim it was very slightly behind but it was basically even.

It’s been 6 months? Gemini’s big upgrade was 2 months ago and o3 even more recent.

It’s just funny that US companies just barely got ahead the last couple months and already it’s a “drawn out narrative” that they aren’t ahead.

For all we know R2 drops tomorrow? If it’s ahead or even how are we supposed to think about the narrative?

IMO it’s not really that much of a stretch to say they’re fairly close together. I’d want to wait 6 more months where the US stayed significantly ahead before I’d be complaining about narratives. I know things move fast but that’s all the more reason to wait and see.

SV_BubbleTime•9mo ago

> For all we know R2 drops tomorrow? If it’s ahead or even how are we supposed to think about the narrative?

I hope that R2 releases tomorrow and you enjoy some presumed clairvoyance for a minute.

jackt89•9mo ago

Based on DeepSeek's release cadence of R1 and V3-0324, it will drop on 5/19 or 5/26, so you are not far off

SubiculumCode•9mo ago

R1 has a much bigger hallucination problem than Gemini does...which made it a no go

blitzar•8mo ago

Blind patriotism (or cultism) clouds logic and rots the brain.

IncreasePosts•8mo ago

But didn't R1 use openai/google models to generate the data to train on? So the only reason R1 could exist is necessarily because those models predated it.

ijidak•9mo ago

This is not so different than the greatest of all time debate in sports.

It makes for interesting television.

For that reason, it probably won't stop anytime soon.

otabdeveloper4•9mo ago

I use Qwen 2.5, it works better for my tasks than larger models.

(But I use it for actual work, not for chatting with imaginary friends. Maybe you really do need a "frontier model" if you want to monetize imaginary friends. I wouldn't know or care.)

csomar•9mo ago

In absolute scores, no one is leading. They all plateaued around the same level. The difference is that models are optimized in different ways. This makes R1 useful/ahead for some people but not for others.

However, on cost, R1 beats the Western models by miles.

yard2010•9mo ago

Why don't you ask (it about) the kids at Tiananmen Square, was fashion the reason why they were there?

cookiemonsieur•8mo ago

> Why don't you ask (it about) the kids at Tiananmen Square, was fashion the reason why they were there?

Western models have been proven to be heavily censored, under the guise of fighting antisemitism for example.

mensetmanusman•9mo ago

Good, the more the better.

Also, if China keeps using this type of tech to imprison their own population even more effectively, that’s also good for the US, because no one wants to flee to an even better dystopia.

SV_BubbleTime•9mo ago

I talked with a Chinese friend at Meta about this. We agreed that no one would have been interested in Deepseek as a Chinese run service (loose “no one”) but as a tool to undercut the value of US AI, it’s seemingly effective.

I see no downside here. Force US to innovate beyond “it costs a lot of money and we conveniently had that upfront” while also undercutting the law makers and people trying to enforce regulatory capture on a new thing like they’ve done on all the old things.

As a person interested in tech and tools and America, I have no issues with Deepseek and Hunyuan and Wan being effectively CCP funded. Keep it up. Accelerate. Push.

suraci•9mo ago

[flagged]

tomhow•9mo ago

Please don't post nationalistic swipes like this in comments on HN. We've had to ask you to avoid political flamewar comments before. Please make an effort to correct this or we'll have to ban the account.

suraci•9mo ago

will it not be a 'nationalistic swipe' if I just repost the title of the news?

like this:

DeepSeek’s founder is threatening US dominance in AI race

tomhow•8mo ago

The title is clearly in a very different style to your comment. It's just not very hard to show that you have a sincere intention of using this site in accordance with the guidelines. The intention of this site is to be a place that satisfies intellectual curiosity. That's the reason people come here rather than the many other discussion sites on the web; to find interesting content and engage in interesting discussions. If you don't want that, it's fine, you don't have to participate here. If you want to use this site, please don't drag down the standard of discussion.

suraci•8mo ago

imho, the title is clearly in a certain style that rhymes with my comment

anyway, thank you very much for the explanation, i'll bookmark the title to learn the style well

Pandabob•9mo ago

Feels to me that it's Google which has done the most recently to optimize the cost/performance-ratio of these models and no one seems to be talking about it.

plandis•9mo ago

It's fascinating to me that someone in their early 20s during the 2008 worldwide economic recession could have that much economic success.

The fact that this guy could see that massive data analysis was a winning investment strategy and then outcompete others with way more experience in financial markets is impressive.

I’d be curious about the markets he initially invested in. Was this a market inefficiency specifically in China in the late 2000s?

I’ve always assumed that quantitative analysis requires PhD level knowledge of markets and mathematics but maybe I’m being way too conservative?

pyuser583•9mo ago

Being in your early 20s during the economic crisis, would mean spending your early to mid career during the economic boom of the 2010s.

It would mean some harsh years at first, but it’s a good time to hit the market.

I remember being told I’d never be successful, or make as much money as my parents.

I only wish I hadn’t listened to those people so long.

naming_the_user•8mo ago

People generally ascribe far too much importance to general market conditions when it comes to their individual success.

A good market helps you become a bog standard boring wage slave, maybe get a mortgage, etc.

The outsized success folks will go out and get what they need regardless, they aren’t waiting for it to come to them.

ta20240528•8mo ago

As they are doing in huge numbers in Zimbabwe, South Sudan, Palestine,…

This is nonsense. Other than the local mafia, almost all extremely successful folk live in extremely affluent markets.

naming_the_user•8mo ago

I am obviously not talking about failed states, market crashes in developed countries are usually at max a 25-50% setback.

ta20240528•8mo ago

It's not obvious. Check your cultural assumptions on an international forum when making sweeping statements that only apply to a small minority.

For your own credibility.

surgical_fire•8mo ago

This hustle culture "get rich quick" mindset is such a societal disease.

Incredible that people would see the notion of having a moderately successful white collar career (maybe get a mortgage etc) as "boring wage slave".

jmatthews•9mo ago

In the same way that black hats will always be advantaged versus white hats, frontier models will always be advantaged versus derivatives.

nowittyusername•8mo ago

It's important to remember that, whatever the story behind Deepseek, it's hard to believe they accomplished this feat with the same or more resources than the American companies. Which is to say, it's safe to assume that they had fewer resources than their American counterparts but created a model that is just as good. So regardless of the narrative, they deserve respect for that, let alone for the amount of open source information and weights they provided.
