
We ran a Unix-like OS Xv6 on our home-built CPU with a home-built C compiler

https://fuel.edby.coffee/posts/how-we-ported-xv6-os-to-a-home-built-cpu-with-a-home-built-c-compiler/
66•AlexeyBrin•2h ago•4 comments

Unheard works by Erik Satie to premiere 100 years after his death

https://www.theguardian.com/music/2025/jun/26/unheard-works-by-erik-satie-to-premiere-100-years-after-his-death
81•gripewater•4h ago•17 comments

MCP: An (Accidentally) Universal Plugin System

https://worksonmymachine.substack.com/p/mcp-an-accidentally-universal-plugin
4•Stwerner•8m ago•0 comments

Lossless LLM 3x Throughput Increase by LMCache

https://github.com/LMCache/LMCache
58•lihanc111•3d ago•7 comments

Finding Peter Putnam: The forgotten janitor who discovered the logic of the mind

https://nautil.us/finding-peter-putnam-1218035/
32•dnetesn•4h ago•8 comments

Show HN: I'm an airline pilot – I built interactive graphs/globes of my flights

https://jameshard.ing/pilot
1338•jamesharding•1d ago•178 comments

History of Cycling Maps

https://cyclemaps.blogspot.com/
49•altilunium•4h ago•6 comments

Antitrust defies politics' law of gravity

https://pluralistic.net/2025/06/28/mamdani/#trustbusting
16•almost-exactly•57m ago•4 comments

AlphaGenome: AI for Better Understanding the Genome

https://deepmind.google/discover/blog/alphagenome-ai-for-better-understanding-the-genome/
66•meetpateltech•3d ago•2 comments

Lago (Open-Source Usage Based Billing) is hiring for ten roles

https://www.ycombinator.com/companies/lago/jobs
1•AnhTho_FR•2h ago

JWST reveals its first direct image discovery of an exoplanet

https://www.smithsonianmag.com/smart-news/james-webb-space-telescope-reveals-its-first-direct-image-discovery-of-an-exoplanet-180986886/
286•divbzero•20h ago•125 comments

Engineer creates ad block for the real world with augmented reality glasses

https://www.tomshardware.com/maker-stem/engineer-creates-ad-block-for-the-real-world-with-augmented-reality-glasses-no-more-products-or-branding-in-your-everyday-life
92•LorenDB•6d ago•60 comments

I deleted my second brain

https://www.joanwestenberg.com/p/i-deleted-my-second-brain
350•MrVandemar•9h ago•217 comments

London's largest ancient Roman fresco is “most difficult jigsaw puzzle”

https://www.thisiscolossal.com/2025/06/mola-liberty-roman-fresco/
46•surprisetalk•4d ago•7 comments

C++ Seeding Surprises (2015)

https://www.pcg-random.org/posts/cpp-seeding-surprises.html
6•vsbuffalo•2d ago•0 comments

Normalizing Flows Are Capable Generative Models

https://machinelearning.apple.com/research/normalizing-flows
148•danboarder•17h ago•36 comments

After successfully entering Earth's atmosphere, a European spacecraft is lost

https://arstechnica.com/space/2025/06/a-european-spacecraft-company-flies-its-vehicle-then-loses-it-after-reentry/
15•rbanffy•3d ago•4 comments

Reinforcement learning, explained with a minimum of math and jargon

https://www.understandingai.org/p/reinforcement-learning-explained
150•JnBrymn•4d ago•7 comments

DeepSeek R2 launch stalled as CEO balks at progress

https://www.reuters.com/world/china/deepseek-r2-launch-stalled-ceo-balks-progress-information-reports-2025-06-26/
107•nsoonhui•1d ago•96 comments

A short history of web bots and bot detection techniques

https://sinja.io/blog/bot-or-not
33•OlegWock•4d ago•2 comments

Learn OCaml

https://ocaml-sf.org/learn-ocaml-public/#activity=exercises
169•smartmic•17h ago•57 comments

Untangling Lifetimes: The Arena Allocator

https://www.rfleury.com/p/untangling-lifetimes-the-arena-allocator
12•signa11•5h ago•3 comments

Qwen VLo: From “Understanding” the World to “Depicting” It

https://qwenlm.github.io/blog/qwen-vlo/
206•lnyan•23h ago•55 comments

Facebook is starting to feed its AI with private, unpublished photos

https://www.theverge.com/meta/694685/meta-ai-camera-roll
385•pier25•14h ago•231 comments

A brief history of children sent through the mail (2016)

https://www.smithsonianmag.com/smart-news/brief-history-children-sent-through-mail-180959372/
120•m-hodges•18h ago•111 comments

Weird Expressions in Rust

https://www.wakunguma.com/blog/rust-weird-expr
177•lukastyrychtr•23h ago•137 comments

IDF officers ordered to fire at unarmed crowds near Gaza food distribution sites

https://www.haaretz.com/israel-news/2025-06-27/ty-article-magazine/.premium/idf-soldiers-ordered-to-shoot-deliberately-at-unarmed-gazans-waiting-for-humanitarian-aid/00000197-ad8e-de01-a39f-ffbe33780000
559•ahmetcadirci25•6h ago•295 comments

10 Years of Pomological Watercolors

https://parkerhiggins.net/2025/04/10-years-of-pomological-watercolors/
211•fanf2•23h ago•29 comments

Facebook is asking to use Meta AI on photos in your camera roll you haven't yet

https://techcrunch.com/2025/06/27/facebook-is-asking-to-use-meta-ai-on-photos-in-your-camera-roll-you-havent-yet-shared/
5•absqueued•25m ago•1 comments

SymbolicAI: A neuro-symbolic perspective on LLMs

https://github.com/ExtensityAI/symbolicai
198•futurisold•19h ago•52 comments

DeepSeek R2 launch stalled as CEO balks at progress

https://www.reuters.com/world/china/deepseek-r2-launch-stalled-ceo-balks-progress-information-reports-2025-06-26/
107•nsoonhui•1d ago

Comments

sigmoid10•5h ago
https://archive.is/byKrB
teruakohatu•5h ago
The title of the article is "DeepSeek R2 launch stalled as CEO balks at progress" but the body of the article says the launch stalled because of a lack of GPU capacity due to export restrictions, not because of a lack of progress. The body does not even mention the word "progress".

I can't imagine demand would be greater for R2 than for R1 unless it was a major leap ahead. Maybe R2 is going to be a larger/less performant/more expensive model?

Deepseek could deploy in a US or EU datacenter ... but that would be admitting defeat.

Davidzheng•5h ago
But DeepSeek doesn't actually need to host inference, right, if they open-source it? I don't see why these companies even bother to host inference. DeepSeek doesn't need outreach (everyone knows about them) and the huge demand for SOTA will force western companies to host them anyway.
teruakohatu•5h ago
Releasing the model has paid off handsomely with name recognition and making a significant geopolitical and cultural statement.

But will they keep releasing the weights or do an OpenAI and come up with a reason they can't release them anymore?

At the end of the day, even if they release the weights, they probably want to make money and leverage the brand by hosting the model API and the consumer mobile app.

ngruhn•5h ago
If they continue to release the weights + detailed reports of what they did, I seriously don't understand why. I mean, it's cool. I just don't understand why. It's such a cut-throat environment where every little bit of moat counts. I don't think they're naive. I think I'm naive.
senko•4h ago
If you’re not appearing, you’re disappearing.

Now they are firmly on the map, which presumably helps with hiring, doing deals, influence. If they stop publishing something, they run the risk of being labelled a one-hit wonder who got lucky.

If they have a reason to believe they can do even better in the near future, releasing current tech might make sense.

Davidzheng•4h ago
I don't think any of these companies are aiming at long term goal of making money from inference pricing of customers.
diggan•4h ago
> I don't think any of these companies are aiming at long term goal of making money from inference pricing of customers.

What is DeepSeek aiming for if not that, which is currently the only thing they offer that costs money? They claim their own inference endpoints have a profit margin of 545%, which might or might not be true, but the very fact that they mentioned this at all seems to indicate it is of some importance to them and others.
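
For concreteness, that 545% figure is a margin over cost, i.e. revenue of roughly 6.45x the serving cost. A tiny sketch of the arithmetic, using a made-up cost figure rather than anything DeepSeek has published:

  # Illustrative arithmetic only; the cost number below is invented.
  daily_cost = 100_000                       # hypothetical serving cost, USD/day
  margin = 5.45                              # 545% expressed as a fraction of cost
  daily_revenue = daily_cost * (1 + margin)  # 645,000
  profit = daily_revenue - daily_cost        # 545,000 -> 545% of cost
  print(daily_revenue, profit)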

Davidzheng•4h ago
Well, it's certainly helpful in the interim that they can recoup some money from inference. I'm just saying that systems with more intelligence in the future can be used to make money in much better ways than charging customers for interacting with them. For instance, they could conduct research on projects which can generate massive revenue if successful.
bionhoward•2h ago
If moving faster is a moat, then open source AI could move faster than closed AI by not needing to be paranoid about privacy and by welcoming external contributions.
coderatlarge•2h ago
maybe they benefit from the usage data they collect?
impossiblefork•5h ago
The lack of GPU capacity sounds like bullshit though, and it's unsourced. It's not like you can't offer it as a secondary thing, sort of like o3, or even just turning on reasoning.
coderatlarge•3h ago
maybe they're just waiting to see if they can run on Chinese-sourced silicon? just speculating
Thorrez•5h ago
The article says this:

>June 26 (Reuters) - Chinese AI startup DeepSeek has not yet determined the timing of the release of its R2 model as CEO Liang Wenfeng is not satisfied with its performance,

>Over the past several months, DeepSeek's engineers have been working to refine R2 until Liang gives the green light for release, according to The Information.

But yes, it is strange how the majority of the article is about lack of GPUs.

chvid•4h ago
I am pretty sure that The Information has no access to / sources at DeepSeek. At most they are basing their article on selective random internet chatter among those who follow Chinese AI.
roenxi•4h ago
Presumably there is a CEO statement somewhere. If DeepSeek said May, but it is almost July, that would call for some comment from them.

Although I'd like to know the source for the "this is because of chip sanctions" angle. SMIC is claiming they can manufacture at 5nm, and a large number of chips at 7nm can get the same amount of compute as anything Nvidia produces. It wouldn't be market-leading competitive, but delaying the release for a few months doesn't change that. I don't really see how DeepSeek production release dates and the chip sanctions could be linked at this level of detail. Unless they're just including that as an aside.

archon1410•3h ago
> If DeepSeek said May

It is pretty strange: DeepSeek didn't say May anywhere; that was also a Reuters report based on "three people familiar with the company".[1] DeepSeek itself did not respond and never made any claims about the timeline.

[1]: https://www.reuters.com/technology/artificial-intelligence/d...

rvnx•3h ago
The way it is written, it could be three anonymous, random guys from Reddit who heard about DeepSeek online.
tonfa•2h ago
The phrasing for quoting sources is extremely codified; it means the journalists have verified who the sources are (either insiders or people with access to insider information).
Davidzheng•3h ago
Actually I think one of the researchers at DeepSeek did say so on Twitter, but I think that tweet has since been deleted.
FooBarWidget•48m ago
Welcome to most China news. Many "well-documented" China "facts" are in fact cases like this: the media taking rumors or straight up fabricating things for clicks, and then self-referencing (or different media referencing each other in a circle) to put up the guise of reliable news.
rfoo•4h ago
Yes. And that random Internet chatter almost certainly comes from people who don't know what they are talking about at all.

First, nobody is training on H20s; the idea is absurd. Their logic was: because of the high inference demand for DeepSeek models there is high demand for H20 chips, and H20s were banned, so better not to release new model weights now, otherwise people would want H20s even harder.

Which is... even more absurd. The reasoning itself doesn't make any sense. And the technical part is just wrong, too. Using H20 to serve DeepSeek V3 / R1 is just SUPER inefficient. Like, R1 is the most anti-H20 model released ever.

The entire thing makes no sense at all, and it's a pity that Reuters fell for that bullshit.

reliabilityguy•2h ago
> Using H20 to serve DeepSeek V3 / R1 is just SUPER inefficient. Like, R1 is the most anti-H20 model released ever.

Why? Any chance you have some links to read about why it’s the case?

terafo•2h ago
MLA uses way more FLOPs in order to conserve memory bandwidth; the H20 has plenty of memory bandwidth and almost no FLOPs. MLA makes sense on H100/H800, but on the H20, GQA-based models are a way better option.
reliabilityguy•2h ago
MLA as in multi-head latent attention?
terafo•2h ago
Yes
reliabilityguy•2h ago
Ah, gotcha. Thank you
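
A rough roofline sketch of terafo's point above. The peak-FLOP and bandwidth figures are approximate spec-sheet numbers, and the per-attention arithmetic intensities are illustrative assumptions rather than measured values, but they show why a FLOP-hungry attention scheme fits badly on a bandwidth-rich, compute-poor chip:

  # Roofline back-of-the-envelope (Python). Numbers are rough assumptions.
  chips = {
      "H100": {"flops": 989e12, "bw": 3.35e12},  # ~BF16 dense FLOP/s, bytes/s
      "H20":  {"flops": 148e12, "bw": 4.00e12},
  }
  # FLOPs spent per byte of KV cache read during decode. MLA re-expands a
  # compressed latent into full keys/values, so it does far more math per
  # byte than GQA. These magnitudes are made up; only the contrast matters.
  attn_intensity = {"GQA": 16, "MLA": 512}

  for chip, s in chips.items():
      ridge = s["flops"] / s["bw"]  # intensity where compute and bandwidth balance
      for attn, ai in attn_intensity.items():
          regime = "compute-bound" if ai > ridge else "bandwidth-bound"
          print(f"{attn} decode on {chip}: {ai / ridge:.1f}x the ridge point ({regime})")

On these assumptions MLA sits only slightly above the H100's ridge point but more than an order of magnitude above the H20's, i.e. the H20's abundant bandwidth goes unused while its scarce FLOPs become the bottleneck.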
sschueller•3h ago
> lack of GPU capacity due to export restrictions

Human progress that benefits everyone being stalled by the few and powerful who want to keep their moats. Sad world we live in.

crazygringo•20m ago
It's not about people wanting to keep it in moats.

It's about China being expansionist, actively preparing to invade Taiwan, and generally becoming an increasing military threat that does not respect the national integrity of other states.

The US is fine with other countries having AI if the countries "play nice" with others. Nobody is limiting GPUs in France or Thailand.

This is very specific to China's behavior and stated goals.

torginus•2h ago
I am a bit sceptical about whether this whole thing is true at all. This article links to another, which happens to be behind a paywall. "GPU export sanctions are working" is a message that a lot of people in the US administration, and a lot of investors, want to hear, so I think there's a good chance that unsubstantiated speculation and wishful thinking is being presented as fact here.
wizee•5h ago
They just recently released the r1-0528 model which was a massive upgrade over the original R1 and is roughly on par with the current best proprietary western models. Let them take their time on R2.
A_D_E_P_T•5h ago
At this point the only models I use are o3/o3-pro and R1-0528. The OpenAI model is better at handling data and drawing inferences, whereas the DeepSeek model is better at handling text as a thing in itself -- i.e. for all writing and editing tasks.

With this combo, I have no reason to use Claude/Gemini for anything.

People don't realize how good the new Deepseek model is.

energy123•4h ago
My experience with R1-0528 for python code generation was awful. But I was using a context length of 100k tokens, so that might be why. It scores decently in the lmarena code leaderboard, where context length is short.
diggan•4h ago
Would love to see the system/user prompts involved, if possible.

Personally I get it to write the same code I'd produce, which obviously I think is OK code, but it seems others' experience differs a lot from my own, so I'm curious to understand why. I've iterated a lot on my system prompt, so it could be as easy as that.

tazjin•2h ago
Do you use the DeepSeek hosted R1, or a custom one?

The published model has a note strongly recommending that you should not use system prompts at all, and that all instructions should be sent as user messages, so I'm just curious about whether you use system prompts and what your experience with them is.

Maybe the hosted service rewrites them into user ones transparently ...

diggan•2h ago
> Do you use the DeepSeek hosted R1, or a custom one?

Mainly the hosted one.

> The published model has a note strongly recommending that you should not use system prompts at all

I think that's outdated, the new release (deepseek-ai/DeepSeek-R1-0528) has the following in the README:

> Compared to previous versions of DeepSeek-R1, the usage recommendations for DeepSeek-R1-0528 have the following changes: System prompt is supported now.

The previous ones, while they said to put everything in user prompts, still seemed steerable/programmable via the system prompt regardless, but maybe it wasn't as effective as it is for other models.

But yeah outside of that, heavy use of system (and obviously user) prompts.
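
For anyone who wants to try this, here's a minimal sketch of sending a system prompt to R1-0528 through an OpenAI-compatible client. The base URL and model name are taken from memory of DeepSeek's docs and should be treated as assumptions to verify, not gospel:

  # Minimal sketch; base_url and model name are assumed, check DeepSeek's docs.
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.deepseek.com",  # assumed DeepSeek endpoint
      api_key="sk-...",                     # your API key
  )
  resp = client.chat.completions.create(
      model="deepseek-reasoner",            # assumed name for the R1 line
      messages=[
          {"role": "system", "content": "You are a terse senior Python reviewer."},
          {"role": "user", "content": "Review this function: def add(a, b): return a - b"},
      ],
  )
  print(resp.choices[0].message.content)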

Workaccount2•2h ago
The biggest reason I use Gemini is because it can still get stuff done at 100k context. The other models start wearing out at 30k and are done by 50k.
diggan•2h ago
The biggest reason I avoid Gemini (and all of Google's models I've tried) is because I cannot get them to produce the same code I'd produce myself, while with OpenAI's models it's fairly trivial.

There is something deeper in the model: it seemingly can be steered/programmed with the system/user prompts, yet it still produces kind of shitty code for some reason. Or I just haven't found the right way of prompting Google's stuff, which could also be the reason, but seemingly the same approach works for OpenAI, Anthropic, and others, so I'm not sure what to make of it.

brokegrammer•2h ago
I'm having the same issue with Gemini as soon as the context length exceeds 50k-ish. At that point, it starts to blurt out random code of terrible quality, even with clear instructions. It would often mix up various APIs. I spend a lot of time instructing it not to write such code, with plenty of few-shot examples, but it doesn't seem to work. It's like it gets "confused".

The large context length is a huge advantage, but it doesn't seem to be able to use it effectively. Would you say that OpenAI models don't suffer from this problem?

JKCalhoun•2h ago
New to me: is more context worse? Is there an ideal context length that maps to a bell curve or something?
diggan•43m ago
> New to me: is more context worse?

Yes, definitely. For every model I've used and/or tested, the more context there is, the worse the output, even within the context limits.

When I use chat UIs (which admittedly is less and less), I never let the chat go beyond one of my messages and one response from the LLM. If something is wrong with the response, I figure out what I need to change in my prompt, then start a new chat / edit the first message and retry, until it works. Any time I've tried to "No, what I meant was ..." or "Great, now change ..." the responses drop sharply in quality.
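
Scripted, that workflow looks roughly like the sketch below. The helper names are hypothetical stand-ins (generate is a single-turn model call, revise is however you edit the prompt, looks_right is your own check), not anything from a real library or from diggan's setup:

  # Sketch of the "single turn, edit and retry" loop described above.
  def one_shot_retry(prompt, generate, revise, looks_right, max_tries=5):
      for _ in range(max_tries):
          output = generate(prompt)        # fresh context on every attempt
          if looks_right(output):
              return output
          prompt = revise(prompt, output)  # edit the original prompt, never append
      return None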

qwertox•5h ago
"We had difficulties accessing OpenAI, our data provider." /s
astar1•5h ago
This, my guess is OpenAI wised up after r1 and put safeguards in place for o3 that it didn't have for o1, hence the delay.
ozgune•4h ago
I think that's unlikely.

DeepSeek-R1 0528 performs almost as well as o3 in AI quality benchmarks. So, either OpenAI didn't restrict access, DeepSeek wasn't using OpenAI's output, or using OpenAI's output doesn't have a material impact on DeepSeek's performance.

https://artificialanalysis.ai/?models=gpt-4-1%2Co4-mini%2Co3...

astar1•3h ago
almost as well as o3? kind of like gemini 2.5? I dug deeper and surprise surprise: https://techcrunch.com/2025/06/03/deepseek-may-have-used-goo...

I am not at all surprised; the CCP views the AI race as absolutely critical for their own survival...

orbital-decay•3h ago
Not everything that's written is worth reading, let alone drawing conclusions from. That benchmark shows different trees each time the author runs it, which should tell you something about it. It also stacks grok-3-beta together with gpt-4.5-preview in the GPT family, making the former appear to be trained on the latter. This doesn't make sense if you check the release dates. And previously it classified gpt-4.5-preview to be in a completely different branch than 4o (which does make some sense but now it's different).

EQBench, another "slop benchmark" from the same author, is equally dubious, as is most of his work, e.g. antislop sampler which is trying to solve an NLP task in a programmatic manner.

nsoonhui•5h ago
Not sure why you are downvoted, but OpenAI did announce that they are investigating DeepSeek's (mis)use of their outputs, and that they were tightening up validation of those who use API access, presumably to prevent that misuse.

To me that does seem like a reasonable speculation, though unproven.

viraptor•4h ago
Exactly: because it's phrased like the poster knows this is the reason. I wouldn't downvote it if it were clearly speculation, with the link to the OAI announcement you mentioned for bonus points.
xdennis•4h ago
I still find it amusing to call it "misuse". No AI company has ever asked for permission to train.
jamesblonde•5h ago
Rumour was that DeepSeek used the outputs of the thinking steps in OpenAI's reasoning model (o1 at the time) to train DeepSeek's large reasoning model R1.
orbital-decay•4h ago
More like a direct (and extremely dubious) accusation without proof from Altman. In reality those two models have as little in common as possible, and o1's reasoning chain wasn't available anyway.
msgodel•4h ago
I don't think so. They came up with a new RL algorithm that's just better.
dachworker•3h ago
Maybe they also do that, but I work with a class of problems* that no other model has managed to crack, except for R1 and that is still the case today.

Remember that DeepSeek is the offshoot of a hedge fund that was already using machine learning extensively, so they probably have troves of high quality datasets and source code repos to throw at it. Plus, they might have higher quality data for the Chinese side of the internet.

* Of course I won't detail my class of problems else my benchmark would quickly stop being useful. I'll just say that it is a task at the undergraduate level of CS, that requires quite a bit of deductive reasoning.

WiSaGaN•3h ago
DeepSeek published its thinking trace before OpenAI did, not after.
tw1984•2h ago
OpenAI used literally all available text owned by the entire human race to train o1/o3.

so what?

imiric•13m ago
It would be hypocritical to criticize DeepSeek if this is true, since OpenAI and all major players in this space train their models on everything they can get their hands on, with zero legal or moral concerns. Pot, meet kettle.
spaceman_2020•5h ago
Honestly, AI progress suffers because of these export restrictions. An open source model that can compete with Gemini Pro 2.5 and o3 is good for the world, and good for AI
energy123•5h ago
Your views on this question are going to differ a lot depending on the probability you assign to a conflict with China in the next five years. I feel like that number should be offered up for scrutiny before a discussion on the cost vs benefits of export controls even starts.
spaceman_2020•4h ago
I'm not American. Ever since I've been old enough to understand the world, the only country constantly at war everywhere is America. An all-powerful American AI is scarier to me than an open source Chinese one
throwaway290•4h ago
As a Russian, I only recently started to understand that the Russian government has been at war for much of its existence since USSR times: https://en.wikipedia.org/wiki/List_of_wars_involving_Russia#.... Many invasions and wars in places Russia had no business in, most of them not publicized inside the country. Unlike the US, it was not spreading liberal values of individual freedom or opposing violent dictatorships; if anything, maybe the other way around.
Thlom•3h ago
The US is not at perpetual war to spread "liberal values".
throwaway290•2h ago
I didn't say it is always the goal, but if one country prevails over another somewhere, then usually it means the first country's values propagate.
rescbr•3h ago
> against violent dictatorships

Then look up Latin America’s history, where the US actively worked to install and support such violent dictatorships.

Some under the guise of protecting countries from the threat of communism - like Brazil, Argentina and Chile - and some explicitly to protect US companies' interests - like in Guatemala.

throwaway290•2h ago
> Chile

Yes, fuckups happened. But then, for the results of Russian intervention, see the CCP and how many people died at their hands and from their policies.

tazjin•4h ago
> depending on the probability you assign to a conflict with China in the next five years

And on who you would support in such a conflict! ;)

diggan•4h ago
> probability you assign to a conflict with China in the next five years. I feel like that number should be offered up for scrutiny before a discussion

Might as well talk about the probability of a conflict with South Africa. China might not be the best country to live in, nor the country that takes care of its own citizens the best, but they seem non-violent towards other sovereign nations (so far), although of course there is a lot of posturing. Of the current "world powers", they seem to be the least violent.

energy123•4h ago
What is the security competition between South Africa and the US that would justify such an analogy?

China has been peaceful recently, at least since their invasion of Vietnam. But (1) their post-Deng culture is highly militaristic and irredentist, (2) this is the first time in history that they can actually roll back US influence; their previous inability explains the peace rather than a lack of will, and (3) from a realist perspective Taiwan makes too much sense as a target: it is the first link in the island chain, a wedge between the Philippines and Japan, and it supplies chips to the US.

The lesson we should learn from Russia's invasion of Ukraine is to believe countries when they say they own another country. Not assume the best and design policy around that assumption.

If you want to read some experts on this question, see this: https://warontherocks.com/?s=taiwan

The general consensus seems to be around a 20-25% chance of an invasion of Taiwan within the next 5 years. The remaining debate isn't about whether they want to do it, it's about whether they'll be able to do it and what their calculation will be around those relative capabilities.

layer8•4h ago
Are you saying that large-model capabilities would make a substantial difference in a military conflict within the next five years? Because we aren’t seeing any signs of that in, say, the Ukraine war.
energy123•4h ago
GPUs are used for signals intelligence today.
Davidzheng•3h ago
Small-scale drones are in use in that conflict. On-device AI would be a game-changer, no?
layer8•3h ago
It’s not impossible, but also highly nontrivial. Apart from the actual AI implementation, power supply might be a challenge. And there is a multitude of anti-drone technology being continuously developed. Already today, an autonomous drone would have to deal with RF jamming and GPS jamming, which means it’s easily defeated unless it has the ability to navigate purely visually. Drones also tend to be limited to good weather conditions and daytime.
energy123•2h ago
In terms of countermeasures, what's the difference between having a human drone pilot and having an AI (computer vision plus control) do it over cloud? I know I'm moving the goalposts away from edge compute, but if we are discussing the relevance of GPU compute for warfare it seems relevant.
layer8•1h ago
Assuming human-level AI capabilities, not much of a difference, obviously. But I also don't think that human operators are currently a bottleneck; the cost, failure rate, and technical limitations of drones are. If you are alluding to superhuman AI capabilities, that's highly speculative as well with regard to what is needed for drone piloting, and it's also unclear how large the benefits would be in terms of actual operational success rate.
randomname93857•3h ago
We do see the signs and reports, you just have to look. LLMs are being adopted for warfare, with drones or otherwise; there is progress there, but it's not currently at the level of a "substantial difference". And 5 years is a huge amount of time from a progress perspective in this domain - just try comparing the LLMs of today with the LLMs of 2020.
Papazsazsa•4h ago
"Then business will have to suffer."
tw1984•2h ago
> Honestly, AI progress suffers because of these export restrictions. An open source model that can compete with Gemini Pro 2.5 and o3 is good for the world, and good for AI

DeepSeek is not a charity; they are the largest hedge fund in China, no different from a typical Wall Street fund. They don't spend billions to give the world something open and free just because it is good.

When the model is capable of generating a decent amount of revenue, or when there is conclusive evidence that being closed would lead to much higher profit, it will be closed.

KronisLV•4h ago
I wonder how different things would be if the CPU and GPU supply chain was more distributed globally: if we were at a point where we'd have models (edit: of hardware, my bad on the wording) developed and produced in the EU, as well as other parts of the world.

Maybe then we wouldn't be beholden to Nvidia's whims (a sore spot in regards to buying their cards and their cost, vs. what Intel is trying to do with their Pro cards but with inevitably worse software support, as well as import costs), or to those of a particular government. I wonder if we'll ever live in such a world.

diggan•3h ago
> if we were at a point where we'd have models developed and produced in the EU, as well as other parts of the world.

But we have models being developed and produced outside of the US already, both in Asia and in Europe. Sure, it would be cool to see more from South America and Africa, but the playing field is not just the US anymore. Particularly when it comes to open weights (which seem more of a "world benefit" than closed APIs), the US is lagging far behind.

ignoramous•2h ago
> when it comes to open weights (which seems more of a "world benefit" than closed APIs), then the US is lagging far behind.

Llama (v4 notwithstanding) and Gemma (particularly v3) aren't my idea of lagging far behind...

diggan•2h ago
> Llama (v4 notwithstanding) and Gemma (particularly v3) aren't my idea of lagging far behind...

While neat, and of course Llama kicked off a large part of the ecosystem, so credit where credit is due, both of those suffer from "open-but-not-quite": they come with large "Acceptable Use" documents which outline what you can and cannot do with the weights, while the Chinese counterparts slap a FOSS-compatible license on the weights and call it a day.

We could argue whether that's the best approach, or even legal considering the (probable) origin of their training data, but the end result remains the same: Chinese companies are doing FOSS releases and American companies are doing something closer to BSL/hybrid-open releases.

It should tell you something when the legal department of one of these companies calls the model+weights "proprietary" while their marketing department continues to call the same model+weights "open source". I know which of those two I trust to be more accurate.

I guess that's why I see American companies as being further behind, even though they do release something.

Aeolun•4h ago
So Nvidia stock is going to crash hard when the Chinese inevitably produce their own competitive chip. Though I’m baffled by the fact they don’t just license and pump out billions of AMD chips. Nvidia is ahead, but not that far ahead.

My consumer AMD card (7900 XTX) outperforms the 15x more expensive Nvidia server chip (L40S) that I was using.

Papazsazsa•4h ago
I don't know why this isn't the crux of our current geopolitical spat.

Surely it would be cheaper and easier for the CCP to develop their own chipmaking capacity than going to war in the Taiwan strait?

Davidzheng•4h ago
The US will intervene militarily to stop China from taking control of TSMC (if Taiwan isn't pressured by the US to destroy the plants themselves), so I don't think taking Taiwan is a viable path to leading in silicon, only to lowering US capability, and given the current gap in GPUs it's not clear how helpful that would be to China. So all in all I don't think China views taking Taiwan as beneficial in the AI race at all.
tw1984•2h ago
> US will intervene militarily

with a reality tv show dude being the commander in chief and a news reporter being the defense secretary.

life is tough in america, man.

sundache•3h ago
A problem they face in building their own capacity is that ASML isn't allowed to export their newest machines to China. The US has even pressured them to stop servicing some machines already in China. They've been working on getting their own ASML competitor for decades, but so far unsuccessfully.
Aeolun•3h ago
This is just a question of time. They can afford to wait, since the US is currently in the process of destroying itself.

If I were China I’d be more worried about the other up and coming world power in India.

tw1984•2h ago
> A problem they face in building their own capacity is that ASML isn't allowed to export their newest machines to China.

Building their own capacity means building everything in China, i.e. the entire semiconductor ecosystem. Just look at the mobile phones and EVs built by Chinese companies.

WJW•3h ago
China doesn't want Taiwan for the chip making plants, but because they consider its existence to be an ongoing armed rebellion against the "rightful" rulers. Getting the fabs intact would be nice, but it's not the main objective.

The USA doesn't want to lose Taiwan because of the chip making plants, and a little bit because it is beneficial to surround their geopolitical enemies with a giant ring of allies.

tw1984•2h ago
> China doesn't want Taiwan for the chip making plants, but because they consider its existence to be an ongoing armed rebellion against the "rightful" rulers.

that is what the CCP tells you and its own people.

The truth is Taiwan is just the symbol of US presence in the western Pacific. Getting Taiwan back means the permanent withdrawal of US influence from the western Pacific region and the official end of US global dominance.

The CCP doesn't care about the island of Taiwan; they care about their historical positioning.

WJW•2h ago
I think that is basically what I said already? What is ensuring historical positioning if not the righting of (perceived) old wrongs?

In any case it's clear that it is not the fabs that China cares about when it is talking about (re)conquering Taiwan.

hopelite•3h ago
China will not go to war in or over Taiwan, short of the USA doing its usual narcissistic, psychopathic thing of instigating, orchestrating, and agitating for aggression. It seems, though, that some parts of the world have started to understand how to defuse and counter the narcissistic, psychopathic, abusive cabal that controls the USA and is constantly agitating for war, destruction, and world domination due to some schizophrenic messianic self-fulfilling prophecies.
b0a04gl•4h ago
no way this delay's about gpus lol. deepseek prob has r2 cooked already. r1‑0528 already pumped expectations too high. if r2 lands flat ppl start doubting.

or

who knows maybe they just chillin watching how west labs burn gpu money, let eval metas shift. then drop r2 when oai/claude trust graph dips a bit

numair•3h ago
> The Information reported on Thursday, citing two people with knowledge of the situation.

I miss the old days of journalism, when they might feel inclined to let the reader know that their source for the indirect source is almost entirely funded by the fortune generated by a man who worked slavishly to become a close friend of the boss of one of DeepSeek’s main competitors (Meta).

Feel bad for anyone who gets their news from The Information and doesn’t have this key bit of context.

Voloskaya•3h ago
I miss the old days of HN commenters, when they might feel inclined to let the reader know who they are talking about without making you solve a six-step enigma.
jekwoooooe•35m ago
Ah well, this time they can't just illegally acquire a bunch of GPUs and then train a model from OpenAI outputs. R1 was so overhyped.
rsanek•10m ago
"wildly popular"? maybe there was alot of interest when it was released, but who even is still using R1 these days? i previously utilized it through perplexity but the o3/Gemini pro models are so much better i rarely bother to read its responses.

it's not even in the top ten based on OpenRouter https://openrouter.ai/rankings?view=month