Nvidia is proposing a beast of a CPU system for Windows PCs

https://twitter.com/lemire/status/2062880075117113739

33•tosh•1h ago

Comments

cyberziko•1h ago

Good to know. I hope the price will be affordable; having a PC is becoming a luxury :)

crims0n•51m ago

Certainly not in the year of our lord, 2026. Maybe in a few years though.

dgellow•21m ago

I’m not sure if you’re aware but there is a supply chain shortage for pretty much everything needed for a PC that isn’t expected to be solved this year or next year. There is no way that can be affordable

YasuoTanaka•1h ago

128GB of unified memory is a dream come true for local LLMs. VRAM has been the ultimate bottleneck for developers.

avocadoking•51m ago

It could help with exploding external LLM costs. Interesting to see how the adoption will be, which will mainly depend on the price.

adrian_b•43m ago

The competitor for this NVIDIA CPU will not be the now old AMD Strix Halo, but its successor (launched recently), which supports up to 192 GB of unified memory. Thus 128 GB is no longer SOTA.

While this NVIDIA system is inferior from the point of view of the memory capacity, its main advantage is that the top models will have a bigger GPU, i.e. with 6144 or 5120 FP32 execution units, compared to 2560 for the AMD GPU (compared to the NVIDIA CPU, the AMD CPU has a better multi-threaded performance for legacy programs, and a much better multi-threaded performance for the applications that use AVX-512).

However, these top models with big GPUs will also be much more expensive than the competing AMD system, while also being much more expensive than a laptop or mini-PC with an equivalent discrete NVIDIA GPU (which has the disadvantage of having direct access only to a much smaller, even if faster, memory).

christkv•24m ago

I don’t think there is much improvement in compute for the new Strix Halo revision. The next one supposedly adds RDNA4 cores or similar and more memory channels.

zamadatix•43m ago

I have a 128 GB LPDDR5X machine. It's a great workstation laptop (which is why I got it) but the memory bandwidth is just awful if you're wanting to use it for AI. An old Epyc CPU will fare better both in terms of being able to run full sized larger models as well as having higher memory bandwidth, and that's not a recommendation to go that route either as it's still not worth it.

jqpabc123•1h ago

I am not sure how many people will run AI models locally. It still seems like a niche application to me.

I'd say this relates directly to the cost of running AI models remotely.

And we won't know what the actual cost will be until AI vendors recover the huge pile of cash they've dumped into development (plus interest).

chpatrick•48m ago

I think it's niche now because getting the hardware to run it is expensive and the quantized models don't work as well. If those improve then it would be a no-brainer to pay a one-off for the hardware instead of a fortune for API calls.

jqpabc123•31m ago

AI vendors are attempting to offer the whole apple. And they are spending huge sums of money in the process.

But most businesses don't really care about most of the apple --- they only need their special bite out of it.

For example, doctors mainly care about medicine. Nvidia is attempting to provide the hardware needed for local, specialized models.

dofm•9m ago

I think it is likely to appeal to video and photo editors who want to use AI tools (the press release has a quote from Blackmagic Design, as well as from Adobe, who I think have no stomach for their own cloud AI).

But I don’t know about specialised: this could run quite large models with MoE.

dofm•21m ago

I am not really convinced that four bit quantisation is that bad; almost certainly six will be enough. But Google is claiming that the QAT tech in Gemma, which they are surely using or testing in Gemini, preserves nearly source-model quality while reducing footprint.

The hardware for 50 tokens per second with a four bit quantisation of Gemma 4 26B or the sparse Qwen 3.6 is not really that expensive: it’s a secondhand M1 Max.

Beyond that, I agree. I think moving planning tasks to local is a now thing, not that it really has much impact on token spend. I also think many small coding tasks are fully within the grasp of the above two models.

The main issue right now is that the software landscape is rather confusing, but I reckon uncomplicated Gemma 4 26B QAT support with MTP is a few weeks away.
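
For a rough sense of the footprint side of the quantisation argument above, here is a minimal sketch. It assumes weight storage is simply parameter count times bits per weight, and it ignores KV cache, activations, and per-block quantisation metadata, so real files will be somewhat larger. The function name is hypothetical, and the 26B parameter count is taken from the model name mentioned above.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width and
    nothing else (KV cache, scales, runtime overhead) is counted.
    """
    # params_billion * 1e9 parameters * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Compare 16-bit, 6-bit and 4-bit storage for a 26B-parameter model.
    for bits in (16, 6, 4):
        print(f"{bits:>2}-bit: {weight_footprint_gb(26, bits):.1f} GB")
```

Under these assumptions the weights alone drop to a quarter of their 16-bit size at four bits, which is why such a model can plausibly fit on a machine with a few tens of GB of unified memory.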

tosh•53m ago

nb: poster is Daniel Lemire (https://lemire.me), who is very skilled in getting performance out of compute hardware (e.g. via simd, cache usage etc)

infecto•46m ago

As he likes to share often, "He ranks among the top 2% of scientists globally (Stanford/Elsevier 2025) and is one of GitHub's top 1000 most followed developers."

tosh•41m ago

based on citations and github stars? or what's the context there?

infecto•12m ago

I was adding further citation based on his own claims. Not sure what context is missing.

2OEH8eoCRo0•52m ago

Are their enterprise orders slowing down? Why use precious maxed out fab capacity on consumer stuff when it could be an enterprise chip?

zamadatix•44m ago

It uses LPDDR5X instead of VRAM and will still sell for a premium while pushing their presence even further in every side of the AI market. This was one area AMD was ahead in and now Nvidia is probably better off making this to compete on that front while still being better off than making a 5090.

fc417fc802•33m ago

That doesn't answer the question. If the high margin enterprise GPUs are saturating the fab capacity you wouldn't expect them to be pushing this. But IIRC those all have oodles of integrated HBM at this point so I wonder if fab capacity for that has become a bottleneck.

dofm•34m ago

It already is an enterprise chip. This is about Microsoft not having the equivalent of an M3 Max or whatever laptop.

And maybe for NVIDIA and MS it is also about them quietly betting that local models are, in fact, going to be good enough for most tasks pretty soon.

llm_nerd•52m ago

Does this person know that this is the same GB chip in the DGX Spark? It isn't some proposed thing, it's a chip loads of people have on their desk right now, and there are endless benchmarks of it.

Decent single core (a long way from Apple level, but decent), but it makes up for it in cores to provide M5 level performance, CPU wise. On memory bandwidth it is kind of starved, at 1/6th that of many GPUs.

They got Microsoft to customize Windows for the RTX Spark, and will likely have to brutally throttle it when running as a laptop (it's literally a 140W TDP chip), and that's neat. It's going to be a very expensive laptop.

Apreche•49m ago

I heard the memory bandwidth is not just slower than on a GPU, as expected, but is significantly slower than Apple’s unified memory.

MrBuddyCasino•42m ago

CPU/GPU is decent (800 GB or so), memory is slowish (300GB or so). Some Apple M are slower, some are faster.

dagmx•10m ago

Where did you get those numbers from?

DGX Spark has a maximum of 273 GB/s bandwidth in ideal scenarios (hard to reach)

That puts it between an M5 (153) and M5 Pro (307)
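
To see why these bandwidth figures matter for local LLMs, here is a back-of-the-envelope sketch. It assumes token generation is purely memory-bandwidth bound and that each generated token reads all active weights exactly once, so it gives an upper bound, not a benchmark. The function name is hypothetical, and the 13 GB weight size is an illustrative assumption (roughly a 26B-parameter model at 4 bits per weight); only the bandwidth numbers come from the comment above.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, weights_read_gb: float) -> float:
    """Upper bound on decode speed when generation is bandwidth bound.

    Assumption: each token requires streaming `weights_read_gb` of weights
    from memory once; compute time, KV cache reads and overhead are ignored.
    """
    return bandwidth_gb_s / weights_read_gb


if __name__ == "__main__":
    WEIGHTS_GB = 13.0  # illustrative assumption, not a measured model size
    # Bandwidth figures (GB/s) as quoted in the thread.
    for name, bw in (("M5", 153), ("DGX Spark", 273), ("M5 Pro", 307)):
        ceiling = decode_ceiling_tokens_per_s(bw, WEIGHTS_GB)
        print(f"{name:>9}: at most ~{ceiling:.0f} tokens/s")
```

A sparse MoE model reads only its active experts per token, so its effective `weights_read_gb` is smaller and its ceiling correspondingly higher.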

MrBuddyCasino•44m ago

Plus John Carmack has reviewed it, he was not amazed.

seanalltogether•49m ago

Is it really unified memory? AMD Strix Halo is "unified" but you still have to allocate memory separately for cpu vs gpu. Apple Silicon is true unified memory.

joe_mamba•44m ago

>AMD Strix Halo is "unified" but you still have to allocate memory separately for cpu vs gpu.

IIRC that's to maintain BIOS and Windows (+games & apps) backwards compatibility, but memory access speeds are the same.

ankurdhama•43m ago

It is unified in the sense that the OS can dynamically assign memory to CPU and GPU. Apple silicon is not an alien tech that other silicon vendors cannot implement.

ApatheticCosmos•42m ago

Strix halo is unified memory. The memory allocation set in BIOS is overridden by the operating system if it has the capability.

eigenspace•41m ago

That's a software question, not a hardware question.

Some software assumes pre-defined set-aside pools of memory reserved for video purposes, but the chip does actually have access to the whole pool.

fc417fc802•38m ago

> you still have to allocate memory separately for cpu vs gpu

That's an API issue not a hardware issue. Regardless, I believe the major APIs permit seamlessly sharing pointers at this point? (I have no experience doing that though.)

sisve•48m ago

> I am not sure how many people will run AI models locally. It still seems like a niche application to me.

Bill Gates had a quote some years ago...

People have still not learned how fast we improve our tech and how much cheaper things get, I guess :)

chaostheory•41m ago

We had a thing called globalism that drastically reduced costs. Globalism right now is on life support. Given geopolitics, I don’t see how it’s going to survive.

dgellow•34m ago

Memory isn’t getting cheap soon, and you need a lot of it for local models

sisve•27m ago

All depends. The current technology will be cheaper in a year or two. The best cutting edge stuff will probably be even more expensive. But in 10 years time... we can run current SOTA models (or models that are equally good) on our local hardware

dgellow•16m ago

Ah yes, if you count in decades, for sure I expect to run them locally

infecto•47m ago

"I am not sure how many people will run AI models locally. It still seems like a niche application to me. However, it will make decent machines to play video games."

I don't know who will be the winner but with some of the recent releases from gemma it seems more probable that you may run some models locally if only from a cost perspective, not even considering business security. Not sure how this type of architecture would make for good gaming though, puts into question the whole statement.

"Ranked in the top 2% of scientists globally (Stanford/Elsevier 2025) and among GitHub's top 1000 developers" - side note but this guy puts this everywhere, gives me probably the inverse of what he is marketing for.

iLoveOncall•41m ago

> "Ranked in the top 2% of scientists globally (Stanford/Elsevier 2025) and among GitHub's top 1000 developers" - side note but this guy puts this everywhere, gives me probably the inverse of what he is marketing for.

Lol yeah seriously, that stinks of "I ask AI to generate a huge amount of bullshit and upload it to pad irrelevant stats".

Absolute loser.

netsharc•31m ago

I found his website, https://www.lemire.me/en/ , and the "2%" brag is the very first sentence, geez.

Being the top x% is what OnlyFans girls brag about, professor...

And it's not exactly brain surgery, is it? https://www.youtube.com/watch?v=THNPmhBl-8I

Zetaphor•13m ago

> Daniel Lemire’s blog is one of the top 50 most popular blogs on Hacker News, the standard tech news aggregation site.

Citation needed

alberth•44m ago

Is this essentially an Apple M-Series chip in concept?

BoredPositron•43m ago

Mediatek and Nvidia, the horsemen of abandoning hardware after a year. The Jetson family still left a bad taste in my mouth.

SwtCyber•40m ago

The interesting part to me isn't really the Cortex-X925 vs AVX-512 comparison, but Nvidia trying to make the GPU the center of a Windows PC rather than an add-in card

cwzwarich•31m ago

A large part of Intel's success over decades was to capture as much of the value from the PC for themselves. This previously caused a confrontation between the two in 2009 when Intel integrated the memory controller into the CPU and argued that Nvidia's licensing agreement did not allow them to produce chipsets for such CPUs. Nvidia was developing an x86 CPU based on licensed technology from Transmeta, but after the legal battle with Intel they pivoted to producing an ARM CPU (released as Denver) based on this technology instead.

Now that Intel is historically weak, Nvidia is attempting to reverse the situation.

cryo32•38m ago

Yeah when laptops are shipping 8GB and Microsoft is suddenly interested in native apps, nope.

Tech companies have strangled their own market.

AmazingTurtle•37m ago

while unified memory may offer better performance than unsoldered DDR system memory, it still won't be as great as 1.8TB/s bandwidth on high end consumer GPUs right now.

Nvidia's master plan may be making it the new normal to have "only" 400GB/s bandwidth, thus gatekeeping local model usage further behind "more memory but not as fast as the cloud can do it"

dangus•11m ago

I think it’s an interesting theory but a bit too conspiracy theory-ish.

Nvidia just wants to sell stuff to everyone.

And I think for professionals doing local AI work, products like Strix Halo and Apple Silicon are a competitive threat.

A big part of maintaining the leading software ecosystem is ensuring you have competitive hardware for all your users.

I also think the RTX Spark product is relatively low effort for Nvidia. Grab a Mediatek CPU and slap an Nvidia GPU on the die. Sure, that’s oversimplifying it, but still.

dofm•37m ago

Here is the press release for the actual machine:

https://nvidianews.nvidia.com/news/nvidia-microsoft-windows-...

I have been somewhat surprised at the lack of commentators observing that this is Microsoft and above all NVIDIA launching a device that is fundamentally at odds with the metered cloud model of AI.

When you look at the other announcements and murmurings (better offline BYOK for Copilot, talk of an unmetered AI future) I think it’s clear that these two firms understand that cloud-only AI is not sustainable or inherently in their interests. But their willingness to undermine OpenAI with a product like this is notable.

tantalor•35m ago

Maybe. Or they are simply hedging their bets.

Waterluvian•36m ago

It’s an opportunity for them to start doing away with the whole ATX thing where owners had freedom to mix and match at their own pleasure.

thrance•32m ago

Will it support Linux?

ChrisArchitect•26m ago

A powerful new chapter for Windows PCs, accelerated by Nvidia RTX Spark

https://news.ycombinator.com/item?id=48352693

Nvidia RTX Spark

https://news.ycombinator.com/item?id=48352939

PedroBatista•23m ago

Don't want to be too harsh, maybe I'm missing something, but the CPU is at least 2 years old, internally it has been a complete shitshow and that's a minor hiccup when compared to the firmware and software situation.

It's an interesting "newcomer" and the more the better but calling this a "beast" and a "game changer" is ridiculous to say the least.

Then there is the price..
