While this NVIDIA system is inferior in memory capacity, its main advantage is that the top models will have a bigger GPU, with 6144 or 5120 FP32 execution units compared to 2560 for the AMD GPU. (Compared to the NVIDIA CPU, the AMD CPU has better multi-threaded performance for legacy programs, and much better multi-threaded performance for applications that use AVX-512.)
However, these top models with big GPUs will also be much more expensive than the competing AMD system, while also being much more expensive than a laptop or mini-PC with an equivalent discrete NVIDIA GPU (which has the disadvantage of having direct access only to a much smaller, even if faster, memory).
I'd say this relates directly to the cost of running AI models remotely.
And we won't know what the actual cost will be until AI vendors recover the huge pile of cash they've dumped into development (plus interest).
But most businesses don't really care about most of the apple --- they only need their special bite out of it.
For example, doctors mainly care about medicine. Nvidia is attempting to provide the hardware needed for local, specialized models.
But I don’t know about specialised: this could run quite large models with MoE.
The hardware for 50 tokens per second with a four bit quantisation of Gemma 4 26B or the sparse Qwen 3.6 is not really that expensive: it’s a secondhand M1 Max.
Beyond that, I agree. I think moving planning tasks to local is a now thing, not that it really has much impact on token spend. I also think many small coding tasks are fully within the grasp of the above two models.
The main issue right now is that the software landscape is rather confusing, but I reckon uncomplicated Gemma 4 26B QAT support with MTP is a few weeks away.
And maybe for NVIDIA and MS it is also about them quietly betting that local models are, in fact, going to be good enough for most tasks pretty soon.
Decent single core (a long way from Apple level, but decent), and it makes up for it in core count to provide M5-level performance, CPU-wise. On memory bandwidth it is kind of starved, at 1/6th that of many GPUs.
They got Microsoft to customize Windows for the RTX Spark, and that's neat, but they will likely have to brutally throttle it when running as a laptop (it's literally a 140W TDP chip). It's going to be a very expensive laptop.
DGX Spark has a maximum of 273 GB/s bandwidth in ideal scenarios (hard to reach).
That puts it between an M5 (153 GB/s) and an M5 Pro (307 GB/s).
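To make the bandwidth comparison concrete, here is a rough back-of-the-envelope sketch. It uses the common rule of thumb that decode speed on a memory-bound model is capped by how fast the active weights can be streamed from memory once per token. The bandwidth figures are the ones quoted above; the function name and the workload (4 billion active parameters at 4 bits per weight) are made-up assumptions for illustration, not the specs of any model mentioned in this thread.

```python
def decode_tokens_per_second_upper_bound(bandwidth_gb_s: float,
                                         active_params_billion: float,
                                         bits_per_weight: float) -> float:
    """Rough ceiling on decode speed for a memory-bound model.

    Assumes every active weight is read from memory once per generated
    token, and ignores KV-cache traffic, compute limits, and software
    overhead, so real numbers will be lower.
    """
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Bandwidth figures quoted in this thread, in GB/s.
BANDWIDTH_GB_S = {"M5": 153, "DGX Spark": 273, "M5 Pro": 307}

# Hypothetical workload (assumption, not a real model's spec):
# a sparse model with 4 billion active parameters, quantised to 4 bits.
ACTIVE_PARAMS_BILLION = 4.0
BITS_PER_WEIGHT = 4.0

for name, bandwidth in BANDWIDTH_GB_S.items():
    ceiling = decode_tokens_per_second_upper_bound(
        bandwidth, ACTIVE_PARAMS_BILLION, BITS_PER_WEIGHT
    )
    print(f"{name}: at most ~{ceiling:.0f} tokens/s")
```

The point is only the scaling: under these assumptions the ceiling is directly proportional to bandwidth, so the DGX Spark lands between the M5 and M5 Pro here too. Dense models, which read all their weights per token, would sit much lower.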
IIRC that's done to maintain BIOS and Windows (+ games & apps) backwards compatibility, but memory access speeds are the same.
Some software assumes pre-defined set-aside pools of memory reserved for video purposes, but the chip does actually have access to the whole pool.
That's an API issue not a hardware issue. Regardless, I believe the major APIs permit seamlessly sharing pointers at this point? (I have no experience doing that though.)
Bill Gates had a quote some years ago...
People have still not learned how fast we improve our tech and how much cheaper things get, I guess :)
I don't know who will be the winner, but with some of the recent Gemma releases it seems more probable that you will run some models locally, if only from a cost perspective, not even considering business security. Not sure how this type of architecture would make for good gaming though, which puts the whole statement into question.
"Ranked in the top 2% of scientists globally (Stanford/Elsevier 2025) and among GitHub's top 1000 developers" - side note, but this guy puts this everywhere, which probably gives me the inverse of the impression he is marketing for.
Lol yeah seriously, that stinks "I ask AI to generate a huge amount of bullshit and upload it to pad irrelevant stats".
Absolute loser.
Being the top x% is what OnlyFans girls brag about, professor...
And it's not exactly brain surgery, is it? https://www.youtube.com/watch?v=THNPmhBl-8I
Citation needed
Now that Intel is historically weak, Nvidia is attempting to reverse the situation.
Tech companies have strangled their own market.
Nvidia's master plan may be making it the new normal to have "only" 400GB/s bandwidth, thus gatekeeping local model usage further behind "more memory but not as fast as the cloud can do it".
Nvidia just wants to sell stuff to everyone.
And I think for professionals doing local AI work, products like Strix Halo and Apple Silicon are a competitive threat.
A big part of maintaining the leading software ecosystem is ensuring you have competitive hardware for all your users.
I also think the RTX Spark product is relatively low effort for Nvidia. Grab a Mediatek CPU and slap an Nvidia GPU on the die. Sure, that’s oversimplifying it, but still.
https://nvidianews.nvidia.com/news/nvidia-microsoft-windows-...
I have been somewhat surprised at the lack of commentators observing that this is Microsoft and above all NVIDIA launching a device that is fundamentally at odds with the metered cloud model of AI.
When you look at the other announcements and murmurings (better offline BYOK for Copilot, talk of an unmetered AI future) I think it’s clear that these two firms understand that cloud-only AI is not sustainable or inherently in their interests. But their willingness to undermine OpenAI with a product like this is notable.
A powerful new chapter for Windows PCs, accelerated by Nvidia RTX Spark
https://news.ycombinator.com/item?id=48352693
Nvidia RTX Spark
It's an interesting "newcomer", and the more the better, but calling this a "beast" and a "game changer" is ridiculous, to say the least.
Then there is the price...
Running local models will stay niche for a while, unless we see breakthroughs
Most doctors don't care much about engineering or accounting or software development or 10000 other things that big vendor models address.
This area is yet to be really explored. Nvidia aims to provide the hardware to do so.
As a side note, Qualcomm chipsets on Android have been doing this for years (like Apple), so it's not a super unique thing. It's more like there was no need before.
[1] https://www.jeffgeerling.com/blog/2025/increasing-vram-alloc...
This isn't the first time we have UMA on the PC, btw. When SGI did their 320 and 540 PC workstations, they had what they called the Cobalt graphics chipset and a crossbar with their IVC architecture. They bypassed AGP completely at the time. It was quite unique to see strict UMA on a PC. I hadn't seen it since, until these new systems we're seeing now on PCs and Macs.
I have a hard time believing running a model on a laptop will be cheaper than running it in a datacenter. Why wouldn't economies of scale apply here as with every other computation?
The vision NVIDIA is selling is pure marketing IMHO
Not everything I want to use an LLM for requires "PhD level intelligence", and increasingly I'm finding more uses that involve sharing my personal data.
Yesterday my local model helped me when looking for a doctor who is in-network for my insurance. I threw it a screenshot from the providers search results and it looked up reviews for all of them.
This is the 2026 edition of Ken Olsen: "There is no reason anyone would want a computer in their home"