Architecturally, the DGX Spark has a far better cache setup to feed the GPU, and offers NVLink support.
But yeah, this should have been further up.
Further down, in the exploded view it says "Blackwell GPU 1PetaFLOP FP4 AI Compute"
Then further down in the spec chart they get less specific again with "Tensor Performance^1 1 PFLOP" and "^1" says "1 Theoretical FP4 TOPS using the sparsity feature."
Also, if you click "Reserve Now" the second line below that redundant "Reserve Now" button says "1 PFLOPS of FP4 AI performance"
I mean I'll give you that they could be more clear and that it's not cool to just hype up on FP4 performance, but they aren't exactly hiding the context like they did during GTC. I wouldn't call this "disingenuous"
I think lots of children are going to be very disappointed running their BLAS benchmarks on Christmas morning and seeing barely tens of teraflops.
(For reference, see how much lower the still-optimistic numbers are for the H200 once you use realistic datatypes:
https://nvdam.widen.net/s/nb5zzzsjdf/hpc-datasheet-sc23-h200... )
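If you want to see what you'd actually get on a box like this, a quick GEMM timing is enough. A minimal sketch, assuming a CUDA-enabled PyTorch build is available (numbers and sizes are illustrative):

    import time
    import torch  # assumes a CUDA-enabled PyTorch build

    n = 8192
    a = torch.randn(n, n, device="cuda", dtype=torch.float16)
    b = torch.randn(n, n, device="cuda", dtype=torch.float16)

    torch.matmul(a, b)              # warm-up
    torch.cuda.synchronize()

    iters = 20
    t0 = time.time()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    dt = (time.time() - t0) / iters

    flops = 2 * n ** 3              # n multiply-adds per output element of an n x n GEMM
    print(f"~{flops / dt / 1e12:.1f} fp16 TFLOPS achieved")

Compare that against the headline PFLOP figure and you see the gap between marketing FP4-with-sparsity numbers and what a realistic datatype delivers.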
Yeah, it’s miles better than WiFi. But if there was something I’d think might benefit from Thunderbolt, this would’ve been it.
The ability to transfer large models or datasets that way just seems like it would be much faster and a real win for some customers.
Why would you ever want a DGX Spark to talk to a “normal PC” at 40+ Gbps speeds anyways? The normal PC has nothing that interesting to share with it.
But, yes, the DGX Spark does have four USB4 ports which support 40Gbps each, the same as Thunderbolt 4. I still don’t see any use case for connecting one of those to a normal PC.
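For scale, here's a back-of-envelope on what link speed means for moving a model around. A hypothetical 70 GB checkpoint, line rates only, ignoring protocol overhead (the 200GbE NIC is the ConnectX-7 mentioned further down):

    # Idealized transfer times for a 70 GB model file at various line rates
    size_bits = 70e9 * 8
    for link, gbps in [("WiFi (~2 Gbps)", 2), ("10GbE", 10),
                       ("USB4 / TB4", 40), ("ConnectX-7 200GbE", 200)]:
        print(f"{link}: ~{size_bits / (gbps * 1e9):.0f} s")

So USB4 gets a 70 GB transfer down to about 14 seconds versus minutes over WiFi, which is why some people still want it for shuttling models and datasets from a workstation.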
ASUS Ascent GX10 - 1TB $2,999
MSI EdgeXpert MS-C931 - 4TB $3,999
The 1TB/4TB seems to be the size of the included NVMe SSD. The Reserve Now page also lists:
NVIDIA DGX Spark Bundle
2 NVIDIA DGX Spark Units - 4TB with Connecting Cable $8,049
The DGX Spark spec sheet lists an NVIDIA ConnectX-7 SmartNIC, rated at 200GbE, to connect to another DGX Spark, for about double the amount of memory for models.

Their prompt processing speeds are absolutely abysmal: if you're trying to tinker from time to time, a GPU like a 5090 or renting GPUs is a much better option.
If you're just trying to prep for impending mainstream AI applications, few will be targeting this form factor: it's both too strong compared to mainstream hardware, and way too weak compared to dedicated AI-focused accelerators.
-
I'll admit I'm taking a less nuanced take than some would prefer, but I'm also trying to be direct: this is not ever going to be a better option than a 5090.
> Their prompt processing speeds are absolutely abysmal
They are not. This is Blackwell with Tensor cores. Bandwidth is the problem here.

I've run inference workloads on a GH200, which is an entire H100 attached to an ARM processor, and the moment offloading is involved, speeds tank to Mac Mini-like levels, which is similarly mostly a toy when it comes to AI.
Not entirely sure how your ARM statement matters here. This is unified memory.
What do you think attached means here? It's unified memory.
I'm telling you empirically: at any batch size above one, prompt processing on an H100 with unified memory dips to double digits despite significantly more compute and bandwidth, the moment a model is large enough to require offloading, i.e. dipping into the unified portion of its memory.
Even at bs=1 the performance is pretty abysmal. I don't have the benchmarks handy anymore, but it was not even 50% of the performance before offloading.
You can't debate me on this; this is the reality. That is why you can rent a GH200 for the same price as an H100 right now, if not cheaper: nobody wants them.
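The mechanism is easy to see on paper: once any fraction of the weights has to stream over the slow path every token, effective bandwidth collapses toward the slow side (it's a harmonic mean). A toy sketch with illustrative numbers, not measurements; the real penalty is worse because of software and interconnect overheads:

    # Effective read bandwidth when a fraction f_slow of the weights lives
    # in slower memory (illustrative numbers, not benchmarks)
    def effective_bw(fast_gbs, slow_gbs, f_slow):
        # time per byte is a weighted harmonic mean of the two paths
        return 1.0 / ((1 - f_slow) / fast_gbs + f_slow / slow_gbs)

    # e.g. H100 HBM3 (~3350 GB/s) vs. an assumed ~450 GB/s CPU-side LPDDR path
    for f in (0.0, 0.1, 0.3, 0.5):
        print(f"{f:.0%} offloaded -> ~{effective_bw(3350, 450, f):.0f} GB/s")

Even on paper, offloading 30% of the weights costs you roughly two thirds of your bandwidth.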
-
"NoT EntIrelY SuRe"... sometimes I forget how absolutely exhausting it is trying to speak to a jackass who thinks they know so much better than you that they can't even begin to try to process what you're saying before replying.
And there's something about the HN-fake-niceness that's attached to it that really gets my fucking goat.
The performance with offloading was just so bad I didn't even bother proceeding to the benchmark (without offloading you get typical H100 speeds).
The limiting factor is going to be the VRAM on the 5090, but nvidia intentionally makes trying to break the 32GB barrier extremely painful - they want companies to buy their $20,000 GPUs to run inference for larger models.
Then there's the RTX Pro 6000 for running somewhat larger models (96GB VRAM, but only ~15-20% more perf than a 5090).
Some suggest Apple Silicon only for running larger models on a budget because of the unified memory, but the performance won't compare.
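Rough weights-only arithmetic shows where the 32GB wall bites. A sketch assuming ~0.5 byte/param at Q4 plus ~10% overhead, ignoring KV cache and runtime:

    # Approximate Q4 weight footprints for common model sizes (illustrative)
    for params_b in (8, 32, 70, 120):
        gb = params_b * 0.5 * 1.1   # ~0.5 byte/param at Q4 + ~10% overhead
        print(f"{params_b}B params -> ~{gb:.0f} GB (fits in 32 GB: {gb <= 32})")

A 70B model at 4-bit is already ~38 GB of weights alone, which is exactly the point where Nvidia wants you shopping for the $20,000 tier.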
?? this seems more than a little disingenuous...
ASUS and NVIDIA told us that their GB10 platforms are expected to use up to 170W.
[edit] The PSU is 240W, so that'd place an upper limit on power draw, unless they upgrade it.

These seem to be highly experimental boards, even though they are super powerful for their form factor.
Ryzen AI Max 395+, ~120 tops (fp8?), 128GB RAM, $1999
Nvidia DGX Spark, ~1000 tops fp4, 128GB RAM, $3999
Mac Studio max spec, ~120 tflops (fp16?), 512GB RAM, 3x bandwidth, $9499
DGX Spark appears to potentially offer the most tokens per second, but it's less useful/valuable as an everyday PC.
> Mac Studio max spec, ~120 tflops (fp16?), 384GB RAM, 3x bandwidth, $9499

512GB. And the DGX has 256GB/s bandwidth, so it wouldn't offer the most tokens/s.
Using an M3 Ultra I think the performance is pretty remarkable for inference and concerns about prompt processing being slow in particular are greatly exaggerated.
Maybe the advantage of the DGX Spark will be for training or fine tuning.
Also notably, Strix Halo and DGX Spark are both ~275GB/s memory bandwidth. Not always, but in many machine learning cases it feels like that's going to be the limiting factor.
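Back-of-envelope: every generated token has to stream the active weights through memory once, so bandwidth divided by model size is a hard ceiling on decode speed. A sketch with round numbers (a dense ~70B model at 4-bit, ~40 GB, purely illustrative):

    # Decode-speed ceiling from memory bandwidth alone (dense model)
    def max_tok_s(bw_gb_s, active_weights_gb):
        return bw_gb_s / active_weights_gb

    model_gb = 40  # ~70B at 4-bit, illustrative
    for name, bw in [("Strix Halo / DGX Spark", 273),
                     ("M4 Max", 546), ("M3 Ultra", 819)]:
        print(f"{name}: <= ~{max_tok_s(bw, model_gb):.0f} tok/s")

Note that MoE models activate only a few billion parameters per token, which is why something like qwen3-coder-30b (mentioned below) can run far faster than a dense model of the same total size.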
Just got my Framework PC last week. It's easy to set up to run LLMs locally; you have to use Fedora 42, though, because it has the latest drivers. It was super easy to get qwen3-coder-30b (8-bit quant) running in LMStudio at 36 tok/sec.
Strix Halo has the same and I agree it’s overrated.
Nvidia DGX Spark: 273 GB/s
M4 Max: (up to) 546 GB/s
M3 Ultra: 819 GB/s
RTX 5090: ~1.8 TB/s
RTX PRO 6000 Blackwell: ~1.8 TB/s
FP4-sparse (TFLOPS) | Price | $/TF4s

5090: 3352 | 1999 | 0.60
Thor: 2070 | 3499 | 1.69
Spark: 1000 | 3999 | 4.00
____________
FP8-dense (TFLOPS) | Price | $/TF8d (4090s have no FP4)
4090 : 661 | 1599 | 2.42
4090 Laptop: 343 | vary | -
____________
Geekbench 6 (compute score) | Price | $/100k
4090: 317800 | 1599 | 503
5090: 387800 | 1999 | 516
M4 Max: 180700 | 1999 | 1106
M3 Ultra: 259700 | 3999 | 1540
____________
Apple NPU TOPS (not GPU-comparable)
M4 Max: 38
M3 Ultra: 36
In fact you're also doing the work Nvidia should have done when they put together their (imho) ridiculously imprecise spec sheet.
There are two models that go by 6000; the RTX Pro 6000 (Blackwell) is the one that's currently relevant.
4090: 24GB RAM
Thor & Spark: 128GB RAM (probably at least 96GB usable by the GPU if they behave similar to the AMD Strix Halo APU)
Even if you were to say memory bandwidth was the problem, there is no consumer-grade GPU that can run any SoTA LLM; no matter what, you'd have to settle for a more mediocre model.
Outside of LLMs, 256 GB/s is not as much of an issue and many people have dealt with less bandwidth for real world use cases.
> $3,999
I'd rather just get an M3 Ultra. Have an M2 Ultra on the desk, and an M3 Ultra sitting on the desk waiting to be opened. Might need to sell it and shell out the cash for the max ram option. Pricey, but seems worthwhile.
Fits into 32GB: 5090
Fits into 64GB-96GB: Mac Studio
Fits into 128GB: for now the 395+ on $/token/s; Mac Studio if you don't care about $ but don't have unlimited money for an Hxxx

This could be great for models that fit in 128GB when you want the best $/token/s (if it is faster than a 395+).
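As a toy decision rule, the tiers above in code (these are the thread's figures, nothing authoritative):

    # Toy hardware picker following the tiers above (thread figures only)
    def pick(model_gb, money_no_object=False):
        if model_gb <= 32:
            return "RTX 5090"
        if model_gb <= 96:
            return "Mac Studio"
        if model_gb <= 128:
            return "Mac Studio" if money_no_object else "395+ (or DGX Spark, if faster)"
        return "Hxxx-class / multi-GPU"

    print(pick(38))    # -> Mac Studio
    print(pick(120))   # -> 395+ (or DGX Spark, if faster)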