Nvidia DGX Spark: When Benchmark Numbers Meet Production Reality

https://publish.obsidian.md/aixplore/Practical+Applications/dgx-lab-benchmarks-vs-reality-day-4

57•RyeCatcher•2h ago

Comments

RyeCatcher•1h ago

Would love to hear from others using the spark for model training and development.

stuckinhell•59m ago

I'm utterly shocked at the article saying GPU inference (PyTorch/Transformers)isn't working. Numerical instability produces bad outputs, Not viable for real-time serving, Wait for driver/CUDA updates!

My job just got me and our entire team a DGX spark. I'm impressed at the ease of use for ollama models I couldn't run on my laptop. gpt-oss:120b is shockingly better than what I thought it would be from running the 20b model on my laptop.

The DGX has changed my mind about the future being small specialized models.

jasonjmcghee•55m ago

You're shocked because that isn't your experience? From the article it sounds like ollama runs cpu inference not GPU inference. Is that the case for you?

RyeCatcher•43m ago

Totally agree. I’ve been training nanochat models all morning. Hit some speed bumps. I’ll share more later in another article. Buts it’s absolutely amazing. I fine tuned a Gemma3 model in a day yesterday.

jsheard•50m ago

No mention of the monstrous 200GbE NIC, seems like a waste if people aren't finding a use for it.

RyeCatcher•42m ago

Need to buy 2 and connect em. :-)

RyeCatcher•49m ago

I absolutely love it. I’ve been up for days playing with it. But there are some bleeding edge issues. I tried to write a balanced article. I would highly recommend for people that love to get their hands dirty. Blows away any consumer GPU.

furyofantares•44m ago

Since the text is obviously LLM output, how much prompting and editing went into this post? Did you have to correct anything that you put into it that it then got wrong or added incorrect output to?

enum•17m ago

I have H100s to myself, and access to more GPUs than I know what to do with in national clusters.

The Spark is much more fun. And I’m more productive. With two of them, you can debug shallow NCCL/MPI problems before hitting a real cluster. I sincerely love Slurm, but nothing like a personal computer.

veber-alex•39m ago

The llama.cpp issues are strange.

There are official benchmarks of the Spark running multiple models just fine on llama.cpp

https://github.com/ggml-org/llama.cpp/discussions/16578

RyeCatcher•37m ago

Cool I’ll have a look. All reflections I made were first pass stuff.

CaptainOfCoit•24m ago

There wasn't any instructions how the author got ollama/llama.cpp, could possibly be something nvidia shipped with the DGX Spark and is an old version?

eadwu•31m ago

There are bleeding edge issues, everyone dials into transformers so that's generally pain proof.

I haven't exactly bisected the issue but I'm pretty sure convolutions are broken on sm_121 after a certain size, getting 20x memory blowup from a convolution from a 2x batch size increase _only_ on the DGX Spark.

I haven't had any problems with inference, but I also don't use the transformers library that much.

llama.cpp was working for openai-oss last time I checked and on release, not sure if something broke along the way.

I don't exactly know if memory fragmentation is something fixable on the driver side - this might just be the problem with kernel's policy and GPL, it prevents them from automatically interfering with the memory subsystem to the granularity they'd like - see zfs and their page table antics - or so my thoughts on it is.

If you've done stuff on WSL, you have similar issues and you can fix it by running a service that normally compacts and clean memory, I have it run every hour. Note that this does impact at the very least CPU performance and memory allocation speeds, but I have not have any issue with long training runs with it (24hr+, assuming that is the issue, I have never tried without it and put that service in place since getting it due to my experience on WSL).

Using Atomic State to Improve React Performance in Deeply Nested Component Trees

Seeking Work: Greater Bay Area Remote: Yes (US-Based)

Alpha launch – .well-known/avatar – feedback wanted

Show HN: A Minimal Playwright Skill for Claude Code

Looking for a Winter/Spring 2026 SWE Internships

Collective Communication for 100k+ GPUs

Show HN: AI Models Group Chat

Chinese and U.S. Officials Reach Framework of a Trade Deal

The MP3.com Rescue Barge Barge

Trump and Xi will 'consummate' TikTok deal on Thursday, treasury secretary says

Mechanize AI: Life after work

The mysterious figure accused of masterminding a $14B crypto scam

Climbing Gyms Took over the World

Kebabs Are Consequential

H.P. Lovecraft: The King of Weird (1996)

Relational Charades: Turning Movies into Tables

Practical Defenses Against Technofascism

Valkey 9.0 Released with Ability to Achieve One Billion Requests / Second

The Ethics in Our Algorithms: When Code Contradicts Conduct

Ask HN: Amazon kindle can't update daylight saving time

Can a new blood test detect ME/CFS? An expert unpacks new research

EPYC Turin vs. Xeon 6 Granite Rapids vs. Graviton4 AWS M8 Instance Benchmarks

Solarized – A Break Down

The 1920s Immigration Mistake America May Repeat

NeuroMark – Yet another bookmark organizer for Firefox

How to Use Zorn's Lemma

How indexes make your queries fast

Show HN: Typegraph – type-level graphs of Rust types

Nanoimprint Lithography: Stop Saying It Will Replace EUV

Fintech will hire you if you're a bad writer