But maybe this will change? Perhaps it's just software issues?
It also runs CUDA, which is useful
Plus, apparently some of the early benchmarks were run with Ollama and should be disregarded.
I'm running vLLM on it now and it was as simple as:
docker run --gpus all -it --rm \
--ipc=host --ulimit memlock=-1 \
--ulimit stack=67108864 \
nvcr.io/nvidia/vllm:25.09-py3
(That recipe is from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?v... )

And then in the Docker container:
vllm serve &
vllm chat
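Side note: once vllm serve is up it exposes an OpenAI-compatible API on localhost:8000 by default, so you can also hit it from Python. A minimal sketch using the openai client package (which you'd install separately; it's not part of the container recipe above):

from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server (default port 8000).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Ask the server which model it loaded rather than hard-coding one.
model = client.models.list().data[0].id

resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(resp.choices[0].message.content)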
The default model it loads is Qwen/Qwen3-0.6B, which is tiny and fast to load.

I should be allowed to do stupid things when I want. Give me an override!
IS_SANDBOX=0 claude --dangerously-skip-permissions
You can run that as root and Claude won't complain.

● Bash(free -h)
⎿               total        used        free      shared  buff/cache   available
  Mem:          119Gi       7.5Gi       100Gi        17Mi        12Gi       112Gi
  Swap:            0B          0B          0B
That 119Gi is indeed gibibytes, and 119Gi converted to GB is just about 128GB (quick conversion below).

I'm looking forward to GLM 4.6 Air - I expect that one should be pretty excellent, based on experiments with a quantized version of its predecessor on my Mac. https://simonwillison.net/2025/Jul/29/space-invaders/
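For anyone checking that conversion, it's just unit arithmetic (GiB is base 2, GB is base 10):

gib = 119
total_bytes = gib * 2**30      # 127,775,277,056 bytes
gb = total_bytes / 10**9       # ~127.8 decimal gigabytes
print(f"{gib} GiB = {gb:.1f} GB")   # 119 GiB = 127.8 GB, i.e. the advertised 128GB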
I'd be pissed if I paid this much for hardware and the performance was this lacklustre while also being kneecapped for training
ChrisArchitect•3h ago