frontpage.

Show HN: SafeClaw – a way to manage multiple Claude Code instances in containers

https://github.com/ykdojo/safeclaw
1•ykdojo•1m ago•0 comments

The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+

https://huggingface.co/blog/huggingface/one-year-since-the-deepseek-moment-blog-3
1•gmays•1m ago•0 comments

The Evolution of the Interface

https://www.asktog.com/columns/038MacUITrends.html
1•dhruv3006•3m ago•0 comments

Azure: Virtual network routing appliance overview

https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-routing-appliance-overview
1•mariuz•3m ago•0 comments

Seedance2 – multi-shot AI video generation

https://www.genstory.app/story-template/seedance2-ai-story-generator
1•RyanMu•6m ago•1 comments

Πfs – The Data-Free Filesystem

https://github.com/philipl/pifs
1•ravenical•10m ago•0 comments

Go-busybox: A sandboxable port of busybox for AI agents

https://github.com/rcarmo/go-busybox
2•rcarmo•10m ago•0 comments

Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery [pdf]

https://research.nvidia.com/labs/nemotron/files/NVFP4-QAD-Report.pdf
1•gmays•11m ago•0 comments

xAI Merger Poses Bigger Threat to OpenAI, Anthropic

https://www.bloomberg.com/news/newsletters/2026-02-03/musk-s-xai-merger-poses-bigger-threat-to-op...
1•andsoitis•11m ago•0 comments

Atlas Airborne (Boston Dynamics and RAI Institute) [video]

https://www.youtube.com/watch?v=UNorxwlZlFk
1•lysace•12m ago•0 comments

Zen Tools

http://postmake.io/zen-list
1•Malfunction92•15m ago•0 comments

Is the Detachment in the Room? – Agents, Cruelty, and Empathy

https://hailey.at/posts/3mear2n7v3k2r
1•carnevalem•15m ago•0 comments

The purpose of Continuous Integration is to fail

https://blog.nix-ci.com/post/2026-02-05_the-purpose-of-ci-is-to-fail
1•zdw•17m ago•0 comments

Apfelstrudel: Live coding music environment with AI agent chat

https://github.com/rcarmo/apfelstrudel
1•rcarmo•18m ago•0 comments

What Is Stoicism?

https://stoacentral.com/guides/what-is-stoicism
3•0xmattf•19m ago•0 comments

What happens when a neighborhood is built around a farm

https://grist.org/cities/what-happens-when-a-neighborhood-is-built-around-a-farm/
1•Brajeshwar•19m ago•0 comments

Every major galaxy is speeding away from the Milky Way, except one

https://www.livescience.com/space/cosmology/every-major-galaxy-is-speeding-away-from-the-milky-wa...
2•Brajeshwar•19m ago•0 comments

Extreme Inequality Presages the Revolt Against It

https://www.noemamag.com/extreme-inequality-presages-the-revolt-against-it/
2•Brajeshwar•19m ago•0 comments

There's no such thing as "tech" (Ten years later)

1•dtjb•20m ago•0 comments

What Really Killed Flash Player: A Six-Year Campaign of Deliberate Platform Work

https://medium.com/@aglaforge/what-really-killed-flash-player-a-six-year-campaign-of-deliberate-p...
1•jbegley•21m ago•0 comments

Ask HN: Anyone orchestrating multiple AI coding agents in parallel?

1•buildingwdavid•22m ago•0 comments

Show HN: Knowledge-Bank

https://github.com/gabrywu-public/knowledge-bank
1•gabrywu•27m ago•0 comments

Show HN: The Codeverse Hub Linux

https://github.com/TheCodeVerseHub/CodeVerseLinuxDistro
3•sinisterMage•29m ago•2 comments

Take a trip to Japan's Dododo Land, the most irritating place on Earth

https://soranews24.com/2026/02/07/take-a-trip-to-japans-dododo-land-the-most-irritating-place-on-...
2•zdw•29m ago•0 comments

British drivers over 70 to face eye tests every three years

https://www.bbc.com/news/articles/c205nxy0p31o
40•bookofjoe•29m ago•13 comments

BookTalk: A Reading Companion That Captures Your Voice

https://github.com/bramses/BookTalk
1•_bramses•30m ago•0 comments

Is AI "good" yet? – tracking HN's sentiment on AI coding

https://www.is-ai-good-yet.com/#home
3•ilyaizen•31m ago•1 comments

Show HN: Amdb – Tree-sitter based memory for AI agents (Rust)

https://github.com/BETAER-08/amdb
1•try_betaer•31m ago•0 comments

OpenClaw Partners with VirusTotal for Skill Security

https://openclaw.ai/blog/virustotal-partnership
2•anhxuan•32m ago•0 comments

Show HN: Seedance 2.0 Release

https://seedancy2.com/
2•funnycoding•32m ago•0 comments

Nvidia DGX Spark: great hardware, early days for the ecosystem

https://simonwillison.net/2025/Oct/14/nvidia-dgx-spark/
189•GavinAnderegg•3mo ago

Comments

ChrisArchitect•3mo ago
More discussion: https://news.ycombinator.com/item?id=45575127
ur-whale•3mo ago
As is usual for NVidia: great hardware, an effing nightmare figuring out how to set up the pile of crap they call software.
p_l•3mo ago
And yet CUDA has looked way better than ATi/AMD offerings in the same area despite ATi/AMD technically being first to deliver GPGPU (the major difference is that CUDA arrived a year later but supported everything from the G80 up and evolved nicely, while AMD managed to have multiple platforms with patchy support and total rewrites in between)
cylemons•3mo ago
What was the AMD GPGPU called?
p_l•3mo ago
Which one? We first had the flurry of third-party work (Brook, Lib Sh, etc.), then we had AMD "Close to Metal", which was IIRC based on Brook and soon followed by dedicated cards; a year later we got CUDA (also derived partially from Brook!) and the AMD Stream SDK, later renamed the APP SDK. Then we got the HIP/HSA stuff, which unfortunately has its biggest legacy (outside of the availability of HIP as a way to target ROCm and CUDA simultaneously) in the low-level details of how GPU game programming evolved on the Xbox 360 / PS4 / Xbox One / PS5. Somewhere in between, AMD seemed to bet on OpenCL, yet today, with the latest drivers from both AMD and nVidia, I get more OpenCL features on nVidia.

And of course there's the part about totally random and inconsistent support outside of the few dedicated cards, which is honestly why CUDA is the de facto standard everyone measures against - you could run CUDA applications, if slowly, even on the lowest-end Nvidia cards, like the Quadro NVS series (think lowest-end GeForce chip, but often paired with more displays and support focused on business users who didn't need fast 3D). And you still can, generally, run core CUDA code within the last few generations on everything from the smallest mobile chip to the biggest datacenter behemoth.

pjmlp•3mo ago
You forgot the C++AMP collaboration with Microsoft.
p_l•3mo ago
Is it the OpenMP related one or another thing?

I kinda lost track; this whole thread reminded me how hopeful I was to play with GPGPU on my then-new X1600

pjmlp•3mo ago
Other thing,

https://learn.microsoft.com/en-us/cpp/parallel/amp/cpp-amp-c...

kanwisher•3mo ago
If you think their software is bad, try using any other vendor; it makes Nvidia look amazing. Apple is the only one close.
enoch2090•3mo ago
Although a bit off the GPU topic, I think Apple's Rosetta is the smoothest binary transition I've ever used.
stefan_•3mo ago
Keep in mind this is part of Nvidia's embedded offerings. So you will get one release of software ever, and that's gonna be pretty much it for the lifetime of the product.
Grimblewald•3mo ago
Pretty much this. Nvidia isn't big because of their hardware; they're not ahead on that front. It's their software support that makes it worthwhile.
jasonjmcghee•3mo ago
Except the performance people are seeing is way below expectations. It seems to be slower than an M4. Which kind of defeats the purpose. It was advertised as 1 Petaflop on your desk.

But maybe this will change? Software issues somehow?

It also runs CUDA, which is useful

airstrike•3mo ago
it fits bigger models and you can stack them.

plus apparently some of the early benchmarks were made with ollama and should be disregarded

pjmlp•3mo ago
Try to use Intel or AMD stuff instead.
triwats•3mo ago
Fascinating to me, having managed some of these systems, just how bad the software is.

Management becomes layers upon layers of bash scripts, which end up calling a final batch script written by Mellanox.

They'll catch up soon, but you end up having to stay strictly on their release cycle, always.

Lots of effort.

simonw•3mo ago
It's notable how much easier it is to get things working now that the embargo has lifted and other projects have shared their integrations.

I'm running VLLM on it now and it was as simple as:

  docker run --gpus all -it --rm \
    --ipc=host --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    nvcr.io/nvidia/vllm:25.09-py3
(That recipe from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?v... )

And then in the Docker container:

  vllm serve &
  vllm chat
The default model it loads is Qwen/Qwen3-0.6B, which is tiny and fast to load.
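
If you want something other than the default, here's a minimal sketch of serving an explicit model and hitting vLLM's OpenAI-compatible API (the flags and port below are standard vLLM defaults, not specific to the Spark):

  # serve an explicit model; the OpenAI-compatible API listens on port 8000 by default
  vllm serve Qwen/Qwen3-0.6B --max-model-len 8192 &

  # then query it from inside the same container
  curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Say hello"}]}'
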
behnamoh•3mo ago
I'm curious, does its architecture support all CUDA features out of the box or is it limited compared to 5090/6000 Blackwell?
3abiton•3mo ago
As someone who got in early on the Ryzen AI 395+, is there any added value in the DGX Spark besides having CUDA (compared to ROCm/Vulkan)? I feel Nvidia fumbled the marketing, either making it sound like an inference miracle or a dev toolkit (then again, not enough to differentiate it from the superior AGX Thor).

I am curious where you find its main value, how it would fit within your tooling, and what your use cases are compared to other hardware.

From the inference benchmarks I've seen, an M3 Ultra always comes out on top.

storus•3mo ago
M3 Ultra has a slow GPU and no hardware FP4 support, so its initial token processing is going to be slow - practically unusable for 100k+ context sizes. For token generation, which is memory bound, the M3 Ultra would be much faster, but who wants to wait 15 minutes for it to read the context? The Spark will be much faster for initial token processing, giving you a much better time to first token, but then ~3x slower (273 vs 800GB/s) in token generation throughput. You need to decide what is more important for you. Strix Halo is IMO the worst of both worlds at the moment, having the worst specs in both dimensions and the least mature software stack.
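
Back-of-envelope, decode throughput is bounded roughly by memory bandwidth divided by the bytes of weights read per generated token. An illustration using the bandwidth figures above and an assumed ~70B-parameter model at 4 bits per weight (~35GB of weights; purely illustrative numbers):

  # rough upper bound: tok/s ≈ memory bandwidth / bytes read per token
  python3 -c 'm = 35e9; print("Spark ~%.0f tok/s, M3 Ultra ~%.0f tok/s" % (273e9 / m, 800e9 / m))'
  # prints: Spark ~8 tok/s, M3 Ultra ~23 tok/s
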
EnPissant•3mo ago
This is 100% the truth, and I am really puzzled to see people push Strix Halo so much for local inference. For about $1200 more you can just build a DDR5 + 5090 machine that will crush a Strix Halo with just about every MoE model (equal decode and 10-20x faster prefill for large models, and huge gaps for any MoE that fits in 32GB of VRAM). I'd have a lot more confidence in reselling a 5090 in the future than a Strix Halo machine, too.
justinclift•3mo ago
It's very likely worth trying ComfyUI on it too: https://github.com/comfyanonymous/ComfyUI

Installation instructions: https://github.com/comfyanonymous/ComfyUI#nvidia

It's a webUI that'll let you try a bunch of different, super powerful things, including easily doing image and video generation in lots of different ways.

It was really useful to me when benchmarking stuff at work on various gear, e.g. L4 vs A40 vs H100 vs 5th-gen EPYC CPUs, etc.
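
For reference, the basic install from those instructions is only a few commands (paraphrased from the ComfyUI README; on the Spark you'd also need an aarch64 CUDA build of PyTorch, which is an assumption worth verifying):

  git clone https://github.com/comfyanonymous/ComfyUI
  cd ComfyUI
  pip install -r requirements.txt   # assumes a CUDA-enabled PyTorch is already installed
  python main.py                    # web UI defaults to http://127.0.0.1:8188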

matt3210•3mo ago
> even in a Docker container

I should be allowed to do stupid things when I want. Give me an override!

simonw•3mo ago
A couple of people have since tipped me off that this works around that:

  IS_SANDBOX=0 claude --dangerously-skip-permissions
You can run that as root and Claude won't complain.
fulafel•3mo ago
If you want to run stuff in Docker as root, better enable uid remapping, since otherwise the in-container uid 0 is still the real uid 0, which weakens the security boundary of the containerization.

(Because Docker doesn't do this by default, best practice is to create a non-root user in your Dockerfile and run as that.)
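
Two ways to act on that, sketched below (the daemon.json key is Docker's standard userns-remap setting; merge it into any existing config rather than overwriting, and subordinate-ID details vary by distro):

  # option 1: daemon-wide user-namespace remapping
  echo '{ "userns-remap": "default" }' | sudo tee /etc/docker/daemon.json
  sudo systemctl restart docker

  # option 2: simply don't run as uid 0 inside the container
  docker run --user "$(id -u):$(id -g)" <image>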

simonw•3mo ago
Correction: it's IS_SANDBOX=1
fnordpiglet•3mo ago
This seems to be missing the obligatory pelican on a bicycle.
simonw•3mo ago
Here's one I made with it - I didn't include it in the blog post because I had so many experiments running that I lost track of which model I'd used to create it! https://tools.simonwillison.net/svg-render#%3Csvg%20width%3D...
fnordpiglet•3mo ago
That seat post looks fairly unpleasant.
justinclift•3mo ago
Looks like the poor pelican was crucified?!?! ;)
two_handfuls•3mo ago
I wonder how this compares financially with renting something on the cloud.
killingtime74•3mo ago
For me as an employee in Australia, I could buy this and write it off on my tax as a work expense myself. To rent would be much more cumbersome, involving the company. That's 45% off (our top marginal tax rate).
Grimburger•3mo ago
> That's 45% off (our top marginal tax rate)

Can people please not listen to this terrible advice that gets repeated so often, especially in Australian IT circles, somehow by young, naive folks.

You really need to talk to your accountant here.

It's probably under 25% in deduction at double the median wage, a little bit over that at triple, and that's *only* if you are using the device entirely for work - as in, it sits in an office and nowhere else. If you are using it personally, you open yourself up to all sorts of drama if and when the ATO ever decides to audit you for making a $6k AUD claim for a computing device beyond what you normally use to do your job.

lukeh•3mo ago
Also, you can only deduct it in a single financial year if you are eligible for the Instant asset write-off program.

I'm sure I'll get downvoted for this, but this common misunderstanding about tax deductions does remind me of a certain Seinfeld episode :)

Kramer: It's just a write off for them

Jerry: How is it a write off?

Kramer: They just write it off

Jerry: Write it off what?

Kramer: Jerry all these big companies they write off everything

Jerry: You don't even know what a write off is

Kramer: Do you?

Jerry: No. I don't

Kramer: But they do and they are the ones writing it off

killingtime74•3mo ago
Correct. You can deduct over multiple years, so you do get the same amount back.
killingtime74•3mo ago
My work is entirely from home. I happen to also be an ex-lawyer, quite familiar with deduction rules, and not altogether young. Can you explain why you think it's not 45% off? I've deducted thousands in AI-related work expenses over the years.

Even if what you are saying is correct, the discount is just lower. That's compared to no discount on compute/GPU rental unless your company purchases it.

speedgoose•3mo ago
Depending on the kind of project and data agreements, it's sometimes much easier to run computations on premises than in the cloud, even though the cloud is somewhat more secure.

I, for example, have some healthcare research projects with personally identifiable data, and in these times it's simpler for the users to trust my company than to trust my company plus some overseas company and its associated government.

fisian•3mo ago
The reported 119GB vs. 128GB according to the spec is because 128GB (decimal gigabytes, 10^9 bytes each) equals 119GiB (binary gibibytes, 2^30 bytes each).
simonw•3mo ago
Ugh, that one gets me every time!
wmf•3mo ago
That can't be right because RAM has always been reported in binary units. Only storage and networking use lame decimal units.
simonw•3mo ago
Looks like Claude reported it based on this:

  ● Bash(free -h)
    ⎿                 total        used        free      shared  buff/cache   available
       Mem:           119Gi       7.5Gi       100Gi        17Mi        12Gi       112Gi
       Swap:             0B          0B          0B
That 119Gi is indeed gibibytes, and 119Gi in GB is 128GB.
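
The unit conversion itself checks out (whether that's the whole story is debated below):

  python3 -c 'print(128e9 / 2**30, 119 * 2**30 / 1e9)'
  # ≈ 119.2 (GiB in 128 decimal GB) and ≈ 127.8 (GB in 119 GiB)
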
wtallis•3mo ago
You're barking up the wrong tree. Nobody's manufacturing power-of-ten sized DRAM chips for NVIDIA; the amount of memory physically present has to be 128GiB. If `free` isn't reporting that much usable capacity, you need to dig into the kernel logs to see how much is being reserved by the firmware and kernel and drivers. (If there was more memory missing, it could plausibly be due to in-band ECC, but that doesn't seem to be an option for DGX Spark.)
rgovostes•3mo ago
I'm hopeful this makes Nvidia take aarch64 seriously for Jetson development. For the past several years Mac-based developers have had to run the flashing tools in unsupported ways, in virtual machines with strange QEMU options.
reenorap•3mo ago
Is 128 GB of unified memory enough? I've found that the smaller models are great as a toy but useless for anything realistic. Will 128 GB hold any model that you can do actual work with, or query for answers that return useful information?
simonw•3mo ago
There are several 70B+ models that are genuinely useful these days.

I'm looking forward to GLM 4.6 Air - I expect that one should be pretty excellent, based on experiments with a quantized version of its predecessor on my Mac. https://simonwillison.net/2025/Jul/29/space-invaders/

behnamoh•3mo ago
the question is: how does the prompt processing time on this compare to the M3 Ultra? Because that one sucks at RAG, even though it can technically handle huge models and long contexts...
zozbot234•3mo ago
Prompt processing time on Apple Silicon might benefit from making use of the NPU/Apple Neural Engine. (Note, the NPU is bad if you're limited by memory bandwidth, but prompt processing is compute limited.) Just needs someone to do the work.
magicalhippo•3mo ago
Depending on your use case, I've been quite impressed with GPT-OSS 20B with high reasoning effort.

The 120B model is better but too slow since I only have 16GB of VRAM. That model runs decently[1] on the Spark.

[1]: https://news.ycombinator.com/item?id=45576737

cocogoatmain•3mo ago
128GB of unified memory is enough for pretty good models, but honestly, for the price of this it is better to just go with a few 3090s or a Mac, due to the memory bandwidth limitations of this card
monster_truck•3mo ago
The whole thing feels like a paper launch, propped up by people chasing blog traffic who are missing the point.

I'd be pissed if I paid this much for hardware and the performance was this lacklustre while also being kneecapped for training.

rubatuga•3mo ago
When the networking is 25GB/s and the memory bandwidth is 210GB/s you know something is seriously wrong.
TiredOfLife•3mo ago
It has connectx 200GB/s
wtallis•3mo ago
No, the NIC runs at 200Gb/s, not 200GB/s.
_ache_•3mo ago
What do you mean by "kneecapped for training"? Isn't 128GB of VRAM enough for small-model training, something a current graphics card can't do?

Obviously, even with ConnectX, it's only ~240GiB of VRAM, so no big models can be trained.

monster_truck•3mo ago
Spend some time looking at the real benchmarks before writing nonsense
_ache_•3mo ago
You are quite rude here. I was asking questions. The benchmarks are very new and don't explain why it can't be used for training.

But if FP4 means 4-bit floating point, and the hardware capability of the DGX Spark is effectively FP4-only, then yes, it was nonsense to wish it could be used for training. But that wasn't obvious from Nvidia's advertising.

jhcuii•3mo ago
Despite the large video memory capacity, its video memory bandwidth is very low. I guess the model's decode speed will be very slow. Of course, this design is very well suited for the inference needs of MoE models.
rcarmo•3mo ago
About what I expected. The Jetson series had the same issues, mostly, at a smaller scale: Deviate from the anointed versions of YOLO, and nothing runs without a lot of hacking. Being beholden to CUDA is both a blessing and a curse, but what I really fear is how long it will take for this to become an unsupported golden brick.

Also, the other reviews I've seen point out that inference speed is slower than a 5090 (or on par with a 4090 with some tailwind), so the big difference here (other than core counts) is the large chunk of "unified" memory. Still seems like a tricky investment in an age where a Mac will outlive everything else you care to put on a desk and AMD has semi-viable APUs with equivalent memory architectures (even if ROCm is… well… not all there yet).

Curious to compare this with cloud-based GPU costs, or (if you really want on-prem and fully private) the returns from a more conventional rig.

EnPissant•3mo ago
This thing is dramatically slower than a 4090 both in prefill and decode. And I do mean DRAMATICALLY.

I have no immediate numbers for prefill, but the memory bandwidth is ~4x greater on a 4090 which will lead to ~4x faster decode.

KeplerBoy•3mo ago
This is kind of an embedded 5070 with a massive amount of relatively slow memory, don't expect miracles.
TiredOfLife•3mo ago
No need to put unified in scare quotes.
ZiiS•3mo ago
Given the likelihood that you are bound by the 4x lower memory bandwidth this implies, at least for decode, I think they are warranted.
3abiton•3mo ago
> Also, the other reviews I’ve seen point out that inference speed is slower than a 5090 (or on par with a 4090 with some tailwind), so the big difference here (other than core counts) is the large chunk of “unified” memory.

It's not comparable to 4090 inference speed. It's significantly slower, because of the lack of MXFP4 models out there. Even compared to the Ryzen AI 395 (ROCm/Vulkan) on gpt-oss-120B MXFP4, the DGX somehow manages to lose on token generation (prompt processing is faster, though).

> Still seems like a tricky investment in an age where a Mac will outlive everything else you care to put on a desk and AMD has semi-viable APUs with equivalent memory architectures (even if RoCm is… well… not all there yet).

ROCm (v7) for APUs has actually come a long way, mostly thanks to community effort; it's quite competitive and more mature. It's still not totally user friendly, but it doesn't break between updates (I know the bar is low, but that was the status a year ago). So in comparison, the Strix Halo offers a lot of value for your money if you need a cheap, compact inference box.

Haven't tested fine-tuning/training yet, but in theory it's supported. Not to forget that the APU is extremely performant for "normal" tasks (Threadripper level) compared to the CPU of the DGX Spark.

rcarmo•3mo ago
Yeah, good point on the FP4. I'm seeing people complain about INT8 as well, which ought to "just work", but everyone who has one (not many) is wary of wandering off the happy path.
saagarjha•3mo ago
I’m kind of surprised at the issues everyone is having with the arm64 hardware. PyTorch has been building official wheels for several months already as people get on GH200s. Has the rest of the ecosystem not kept up?
smallnamespace•3mo ago
A 14-inch M4 Max MacBook Pro with 128GB of RAM has a list price of $4,700 or so and twice the memory bandwidth.

For inference decode the bandwidth is the main limitation so if running LLMs is your use case you should probably get a Mac instead.

dialogbox•3mo ago
Why a MacBook Pro? Isn't the Mac Studio a lot cheaper and the right one to compare with the DGX Spark?
AndroTux•3mo ago
I think the idea is that instead of spending an additional $4000 on external hardware, you can just buy one thing (your main work machine) and call it a day. Also, the Mac Studio isn’t that much cheaper at that price point.
MomsAVoxell•3mo ago
Being able to leave the thing at home and access it anywhere is a feature, not a bug.

The Mac Studio is a more appropriate comparison. There is not yet a DGX laptop, though.

AndroTux•3mo ago
> Being able to leave the thing at home and access it anywhere is a feature, not a bug.

I can do that with a laptop too. And with a dedicated GPU. Or a blade in a data center. I thought the feature of the DGX was that you can throw it in a backpack.

MomsAVoxell•3mo ago
The DGX is clearly a desktop system. Sure, it's luggable. But the point is, it's not a laptop.
dialogbox•3mo ago
> Also, the Mac Studio isn’t that much cheaper at that price point.

At list price, it's 1,000 USD cheaper: 3,699 vs 4,699. I know a lot can be relative, but that's a lot for me for sure.

AndroTux•3mo ago
Fair. I looked it up just yesterday, so I thought I knew the prices from memory, but apparently I mixed something up.
pantalaimon•3mo ago
How are you spending $4000 on a screen and a keyboard?
AndroTux•3mo ago
You're not going to use the DGX as your main machine, so you'll need another computer. Sure, not a $4000 one, but you'll want at least some performance, so it'll be another $1000-$2000.
pantalaimon•3mo ago
> You're not going to use the DGX as your main machine

Why not?

BoredPositron•3mo ago
Because Nvidia is incredibly slow with kernel updates, and you are lucky if you get them at all after just two years. I am curious whether they will update these machines for longer than their older DGX-like hardware.
smallnamespace•3mo ago
I didn't think of it ;)

Now that you bring it up, the M3 Ultra Mac Studio goes up to 512GB for about a $10k config with around 850GB/s of bandwidth, for those who "need" a near-frontier large model. I think 4x the RAM is not quite worth more than doubling the price, especially if MoE support gets better, but it's interesting that you can get a DeepSeek R1 quant running on prosumer hardware.

ChocolateGod•3mo ago
People may prefer running in environments that match their target production environment, so macOS is out of the question.
deviation•3mo ago
It's a hoop to jump through, but I'd recommend checking out Apple's container/containerization services which help accomplish just that.

https://github.com/apple/containerization/

ChocolateGod•3mo ago
You're likely still targeting Nvidia's stack for LLMs, and Linux containers on macOS won't help you there.
bradfa•3mo ago
The Ubuntu that NVIDIA ship is not stock. They seem to be moving towards using stock Ubuntu but it’s not there yet.

Running some other distro on this device is likely to require quite some effort.

pjmlp•3mo ago
It still is more of a Linux distribution than macOS will ever be; UNIX != Linux.
ZiiS•3mo ago
I think the 'environment' here is CUDA; the OS running on the small co-processor you use to buffer some IO is irrelevant.
physicsguy•3mo ago
A few years ago I worked on an ARM supercomputer, as well as a POWER9 one. x86 is so assumed for anything other than trivial things that it is painful.

What I found was that a good solution was Spack: https://spack.io/ It allows you to download/build the full toolchain of stuff you need for whatever architecture you are on - all dependencies, compilers (GCC, CUDA, MPI, etc.), compiled Python packages, and so on - and if you need to add a new recipe for something, it is really easy.
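
A minimal sketch of what that looks like in practice (the package names are just examples):

  git clone --depth=1 https://github.com/spack/spack.git
  . spack/share/spack/setup-env.sh

  spack install gcc@13            # build a compiler for the host architecture
  spack install py-numpy %gcc@13  # build a package (and its dependencies) with it
  spack load py-numpy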

For the fellow Brits - you can tell this was named by Americans!!!

donw•3mo ago
Who says we don’t have a sense of humor.
physicsguy•3mo ago
It's that it's an offensive term here, not a funny one.
MomsAVoxell•3mo ago
Aussie checking in, smoko's over, get back to work...
teleforce•3mo ago
It's good that you've mentioned Spack, and for non-HPC work at that, which is very interesting.

This is a high-level overview by one of the Spack authors from the HN post back in 2023 (the top comment of 100), including the link to the original Spack paper [1]:

At a very high level, Spack has:

* Nix's installation model and configuration hashing

* Homebrew-like packages, but in a more expressive Python DSL, and with more versions/options

* A very powerful dependency resolver that doesn't just pick from a set of available configurations -- it configures your build according to possible configurations.

You could think of it like Nix with dependency resolution, but with a nice Python DSL. There is more on the "concretizer" (resolver) and how we've used ASP for it here:

* "Using Answer Set Programming for HPC Dependency Solving", https://arxiv.org/abs/2210.08404

[1] Spack – scientific software package manager for supercomputers, Linux, and macOS (100 comments):

https://news.ycombinator.com/item?id=35237269

physicsguy•3mo ago
Well, to be fair, I'd consider this to be semi-HPC work - obviously it's not multi-node, but because of the hardware it's not the same as using an ordinary desktop machine either, and it has many of the challenges of HPC in getting stuff compiled for it, particularly with it being ARM-based. What you learn when you work on this stuff is that you need very specific combinations of packages that your distro just isn't going to be able to provide, and Homebrew doesn't give you enough flexibility for that.
_joel•3mo ago
How would this fare alongside the new Ryzen chips, out of interest? From memory it seems to be getting the same amount of tok/s, but would the Ryzen box be more useful for other computing, not just AI?
KeplerBoy•3mo ago
If you need x86 or Windows for anything, it's not even a question.
_joel•3mo ago
Sure, Macs are also ARM-based; my question was about general performance, not architecture.
justincormack•3mo ago
From reading reviews (I don't have either yet): the Nvidia actually has unified memory, while on AMD you have to specify the allocation split. Nvidia may also have some form of GPU partitioning so you can run multiple smaller models, but no one has got it working yet. The Ryzen is very different from AMD's pro GPUs, so the software support won't benefit from work done there, while Nvidia's is the same across the line. You can play games on the Ryzen.
blurbleblurble•3mo ago
But on the Ryzen the VRAM allocation can be entirely dynamic. I saw a review showing excellent full GPU usage during inference, with the BIOS VRAM allocation set to the minimum level, using a very large model. So it's not as simple as you describe (I used to think this was the case too).

Beyond that, it seems like the 395 in practice smashes the DGX Spark in inference speed for most models. I haven't seen NVFP4 comparisons yet and would be very interested to.

justincormack•3mo ago
Yes, you can set it, but in the BIOS, not dynamically as you need it.

I don't think there are any models supporting NVFP4 yet, but we shall probably start seeing them.

blurbleblurble•3mo ago
That's what I'm saying: in the review video I saw, they allocated as little memory as possible to the GPU in the BIOS, then used some kind of kernel-level dynamic control.
amelius•3mo ago
> x86 architecture for the rest of the machine.

Can anyone explain this? Does this machine have multiple CPU architectures?

catwell•3mo ago
No, he means most NVIDIA-related software assumes an x86 CPU, whereas this one is ARM.
amelius•3mo ago
> most NVIDIA-related software assumes an x86 CPU

Is that true? Nvidia Jetson is quite mature now, and runs on ARM.

B1FF_PSUVM•3mo ago
I went looking for pictures (in the photo the box looked like a tray to me ...) and found an interesting piece by Canonical touting their Ubuntu base for the OS: https://canonical.com/blog/nvidia-dgx-spark-ubuntu-base

P.S. exploded view from the horse's mouth: https://www.nvidia.com/pt-br/products/workstations/dgx-spark...

rvz•3mo ago
TLDR: Just buy a RTX 5090.

The DGX Spark is completely overpriced for its performance compared to a single RTX 5090.

_ache_•3mo ago
I get the idea. But couldn't 128GB of "VRAM" (unified, actually) train a useful ViT model?

I don't think the 5090 could do that with only 32GB of VRAM, could it?

storus•3mo ago
DGX Spark is not for training, only for inference (FP4).
sailingparrot•3mo ago
It's a DGX dev box, for those (not consumers) who will ultimately need to run their code on large DGX clusters, where a failure or a ~3% slowdown of training ends up costing tens of thousands of dollars.

That's the use case, not running LLMs efficiently, and you can't do that with an RTX 5090.

storus•3mo ago
Are the ASUS Ascent GX10 and similar machines from Lenovo etc. 100% compatible with the DGX Spark, and can they be chained together with the same functionality (i.e. an ASUS together with a Lenovo for 256GB inference)?
solarboii•3mo ago
Are there any benchmarks comparing it with the Nvidia Thor? It is much more available than the Spark, and performance might not be very different.
andy99•3mo ago
Is there like an affiliate link or something where I can just buy one? Nvidia’s site says sold out, PNY invites you to find a retailer, the other links from nvidia didn’t seem to go anywhere. Can one just click to buy it somewhere?
BoredPositron•3mo ago
My local reseller has them in stock in the EU, with a markup... Directly from Nvidia, probably not for quite some time; I have some friends who put in preorders and they didn't get any from the first batch.
roughsquare•3mo ago
It still isn't at distributors yet. My distributor has it listed for Oct 27, with units shipping the day after from the warehouse to resellers/etc.
triwats•3mo ago
Added this to my benchmark site, as it seems we might see a lot of purpose-built desktop systems going forward.

You CAN build your own - but for people wanting to get started, this could be a really viable option.

Perhaps less so with Apple's M5, though? Let's see...

https://flopper.io/gpu/nvidia-dgx-spark