
Nvidia DGX Spark and Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0

https://blog.exolabs.net/nvidia-dgx-spark/
59•edelsohn•2d ago

Comments

pram•2d ago
Very cool, using the DGX like an “AI eGPU.” I wonder if this could also benefit stuff like Stable Diffusion/WAN, etc.?
alexandercheema•1d ago
Yes, these models are mostly compute-bound so benefit even more from the compute on the DGX Spark.
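A rough way to see what "compute-bound" means here is a roofline check: compare how many FLOPs a pass performs per byte of weights it reads against the hardware's ratio of peak compute to memory bandwidth. The sketch below is a simplified model that ignores activation and KV-cache traffic; the accelerator numbers and function names are hypothetical, not DGX Spark or Mac Studio specifications.

```python
def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float) -> float:
    """Approximate FLOPs per byte of weight traffic for one forward pass.

    Each weight is read from memory once per pass and contributes about
    2 FLOPs (one multiply, one add) for every token processed in that pass.
    Activation and KV-cache traffic are ignored in this simplified model.
    """
    return 2 * tokens_per_pass / bytes_per_param


def is_compute_bound(tokens_per_pass: int, bytes_per_param: float,
                     peak_flops: float, mem_bandwidth: float) -> bool:
    """True if the pass is limited by compute rather than memory bandwidth."""
    # Ridge point of the roofline: the intensity at which compute time
    # and memory-transfer time are equal.
    ridge = peak_flops / mem_bandwidth
    return arithmetic_intensity(tokens_per_pass, bytes_per_param) >= ridge


# Hypothetical accelerator: 100 TFLOP/s peak compute, 250 GB/s memory bandwidth.
PEAK_FLOPS, MEM_BW = 100e12, 250e9
print(is_compute_bound(1, 2.0, PEAK_FLOPS, MEM_BW))     # one decode token, fp16 weights: False
print(is_compute_bound(4096, 2.0, PEAK_FLOPS, MEM_BW))  # 4096-token prefill: True
```

Passes that process many tokens at once (like prefill) land far above the ridge point, so extra compute helps; single-token decode steps sit far below it and are limited by memory bandwidth instead.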
dekhn•2d ago
Are you using USB-C for networking between the Spark and the Mac?
pdpi•2d ago
IP over Thunderbolt is definitely a thing; I don't know whether IP over USB is also a thing. USB4 v2 or TB5 can do 80 Gbit/s symmetrical or 120/40 Gbit/s asymmetrical (and boy, is this a poster child for the asymmetrical setup). The Mac definitely supports that fine, so as long as the Spark plays nice, USB is actually a legitimately decent choice.
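To put those link speeds in perspective, here is a back-of-the-envelope sketch of how long it takes to move a prefill result (for example a KV cache) across the cable. The 4 GB payload and the `efficiency` overhead factor are hypothetical placeholders, not figures from the blog post.

```python
def transfer_seconds(num_bytes: float, link_gbit_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Seconds to move num_bytes over a link rated in gigabits per second.

    efficiency (0 < efficiency <= 1) is a fudge factor for protocol overhead.
    """
    bits = num_bytes * 8
    return bits / (link_gbit_per_s * 1e9 * efficiency)


KV_CACHE_BYTES = 4e9  # hypothetical 4 GB prefill result
for gbps in (40, 80, 120):
    print(f"{gbps:>3} Gbit/s: {transfer_seconds(KV_CACHE_BYTES, gbps):.2f} s")
```

At the rated 80 Gbit/s, 4 GB takes about 0.4 s; the 120 Gbit/s direction of an asymmetric link cuts that to about 0.27 s. If the prefill output mostly flows one way (Spark to Mac), as the comment implies, the asymmetric mode puts the extra bandwidth where it is needed.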
esseph•2d ago
USB4 was based on Thunderbolt 3.

Yes, it's a thing that works.

mehdibl•2d ago
The gain is only in prefill, and if the task/output is complex the gain will be minor. So the numbers are quite exaggerated here, based on a prompt that takes less than 2 s to decode. I guess we are not doing complex tasks with hundreds or thousands of output tokens here. For the cost of an M3 Ultra + DGX, the gain seems minimal. Most of all, EXO didn't clarify which model was used here, and it's certainly not a dense model or an MoE with 1B or 2B experts; otherwise the Mac Ultra would also suffer a lot and the layers would be bigger!
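The point that a prefill-only speedup shrinks as decode time grows is Amdahl's law. A minimal sketch, with hypothetical timings rather than measurements from the blog post:

```python
def end_to_end_speedup(prefill_s: float, decode_s: float,
                       prefill_speedup: float, decode_speedup: float = 1.0) -> float:
    """Overall speedup when prefill and decode are sped up independently (Amdahl's law)."""
    before = prefill_s + decode_s
    after = prefill_s / prefill_speedup + decode_s / decode_speedup
    return before / after


# Hypothetical timings, not measurements from the blog post.
print(end_to_end_speedup(prefill_s=2.0, decode_s=2.0, prefill_speedup=4.0))   # 1.6
print(end_to_end_speedup(prefill_s=1.0, decode_s=20.0, prefill_speedup=4.0))  # ~1.04
```

With equal prefill and decode time, a 4x faster prefill yields 1.6x overall; when decode dominates, the same 4x prefill gain is barely visible end to end.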
solarkraft•2d ago
Anecdotally, even medium-sized prompts (a few thousand tokens) on pretty small models (2B to 8B) have resulted in extremely noticeable slowdowns (the vast majority of total processing time) on my M1 Mac, leading me to appreciate the significance of the prefill step (and the difficulty of processing large contexts locally).
adam_arthur•2d ago
I'm confused by all the takes implying decode is more important than prefill.

There are an enormous number of use cases where the prompt is large and the expected output is small.

E.g. providing data for the LLM to analyze, after which it gives a simple yes/no Boolean response. Or selecting a single enum value from a set.

This pattern seems far more valuable in practice than the common, lazy, open-ended chat-style implementations (lazy from a product perspective).

Obviously decode will be important for code generation or search, but that's such a small set of possible applications, and you'll probably always do better being on the latest models in the cloud.
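The large-prompt, short-answer pattern can be quantified with a simple latency model: prefill time is prompt tokens divided by prefill throughput, decode time is output tokens divided by decode throughput. The throughput numbers below are hypothetical placeholders chosen only to show the shape of the trade-off.

```python
def latency_breakdown(prompt_tokens: int, output_tokens: int,
                      prefill_tok_per_s: float, decode_tok_per_s: float) -> tuple[float, float]:
    """Return (total seconds, fraction of total time spent in prefill)."""
    prefill_s = prompt_tokens / prefill_tok_per_s
    decode_s = output_tokens / decode_tok_per_s
    total_s = prefill_s + decode_s
    return total_s, prefill_s / total_s


# Hypothetical throughputs: 500 tok/s prefill, 20 tok/s decode.
PREFILL_RATE, DECODE_RATE = 500.0, 20.0

# Large prompt, one-token yes/no answer: prefill dominates.
total, share = latency_breakdown(8000, 1, PREFILL_RATE, DECODE_RATE)
print(f"classification: {total:.2f} s, {share:.1%} in prefill")

# Short prompt, long open-ended answer: decode dominates.
total, share = latency_breakdown(200, 1000, PREFILL_RATE, DECODE_RATE)
print(f"chat: {total:.2f} s, {share:.1%} in prefill")
```

Under these assumptions the classification-style request spends about 99.7% of its 16.05 s in prefill, while the chat-style request spends under 1% there, which is why a prefill accelerator matters most for the first kind of workload.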

drodgers•2d ago
This is really cool!

Now I'm trying to stop myself from finding an excuse to spend upwards of $30k on compute hardware...

tuananh•2d ago
If you have $30k to spare, I'm sure there are better options.
jsight•2d ago
Yeah, a couple of RTX Pro 6000 cards would blow this away and still leave him with money to spare.
solarkraft•2d ago
This is a wonderful explanation of the two phases! I appreciate the hardware concerns for both now.

Reading the article, I wished for a device that just does both things well, and on that topic it might be noteworthy that Apple's just-released M5 has roughly 3.5x faster TTFT (time to first token) than the M4, according to Apple's claims!

daft_pink•2d ago
It’s really sad that exo went private.
ethanpil•1d ago
How do you know this happened? I thought it was an abandoned project until I saw this post. I've been diligently checking weekly for new releases but nothing for almost a year...
alexandercheema•1d ago
Appreciate you checking back so often. We have some exciting plans. Keep checking and it won't be long before something pops up :)
storus•2d ago
Wouldn't this restrict memory to 128GB, wasting M3 Ultra potential?
alexandercheema•1d ago
Blog author here. Actually, no. The model can be streamed into the DGX Spark, so we can run prefill of models much larger than 128GB (e.g. DeepSeek R1) on the DGX Spark. This feature is coming to EXO 1.0, which will be open-sourced soon™.
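The author doesn't describe how the streaming works. One common way to run prefill for a model that doesn't fit in memory is to load weights layer by layer: push every prompt token through layer i, release it, then load layer i+1. The pure-Python sketch below illustrates that general idea with toy matrices; the loader, layer shapes, and function names are hypothetical and are not EXO's implementation.

```python
from typing import Callable, Iterator, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix by a single token's hidden vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def stream_layers(num_layers: int, load_layer: Callable[[int], Matrix]) -> Iterator[Matrix]:
    """Yield layer weights one at a time instead of loading the whole model."""
    for i in range(num_layers):
        # While layer i+1 loads, the caller may still hold layer i, so at most
        # two layers are resident at once (similar to double buffering).
        yield load_layer(i)


def streamed_prefill(prompt: List[Vector], num_layers: int,
                     load_layer: Callable[[int], Matrix]) -> List[Vector]:
    """Run every prompt token through each layer before fetching the next one."""
    hidden = prompt
    for weights in stream_layers(num_layers, load_layer):
        # One weight load is amortized over all prompt tokens.
        hidden = [matvec(weights, h) for h in hidden]
    return hidden


# Hypothetical loader: in practice this would read weights from disk or the network.
load_calls: List[int] = []

def toy_loader(i: int) -> Matrix:
    load_calls.append(i)
    return [[2.0, 0.0], [0.0, 2.0]]  # every toy layer doubles its input

out = streamed_prefill([[1.0, 0.0], [0.0, 1.0]], num_layers=3, load_layer=toy_loader)
print(out)         # [[8.0, 0.0], [0.0, 8.0]]
print(load_calls)  # [0, 1, 2]
```

This pays off for prefill because the cost of loading each layer is shared across all prompt tokens; for decode, every new token would need the whole model streamed again, so the approach is far less attractive there.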
storus•1d ago
Excellent! Good luck!
musicale•2d ago
But you could also just get two DGX Spark and get 2 * 1.9x = 3.8x total throughput for two query streams.
rcarmo•1d ago
This is very nicely done. I wonder what the values will look like a year from now with M5 Macs, though.
