Mac Studio M3 Ultra with 96 GB
Is it good for running large models locally?
Lower quants like that degrade the model's output, making it less capable overall and more prone to forgetting details.
Here's what I'd do with 96GB of RAM: run Qwen 3.6 35b-a3b at Q8 for coding/agentic tasks. You'll get around 70 tokens/sec of generation, prefill is lightning fast in comparison, and you'll get a lot of work done. Qwen 3.6 27b is out now too; I'm getting 17 tok/sec generation with a slower prefill.
The upshot is that you'll still have 20-40GB of RAM left for your workstation and development loads. Running Qwen 3.6 35b or 27b at Q8 quantization, the model at 128k context uses about 40GB of RAM; my OS and application load takes 20-30GB most of the time, for a total of 60-70GB. That leaves plenty of room in memory for you to work _and_ run inference.
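The budget above can be sanity-checked with back-of-envelope arithmetic. This sketch assumes roughly 1 byte per parameter at Q8 and treats the context and OS figures as illustrative placeholders, not measured values:

```python
# Back-of-envelope RAM budget for running a quantized model locally.
# All figures are illustrative assumptions, not measurements.

def model_ram_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB: billions of params * bytes/param."""
    return params_b * bytes_per_param

WEIGHTS_GB = model_ram_gb(35, 1.0)  # ~35B params at Q8 (~1 byte/param) -> ~35 GB
CONTEXT_GB = 5.0                    # assumed KV-cache overhead at 128k context
OS_APPS_GB = 25.0                   # midpoint of a 20-30 GB OS + application load

total = WEIGHTS_GB + CONTEXT_GB + OS_APPS_GB
headroom = 96 - total
print(f"total used: ~{total:.0f} GB, headroom on 96 GB: ~{headroom:.0f} GB")
```

With those assumptions you land around 65GB used, which matches the 60-70GB range quoted above.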
You _may_ end up getting Deepseek 4 Flash running, but only at a lower quantization like Q2 or Q3, which makes it kind of dumb in comparison. And you may not have enough memory left over for any appreciable amount of context. Today's reasoning models need context to generate good answers, doubly so for agentic/coding tasks.
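To see why a much larger model only squeezes in at aggressive quants, compare weight memory across quantization levels. The parameter count and bytes-per-param figures here are rough illustrative assumptions, not the actual Deepseek specs:

```python
# Rough weight-memory comparison across quantization levels.
# Bytes-per-param values are approximate GGUF-style averages (assumed).
BYTES_PER_PARAM = {"Q8": 1.0, "Q4": 0.55, "Q3": 0.45, "Q2": 0.35}

def weights_gb(params_b: float, quant: str) -> float:
    """Approximate weight memory in GB for a given quantization level."""
    return params_b * BYTES_PER_PARAM[quant]

for q in ("Q8", "Q4", "Q3", "Q2"):
    gb = weights_gb(200, q)  # hypothetical ~200B-param model
    verdict = "fits" if gb < 96 else "does not fit"
    print(f"{q}: ~{gb:.0f} GB -> {verdict} in 96 GB (before any context)")
```

Under these assumptions only Q2/Q3 fit at all, and Q3 leaves almost nothing for the KV cache, which is exactly the no-context problem described above.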
Totally agree, context is everything for agentic coding.
Any other hardware reco that'll help run larger models?
bigyabai•14h ago
The M3 Ultra's GPU is a bit on the weak side for large-scale inference, so you'll be waiting on token prefill for most coding/agent workflows.
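The prefill wait can be estimated from first principles: prefill is compute-bound at roughly 2 FLOPs per parameter per prompt token. The effective-TFLOPS figure below is an assumed round number for illustration, not a measured M3 Ultra benchmark:

```python
# Why prefill dominates on weaker GPUs: it is compute-bound, roughly
# 2 * params FLOPs per prompt token. TFLOPS here is an assumed value.

def prefill_seconds(prompt_tokens: int, active_params_b: float, tflops: float) -> float:
    """Estimate prefill time: total FLOPs divided by effective throughput."""
    flops = prompt_tokens * 2 * active_params_b * 1e9
    return flops / (tflops * 1e12)

# e.g. a 30k-token coding prompt into a dense 35B model at an assumed
# 30 effective TFLOPS of sustained GPU throughput
t = prefill_seconds(30_000, 35, 30)
print(f"~{t:.0f} s of prefill before the first generated token")
```

At those assumptions a long agentic prompt costs about a minute of prefill per turn, which is why MoE models (fewer active parameters per token) feel so much more usable on this hardware.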
namegulf•12h ago
Have you tried any other models with this M3 Ultra?
bigyabai•11h ago
Apple's GPUs are just not very fast for inference. I'd stick to the smaller 7b-18b parameter range or MoE models like Qwen if you want a usable inference speed.
namegulf•11h ago
Any thoughts on M5?
They may soon release an M5 Mac Studio/Mini.
namegulf•11h ago
$4,699.00
But it looks like we may also need an NVIDIA AI Enterprise - DGX Spark license.