frontpage.

Tell HN: Claude 4.7 is ignoring stop hooks

79•LatencyKills•14h ago•82 comments

Ask HN: Any recommendations for exporting data from Amazon?

2•coreyp_1•11h ago•2 comments

Ask HN: Scaling a targeted web crawler beyond 500M pages/day

14•honungsburk•22h ago•3 comments

Ask HN: Anyone still using JetBrains products today?

3•zkid18•14h ago•4 comments

How good is Mac Studio M3 Ultra for Trillion param models like DeepSeekv4?

3•namegulf•14h ago•7 comments

Ask HN: How do solo devs protect their work in the age of vibe coding?

19•langs•1d ago•13 comments

Tell HN: Anthropic won't reset usage limits for those who downgraded

3•vintagedave•17h ago•0 comments

Tell HN: Codex macOS app switches to Fast speed after update without asking

4•mfi•1d ago•1 comment

Ask HN: How are you using AI code assistants on large messy legacy code bases?

8•thinkingtoilet•19h ago•11 comments

Tell HN: YouTube RSS feeds no longer work

32•019•2d ago•14 comments

GPT-5.5 – No ARC-AGI-3 scores

10•AG25•1d ago•3 comments

Ask HN: Why are companies so distrustful of remote employees?

12•lyfeninja•1d ago•19 comments

Ask HN: Chrome, Brave, Firefox or Something Else?

7•wasimsk•1d ago•16 comments

Hey, it's Earth Day today

22•burnt-resistor•2d ago•15 comments

Ask HN: Any Developer from Wales?

4•danver0•12h ago•4 comments

Is a language possible that's as easy as Python, as fast as C, and more secure than Rust?

6•jerryzhang66•1d ago•9 comments

Anthropic bans orgs without warning

38•alpinisme•3d ago•18 comments

Can a non-developer build commercial products with AI?

5•rkorlimarla•1d ago•9 comments

Ask HN: How are you handling data retention across your stack?

4•preston-kwei•2d ago•6 comments

Ask HN: Dear astronomers, what are the most interesting things in space lately?

15•simonebrunozzi•1d ago•5 comments

Ask HN: Would you take a job programming VMS?

11•smackeyacky•2d ago•19 comments

Ask HN: Am I getting old, or is working with AI juniors becoming a nightmare?

40•MichaelRazum•1d ago•45 comments

Ask HN: What skills are future proof in an AI driven job market?

35•sunny678•4d ago•77 comments

Tell HN: My open-source project hit 5k registered users

19•darkhorse13•3d ago•12 comments

Need advice: Back end engineer → infrastructure: how do you make the transition?

7•gokuljs•2d ago•6 comments

My file access workaround for cron in Tahoe

6•noduerme•3d ago•2 comments

Ask HN: Are cloud coding agents useful in real workflows yet?

5•Rperry2174•2d ago•3 comments

Ask HN: How many tabs do you have open in the browser(s) and why?

4•juujian•1d ago•14 comments

OpenClaw stats don't add up

11•iliaov•3d ago•6 comments

Ask HN: What Would Make Stack Overflow Great Again?

10•nnurmanov•3d ago•26 comments

How good is Mac Studio M3 Ultra for Trillion param models like DeepSeekv4?

3•namegulf•14h ago
Anybody experimented with the following:

Mac Studio M3 Ultra with 96 GB

Is it good for running large models locally?

Comments

bigyabai•14h ago
It might run the smaller Flash version, but 96GB is not enough for the trillion-parameter model.

The M3 Ultra's GPU is a bit on the weak side for large-scale inference, so you'll be waiting on token prefill for most coding/agent workflows.
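As a sanity check, the weights alone for a trillion-parameter model dwarf 96GB at any common quantization. A rough back-of-envelope sketch (the bits-per-weight figures are approximate llama.cpp quant sizes, not exact):

```python
# Weights-only memory estimate: ignores KV cache and runtime overhead,
# so real usage is higher. Bits-per-weight are approximate averages
# for common llama.cpp quant formats.
QUANT_BITS = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5, "q2_k": 2.6}

def weight_gb(params_billion: float, quant: str) -> float:
    """GB needed just to hold the weights at a given quantization."""
    return params_billion * 1e9 * QUANT_BITS[quant] / 8 / 1e9

for q in QUANT_BITS:
    print(f"1T params @ {q:>5}: ~{weight_gb(1000, q):.0f} GB")
```

Even at a ~2.6-bit quant, a 1T-parameter model needs roughly 325GB of weights, which is why only the 512GB configuration is even in the conversation.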

namegulf•12h ago
They have a 512GB RAM option, but it's pricey.

Have you tried any other models with this M3 Ultra?

bigyabai•11h ago
The 512GB model would have to use a lobotomized quant like Q2 or Q1, and you would still be waiting 3-5 minutes to process context lengths in the 32,000-64,000 token range.

Apple's GPUs are just not very fast for inference. I'd stick to the smaller 7b-18b parameter range or MOE models like Qwen if you want a usable inference speed.
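Those wait times imply a prefill throughput somewhere near 180 tokens/sec. A quick sketch of the arithmetic (the rate is a hypothetical figure inferred from the 3-5 minute estimate above, not a benchmark):

```python
# Prefill wait: time spent ingesting the prompt before the first
# generated token appears. 180 tok/s is a hypothetical prefill rate
# inferred from "3-5 minutes for 32k-64k tokens" above.
def prefill_wait_min(prompt_tokens: int, prefill_tok_per_s: float = 180.0) -> float:
    """Minutes spent on prompt ingestion at a given prefill speed."""
    return prompt_tokens / prefill_tok_per_s / 60

for ctx in (32_000, 64_000):
    print(f"{ctx:>6} prompt tokens -> ~{prefill_wait_min(ctx):.1f} min before first token")
```

Note the wait scales linearly with prompt length, which is why agentic workflows (which re-send large contexts constantly) feel it the most.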

namegulf•11h ago
Looks like that's a good idea for now. Yeah, 3-5 minutes isn't practical.

Any thoughts on the M5?

They may soon release an M5 Mac Studio/Mini.

namegulf•11h ago
Is the NVIDIA DGX Spark a good option?

$4,699.00

But it looks like we may need an NVIDIA AI Enterprise - DGX Spark license.

josefcub•7h ago
I've got 256GB of RAM on a Mac Studio M3 Ultra. Other posters are right: The M3 Ultra's prefill is super slow with really large models, 3-5 minutes while it digests the new additions to its context before it continues. On my heavy RAM model, I _can_ run 400b-500b models at Q2, and up to about 750b models at Q1, but the wait isn't the worst part.

Lower quants like that degrade its output, making it less capable overall and prone to forgetting things.

Here's what I'd do with 96GB of RAM: run Qwen 3.6 35b-a3b at Q8 for coding/agentic tasks. You'll get around 70 tokens generated per second, the prefill is lightning fast in comparison, and you'll get a lot of work done. Qwen 3.6 27b is out now too, and I'm getting 17 tok/sec generation with a slower prefill.

The upshot is that you'll still have 20-40GB of RAM left for your workstation and development loads. Running Qwen 3.6 35b or 27b at Q8 quantization, the model with 128k context uses about 40GB of RAM; my OS and application load uses 20-30GB most of the time, for a total of 60-70GB. That's plenty of room in memory for you to work _and_ run inference.
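The budget above works out like so (all figures are the commenter's estimates, not measurements):

```python
# Unified-memory budget for a 96GB Mac Studio, using the estimates
# above: Qwen ~35B at Q8 with 128k context ≈ 40GB, OS + apps ≈ 20-30GB.
TOTAL_GB = 96
MODEL_PLUS_CTX_GB = 40        # weights + 128k KV cache at Q8 (estimate)
OS_AND_APPS_RANGE_GB = (20, 30)  # typical desktop load range (estimate)

for os_gb in OS_AND_APPS_RANGE_GB:
    headroom = TOTAL_GB - MODEL_PLUS_CTX_GB - os_gb
    print(f"OS/apps at {os_gb}GB -> {headroom}GB free for dev work")
```

Even at the heavy end of the desktop load, roughly 26GB stays free, so inference and normal development coexist comfortably.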

You _may_ end up getting DeepSeek 4 Flash running, but it'll be at a lower quantization like Q2 or Q3, making it kind of dumb in comparison. And you may not have enough memory left over for any appreciable amount of context. Today's reasoning models need context to generate good answers, doubly so for agentic/coding tasks.

namegulf•5h ago
Thanks, that's very helpful.

Totally agree, context is everything for agentic coding.

Any other hardware reco that'll help run larger models?