frontpage.
newsnewestaskshowjobs

Open Source @Github

fp.

Open in hackernews

Two Qwen3 models on one DGX Spark: the residency math

https://www.devashish.me/p/two-qwen3-models-on-one-dgx-spark
20•devashish86•2d ago

Comments

devashish86•2d ago
Author here. Quick context the post doesn't quite spell out:

The tool_choice="auto" failure on Qwen3-Next isn't a parser issue — the model reasons inside <think>, decides, and never emits the tool call. No error, just empty tool_calls. The fix was swapping the backbone from Thinking to Instruct, not tuning any parser flag.

The "load the bigger model first, size the smaller against actual residency" playbook generalizes to anything with shared CUDA framework overhead. The ~5 GiB framework floor shows up even at small gpu_memory_utilization values — plan against actuals, not targets.

shireboy•1h ago
I’ve been considering a move to local llm setup, having been underwhelmed coat vs value of various online offerings. But at the same time worried anything I get will be obsolete in a couple months. And I don’t want to have to babysit it. I really want some agents managing and creating side hustles for me and have some other things. I’m technical-have written my own harness and use gh copilot and grok daily and have a hosted openwebui+openrouter thing. I’m also torn between a 128g MacBook Pro or a framework, or spark or similar and lightweight laptop to access. Would love advice anyone has for (or against) going local. I have asked ai but have analysis paralysis as 5k would be a big investment for me so I want to make right choices
peddling-brink•51m ago
Well, if you are making side-hustle money now using online models that, critically, you could also run at home, then it sounds like it’s just a matter of numbers. Oh and, unless you spend a lot more than 5k, your local model will still be slower than the online model. What’s your estimated ROI?

Assuming that’s not true based on your phrasing, you’d be shooting yourself in the foot. Start using online models with the same quant at least benchmark as what you could run at home. Prepare for the at home model to be slower.

ericd•49m ago
You probably want to try renting some time on a dedicated box with roughly the specs you’re considering and running the open models for a bit to see if you would actually use them before dropping a lot on local hardware. A 128 gig MacBook Pro isn’t going to get you an amazing model, and certainly not amazing speed. GLM 5.2 wants something like 350+ gigs at fp4 iirc.
dzink•1h ago
Have you tried llama.cpp with unsloth and models suited to it? GLM flash? It seemed to allow more models to be tried soon after they are released. Haven’t tried for long term deployment though, that’s the next step.
pet_the_bird•45m ago
Highy anecdotal: I have tried various self-hosted models using both vllm and llama.cpp. I am in a situation where I have access to large amount of memory (~320 GB).

While experimenting with quantization I found that there is a non-trivial tradeoff between quality and memory footprint. Overall my experience follows the reported pattern of "2-bit is mwah, 4-bit half decent and 6-bit required for programming. Still, although MiniMax-m2.7 is useable with the 6-bit quantizations that unsloth provides, it felt like such a breath of fresh air when I used the reference full-size model.

I find it difficult to say why. I had mostly the same setup as before (parsing had to be slightly adjusted in Zed). Aside from not experiencing the thinking loops (where minimax would get stuck generating the same sentences over and over) there is little evidence of any real improvement (although the average thinking time felt shorter).

I would recommend against very low quantizations of GLM 5.0/5.1/5.2 or Kimi 2.5/2.6. Smaller models were more reliable, and therefore more useful.

verdverm•40m ago
I have tried llama-cpp, vllm is nicer (ray, handles queueing, doesn't have the cache invalidation bug for qwen/gemma models) and unsloth has toxic employees in their discord.

I've run 2 qwen/gemma @8bit with full context window side-by-side. Right now I have 4 models on my spark (qwen36moe, embedding, reranker, qwen3-1.7B) to support my markdown kb tool.

The setup is not as capable, but still good and gets better with models/algos. To me, it's more about the freedom to tinker, freedom from token bill anxiety, and potential right to compute should the government/oligarchy decides it gets to decide who can access which models.

roger_•12m ago
How about Qwen3.7? What sort of prefill/decode rates?

Beyond All Reason (Free Total Annihilation Inspired RTS)

https://www.beyondallreason.info
144•mosiuerbarso•3h ago•59 comments

The case against geometric algebra (2024)

https://alexkritchevsky.com/2024/02/28/geometric-algebra.html
78•Hbruz0•3h ago•46 comments

Who Owns Your ATProto Identity? Hint: It's Probably Not You

https://kevinak.se/blog/who-actually-owns-your-atproto-identity-hint-its-probably-not-you
12•kevinak•28m ago•2 comments

David Ahl's Basic Computer Games Ported to C

https://github.com/proteanthread/bcg
28•theanonymousone•2h ago•10 comments

A 3D voxel game engine written in APL

https://github.com/namgyaaal/avoxelgame
97•sph•6h ago•8 comments

Google Hits 50% IPv6

https://blog.apnic.net/2026/04/28/google-hits-50-ipv6/
256•barqawiz•6h ago•255 comments

Loupe – A iOS app that raises awareness about what native apps can see

https://github.com/mysk-research/loupe
394•Cider9986•1d ago•157 comments

Two Qwen3 models on one DGX Spark: the residency math

https://www.devashish.me/p/two-qwen3-models-on-one-dgx-spark
21•devashish86•2d ago•9 comments

Running MicroVMs in Proxmox VE, the Easy Way

https://taoofmac.com/space/blog/2026/06/18/1845
127•zdw•1d ago•10 comments

Renting a sewing machine from the library

https://www.bbc.com/future/article/20260618-the-weird-and-wonderful-libraries-of-finland
277•sohkamyung•15h ago•157 comments

Zigzag Decoding with AVX-512

https://zeux.io/2026/06/17/zigzag-decoding-avx512/
100•luu•3d ago•20 comments

Slow breathing modulates brain function and risk behavior

https://www.cell.com/neuron/fulltext/S0896-6273(26)00339-9
275•croes•16h ago•78 comments

Epoll vs. io_uring in Linux

https://sibexi.co/posts/epoll-vs-io_uring/
202•Sibexico•15h ago•50 comments

A tale of two path separators

https://alexwlchan.net/2021/slashes/
42•dbaupp•4d ago•12 comments

Windows UI evolution: Clicking an unassociated file

https://movq.de/blog/postings/2026-06-20/0/POSTING-en.html
87•jandeboevrie•8h ago•56 comments

Developers don't understand CORS (2019)

https://fosterelli.co/developers-dont-understand-cors
260•toilet•13h ago•197 comments

Rare medieval bookmark exceeds expectations at auction

https://www.thehistoryblog.com/archives/76314
23•speckx•4d ago•8 comments

15-minute at-home Lyme disease tick test

https://www.bostonglobe.com/2026/06/17/business/lyme-disease-tick-test/
155•bookofjoe•3d ago•111 comments

Cosmodial Sky Atlas

https://frankforce.com/cosmodial-sky-atlas/
13•surprisetalk•4d ago•4 comments

SMPTE Makes Its Standards Freely Accessible

https://www.smpte.org/blog/smpte-makes-its-standards-freely-accessible-openingstandards-library-t...
273•zdw•21h ago•93 comments

Unauthorized alert sent to cell phones across Brazil

https://www.cnn.com/2026/06/20/americas/brazil-hackers-unauthorized-alert-latam
158•zdw•18h ago•118 comments

DOS Game "F-15 Strike Eagle II" reversing project needs DOS test pilots

https://neuviemeporte.github.io/f15-se2/2026/06/20/needyou.html
266•LowLevelMahn•23h ago•68 comments

Proportional-Integral-Derivative Controllers

https://en.wikipedia.org/wiki/PID_controller
53•dhorthy•1d ago•26 comments

UHF X11: X11 Built for VisionOS and Apple Vision Pro

https://www.lispm.net/apps/uhf-x11/
213•zdw•21h ago•48 comments

The Great Intermediary Panic

https://www.minid.net/2013/1/23/the-great-intermediary-panic
6•meerita•2d ago•2 comments

Guide to the TD4 4-bit DIY CPU

https://www.philipzucker.com/td4-4bit-cpu/
54•andrewstuart•2d ago•5 comments

Show HN: TownSquare, a tiny presence layer for websites

https://townsquare.cauenapier.com/
212•cauenapier•1d ago•118 comments

Whole cross-sectional human ultrasound tomography

https://www.nature.com/articles/s41551-026-01660-4
92•lnyan•3d ago•18 comments

Alice is impatient

https://brooker.co.za/blog/2026/06/19/waiting.html
119•birdculture•18h ago•35 comments

I was wrong about the Midjourney ultra-sound scanner

https://twitter.com/MattZirwas/status/2068365802491834541
9•MrBuddyCasino•1h ago•1 comments