
RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

https://imil.net/blog/posts/2026/rtx-5080-+-rtx-3090-setup-80+-tok-s-on-qwen-3.6-27b-q8/
66•iMil•6h ago

Comments

ComputerGuru•1h ago
I would have liked to see a bit more on the theory side of things, explaining optimal weight and inference splits, actual issues with existing drivers, etc., instead of what’s essentially just a recipe.
verdverm•1h ago
I've been using https://spark-arena.com/leaderboard to glean this kind of information for the DGX Spark; it's a sort of recipe book. The Nvidia forum has people talking about the things you wish to know. I see some of it on Discord, Reddit, et al., but it's less cohesive.

I've switched from using the Spark to run one model as well as it can to running several support models for the md kb I'm working on.

atq2119•28m ago
Agreed. To put this in perspective, batch-1 token decoding is, in theory, memory-bandwidth limited.

Memory bandwidth of the RTX 3090 is listed as 936 GB/s. The post isn't fully clear on which model they used and how big it is, but even assuming it perfectly filled the 24 GB of that GPU, 30 tok/s means the achieved bandwidth is only 720 GB/s. There's a lot of room for improvement here even without MTP, and those improvements should largely stack with MTP.
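The comment's arithmetic can be checked in a few lines. This is a simplified roofline-style sketch under the same assumption the comment makes, namely that every decoded token reads all 24 GB of weights exactly once; KV-cache reads, kernel overheads and the split across two GPUs are ignored, so real numbers will differ.

```python
# Batch-1 decode roofline: each generated token streams all weights once,
# so tokens/s <= memory bandwidth / bytes of weights read per token.

def max_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on batch-1 decode speed (ignores KV cache and overheads)."""
    return bandwidth_gb_s / weights_gb


def achieved_bandwidth_gb_s(tokens_per_s: float, weights_gb: float) -> float:
    """Effective bandwidth implied by an observed batch-1 decode rate."""
    return tokens_per_s * weights_gb


RTX_3090_BW = 936.0  # GB/s, spec-sheet figure quoted in the comment
WEIGHTS_GB = 24.0    # assumption: the weights fill the 3090's full 24 GB
OBSERVED = 30.0      # tok/s, the figure the comment works from

ceiling = max_tokens_per_s(RTX_3090_BW, WEIGHTS_GB)
achieved = achieved_bandwidth_gb_s(OBSERVED, WEIGHTS_GB)
utilization = achieved / RTX_3090_BW

print(f"ceiling ~{ceiling:.1f} tok/s, achieved {achieved:.0f} GB/s ({utilization:.0%})")
```

Under that assumption the ceiling is 39 tok/s, and 30 tok/s corresponds to roughly 77% of the spec-sheet bandwidth.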

deng•57m ago
I can understand the joy of running things yourself, and I can also see the privacy aspect. However, I pay ~$3 per 1M tokens for that model on Openrouter, and it's not even quantized. A refurbished 3090 and a 5080 will set you back well over 2k, not to mention the electricity to run them...
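The trade-off in this comment can be made concrete with a rough break-even calculation. A minimal sketch: the ~$3 per 1M tokens and the "well over 2k" hardware cost come from the comment and the 80 tok/s from the post, while the 400 W draw under load and the $0.30/kWh electricity price are hypothetical placeholders; idle power, resale value and hardware lifespan are ignored.

```python
def electricity_cost_per_mtok(tok_per_s: float, watts: float, price_per_kwh: float) -> float:
    """Electricity cost of generating 1M tokens at a steady rate and power draw."""
    hours = 1_000_000 / tok_per_s / 3600
    return hours * (watts / 1000) * price_per_kwh


def break_even_mtok(hardware_cost: float, api_price_per_mtok: float, elec_per_mtok: float) -> float:
    """Millions of tokens needed before owning beats renting (inf if it never does)."""
    saving = api_price_per_mtok - elec_per_mtok
    if saving <= 0:
        return float("inf")
    return hardware_cost / saving


HARDWARE_USD = 2000.0   # from the comment: "well over 2k"
API_USD_PER_MTOK = 3.0  # from the comment: ~$3 per 1M tokens
TOK_PER_S = 80.0        # headline figure of the post
WATTS = 400.0           # hypothetical combined draw of both GPUs under load
USD_PER_KWH = 0.30      # hypothetical electricity price

elec = electricity_cost_per_mtok(TOK_PER_S, WATTS, USD_PER_KWH)
mtok = break_even_mtok(HARDWARE_USD, API_USD_PER_MTOK, elec)
days = mtok * 1_000_000 / TOK_PER_S / 86400  # nonstop generation time to break even

print(f"electricity ${elec:.2f}/Mtok, break-even ~{mtok:.0f}M tokens (~{days:.0f} days nonstop)")
```

With these placeholder inputs the electricity works out to about $0.42 per 1M tokens and break-even lands near 774M tokens, roughly 112 days of nonstop generation at 80 tok/s; different prices or power draws move the answer a lot.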
TSiege•53m ago
It’s a personal hobby project; why should we care how someone chooses to spend their free time and money? Lots of hobbies are expensive and pointless if you compare them with commercially available offerings. That’s why it’s a hobby and not a small business.
redfloatplane•49m ago
> I pay ~3$ per 1M/tokens for that model on Openrouter

I think the thing is, there's an unspoken "for now" at the end of that sentence, and people running models locally are hedging against that "for now". Some people prefer to feel that they own the means rather than rent them, even if what they own is worse than what they could rent. That's especially true with today's Fable news and the harsh realisation that the "for now" depends on many unpredictable factors. A local setup costs you capital today plus a relatively predictable run-rate (made more predictable with on-prem solar, for example), but should otherwise keep working predictably forever.

I'm not saying that you're wrong to do what you're doing, just that many people have their own lines in the sand where renting vs buying makes sense, and it doesn't only boil down to a rational (or irrational) financial decision.

jubilanti•15m ago
You're treating open weight inference providers the same as proprietary ones. They're fundamentally different business models. Proprietary companies have an incentive to subsidize actual inference and training costs in order to gain market share. The few dozen or so companies selling Qwen models by the token on openrouter are in a commodities market.

If suddenly the CCP declared a total digital embargo on Alibaba's Qwen models or even if for some reason all of mainland China (and Singapore) was completely unreachable from the rest of the world, the dozen or so companies selling Qwen by the token elsewhere in the world could continue business as usual.

avyeed_desa•57m ago
I just bought a $25 Chinese 2x OCuLink card and two Minisforum DEG1 docks, had some spare PSUs lying around, and just installed two cards on each. It works. I saw that there is also a 4x OCuLink card, but I don't know if that will work too.
atlgator•45m ago
Which "good quality PCIe 4 riser" did you buy?
iMil•19m ago
This one: https://es.aliexpress.com/item/1005010123289822.html?spm=a2g...
sieste•36m ago
That's almost exactly my setup and I'm very happy with its performance.

I noticed recently that I started to prefer my local Qwen3.6 35B A3B and pi agent over Claude Code.

Both fail at different tasks, and Qwen more so than Claude.

But the way Qwen fails is much more straightforward. In writing tasks, Qwen's hallucinations and bullshitting are much easier to spot because it doesn't have the sleek vocabulary and wordsmithing skills to disguise its ignorance.

In coding tasks that Qwen can't solve, it often just goes into a tool-calling doom loop that the pi harness can catch, whereas Claude attempts ever more convoluted and creative things, making more and more mess that takes forever to clean up.

I think part of the story is that the tasks I use AI for are fairly simple and maybe don't need a frontier model. But I wonder if "proper" developers have had a similar experience.
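The comment doesn't say how the pi harness catches a tool-calling doom loop, so the following is only a hypothetical sketch of one simple approach: fingerprint each tool call (tool name plus canonicalized arguments) and flag the agent once the same call repeats too often within a short window. `LoopDetector`, its window size and its repeat threshold are illustrative, not pi's API.

```python
import json
from collections import deque


class LoopDetector:
    """Hypothetical doom-loop check: flags an agent that keeps repeating one tool call."""

    def __init__(self, window: int = 6, max_repeats: int = 3):
        self.recent = deque(maxlen=window)  # signatures of the last `window` calls
        self.max_repeats = max_repeats

    def record(self, tool: str, args: dict) -> bool:
        """Record a call and return True if it now looks like a loop."""
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} produce the same signature
        signature = (tool, json.dumps(args, sort_keys=True))
        self.recent.append(signature)
        return self.recent.count(signature) >= self.max_repeats


detector = LoopDetector()
calls = [
    ("read_file", {"path": "a.py"}),
    ("run_tests", {}),
    ("read_file", {"path": "a.py"}),
    ("run_tests", {}),
    ("read_file", {"path": "a.py"}),
]
flags = [detector.record(tool, args) for tool, args in calls]
```

Here the fifth call is the third identical `read_file` within the window, so only the last entry of `flags` is `True`; a harness could then stop the agent or inject a nudge instead of letting it spin.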

ydj•6m ago
80 tok/s with a 5080 + 3090 combo is wild. I’ve been working with a 4090 and two Tenstorrent p150 cards, and manage only about 30 tok/s using all three for Qwen3.6 27B Q8. Guess I’ve got more optimization to do.

I'd like to see the perf of their setup with and without MTP and n-gram speculative decoding, though, as well as parallel decode performance (once llama.cpp's MTP plays well with multiple slots).
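For readers unfamiliar with n-gram speculative decoding, here is a minimal sketch of the idea in the prompt-lookup style: find the most recent earlier occurrence of the last n tokens and propose whatever followed it as a draft, then keep only the prefix the full model agrees with. It illustrates the general technique, not llama.cpp's implementation, and the function names and parameters (`n`, `max_draft`) are hypothetical.

```python
def ngram_draft(tokens: list[int], n: int = 3, max_draft: int = 4) -> list[int]:
    """Propose draft tokens by copying what followed an earlier match of the last n tokens."""
    if len(tokens) < n:
        return []
    suffix = tokens[-n:]
    # Scan backwards for the most recent earlier match, skipping the suffix itself.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == suffix:
            return tokens[start + n:start + n + max_draft]
    return []


def accept_draft(draft: list[int], target_tokens: list[int]) -> list[int]:
    """Greedy verification: keep the longest draft prefix matching the target model.

    In a real engine the target scores every draft position in one batched forward
    pass and also emits its own token at the first mismatch, so each step yields at
    least one new token; only the accepted draft tokens are returned here.
    """
    accepted = []
    for drafted, target in zip(draft, target_tokens):
        if drafted != target:
            break
        accepted.append(drafted)
    return accepted


history = [1, 2, 3, 4, 5, 9, 1, 2, 3]
draft = ngram_draft(history)                  # copies what followed the earlier [1, 2, 3]
kept = accept_draft(draft, [4, 5, 7, 1])      # target model disagrees at the third position
```

Drafting this way costs almost nothing, which is why it pays off on repetitive text such as code edits, and it can be combined with or compared against MTP drafting.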

redfloatplane•9m ago
I was thinking of user-side regulations as well, not only provider-side ones. I could imagine a world where a government rules that you may not use LLMs for anything, which would be much easier to get around if you have local means.
Der_Einzige•41m ago
Openrouter doesn't give you access to the model's internals, i.e. complete control of logprobs, sampler stack, any PeFTs.

Openrouter fking sucks and I don't know why people here act like it's so great. Stop using it if you care about local AI, and accept that you'll pay more per token than you would through any cloud. That's the price for privacy, control, and better quality via inference-time optimizations that otherwise aren't available.
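To illustrate what control of logprobs and the sampler stack means in practice, here is a minimal pure-Python sketch that turns raw logits into log-probabilities and then applies temperature, top-k and top-p before drawing a token. The stage order shown is one common choice rather than any particular engine's default, and the parameter defaults are illustrative.

```python
import math
import random


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-probabilities, the kind of values a logprobs field reports."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def sample(logits, temperature=0.8, top_k=40, top_p=0.95, rng=random):
    """One possible sampler stack: temperature -> top-k -> top-p -> random draw."""
    if temperature <= 0:
        # Greedy decoding: return the id of the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    # Token ids sorted best-first; keep only the top_k of them (0 means no limit).
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    logps = log_softmax([scaled[i] for i in order])
    # Nucleus (top-p): keep the smallest best-first prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for idx, lp in zip(order, logps):
        kept.append((idx, lp))
        mass += math.exp(lp)
        if mass >= top_p:
            break
    ids = [i for i, _ in kept]
    weights = [math.exp(lp) for _, lp in kept]
    # random.choices normalizes the weights, so the kept mass need not sum to 1.
    return rng.choices(ids, weights=weights, k=1)[0]
```

With local weights every stage here can be swapped, reordered or replaced; a hosted API only exposes whichever knobs the provider chooses to forward.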

jubilanti•8m ago
> Openrouter doesn't give you access to the model's internals, i.e. complete control of logprobs, sampler stack, any PeFTs.

Openrouter gives you access to whatever the inference provider offers; it's just the middleman. Many providers return logprobs if you ask; it's in their API. And yeah, no PEFT or LoRA, but that's an entirely different product, and some inference providers offer it directly.

> Openrouter fking sucks and I don't know why people here act like it's so great. Stop using it if you care about local AI

But the whole point of openrouter is that you can run models by the token and you don't have to care about local AI? Sounds like you're more upset that people aren't making the same calculation on privacy and local control vs cost and ease of use.

NicoJuicy•36m ago
RTX 3090 24 GB set me back €390 a year ago (second-hand).
rirze•21m ago
Was it still in good condition? That price makes me wonder if it was used for crypto mining, which can wear down the hardware.
gsora•9m ago
Any sane crypto miner undervolted and underclocked their GPUs for efficiency's sake; if anything, they went through less wear than, say, regular gaming.
toyg•28m ago
Yeah but they can also be used to play games and do other stuff.
ThunderSizzle•20m ago
An R9700 is $1350 and can get 100 TPS running Qwen3.6-35B-A3B Q5 with a 130k context window (with room to spare) after a bit of tuning of llama.cpp's Vulkan backend, but llama.cpp's repository instability and lack of real versioning frustrate me.

In terms of electricity, if you aren't using it, even with all the VRAM loaded, you're wasting at most about 30 watts.

Prompt processing a large uncached context is annoying, which is why I forced a lower context window, but I don't know if it's any worse in performance than the cloud models I've used.

There's a niceness, to me, in knowing I don't have to rent it anymore. If you rent it, the terms can change regularly.
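For a sense of scale on the idle-power point, a quick sketch: 30 W is the comment's own upper estimate, while the $0.30/kWh price is a hypothetical placeholder to be replaced with a local tariff.

```python
def idle_energy_cost(watts: float, price_per_kwh: float,
                     hours_per_day: float = 24.0, days: int = 365) -> tuple[float, float]:
    """Energy (kWh) and cost of a constant idle draw over a period."""
    kwh = watts / 1000 * hours_per_day * days
    return kwh, kwh * price_per_kwh


IDLE_WATTS = 30.0     # the comment's "at most about 30 watts"
PRICE_PER_KWH = 0.30  # hypothetical electricity price

kwh, cost = idle_energy_cost(IDLE_WATTS, PRICE_PER_KWH)
print(f"{kwh:.1f} kWh/year, about ${cost:.2f}/year at ${PRICE_PER_KWH}/kWh")
```

That comes to about 263 kWh a year, roughly $79 at the placeholder price, for a machine that sits idle around the clock.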

medfield•19m ago
I use local models to explore and hosted models to refine. I somewhat envy those who can sustain running local models (Q8, 120B+) as a hobby... For me, the practical path is a better SearXNG setup and knowing my routes forward.

US bans differential privacy in Census data

https://desfontain.es/blog/banning-noise.html
236•nl•2h ago•91 comments

Treating pancreatic tumours may have revealed cancer's master switch

https://economist.com/science-and-technology/2026/06/12/treating-pancreatic-tumours-may-have-reve...
68•andsoitis•2h ago•11 comments

Orthodox C++

https://bkaradzic.github.io/posts/orthodoxc++/
40•signa11•2h ago•18 comments

Every Frame Perfect

https://tonsky.me/blog/every-frame-perfect/
131•ravenical•4h ago•28 comments

AI OSS tool repo goes archived over night after raising $7.3M Seed

https://github.com/tensorzero/tensorzero
163•hek2sch•4h ago•110 comments

Introduction to the experience of rendering Arabic typography&its technical debt

https://lr0.org/blog/p/arabic/
60•bookofjoe•3h ago•8 comments

A low-carbon computing platform from your retired phones

https://research.google/blog/a-low-carbon-computing-platform-from-your-retired-phones/
161•vikas-sharma•6h ago•78 comments

Show HN: I am building a map of people who lived in the Roman Empire

https://new.roman-names.com/
63•metiscus•2d ago•14 comments

Appreciating Exif

https://brentfitzgerald.com/posts/appreciating-exif/
19•burnto•3d ago•0 comments

The state of building user interfaces in Rust

https://areweguiyet.com/#ecosystem
108•mahirsaid•2d ago•68 comments

Electric motors with no rare earths

https://www.renaultgroup.com/en/magazine/energy-and-powertrains/all-about-electric-motors-with-no...
629•bestouff•18h ago•179 comments

RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

https://imil.net/blog/posts/2026/rtx-5080-+-rtx-3090-setup-80+-tok-s-on-qwen-3.6-27b-q8/
68•iMil•6h ago•22 comments

An Interview with Intel's Kira Boyko: Xeon 6's Product Director

https://chipsandcheese.com/p/an-interview-with-intels-kira-boyko
35•lumpa•4h ago•1 comment

CRISPR tech selectively shreds cancer cells, including "undruggable" cancers

https://innovativegenomics.org/news/crispr-technique-selectively-shreds-cancer-cells/
928•gmays•1d ago•203 comments

Statement on US government directive to suspend access to Fable 5 and Mythos 5

https://www.anthropic.com/news/fable-mythos-access
2879•Dylan1312•15h ago•2103 comments

Arch Linux Now Believes Malware Incident Under Control: More Than 1,500 Packages

https://www.phoronix.com/news/Arch-Linux-AUR-More-Than-1500
179•qwertox•4h ago•92 comments

Show HN: Paca – Lightweight Jira alternative for human-AI collaboration

https://github.com/Paca-AI/paca
85•pikann22•6h ago•29 comments

Open source AI must win

https://opensourceaimustwin.com/?share=v2
1335•vednig•14h ago•415 comments

Show HN: 2 Weeks of Hallucinate – The Photo Gallery

https://hallucinate.site/gallery
54•stagas•4h ago•15 comments

How to setup a local coding agent on macOS

https://ikyle.me/blog/2026/how-to-setup-a-local-coding-agent-on-macos
438•kkm•22h ago•112 comments

Shepherd's Dog: A Game by the Most Dangerous AI Model

https://koenvangilst.nl/lab/claude-fable-shepherds-dog
139•vnglst•10h ago•109 comments

The computer science degree isn’t dead

https://spectrum.ieee.org/computer-science-degree-isnt-dead
163•jnord•3d ago•159 comments

Show HN: Putt.day a daily mini golf game

https://putt.day/
251•ellg•17h ago•98 comments

There is a shadow hanging over this Fable thing

https://12gramsofcarbon.com/p/tech-things-there-is-a-massive-shadow
416•theahura•11h ago•388 comments

Leaving Mozilla

https://blog.unitedheroes.net/5751
397•martey•10h ago•228 comments

Malware developers added nuclear and biological weapons text to their spyware

https://twitter.com/jsrailton/status/2064661778978533571
432•marc__1•1d ago•228 comments

Twenty One Zero-Days in FFmpeg

https://depthfirst.com/research/21-zero-days-in-ffmpeg
267•redbell•18h ago•173 comments

Swift at Apple: Migrating the TrueType hinting interpreter

https://www.swift.org/blog/migrating-truetype-hinting-to-swift/
228•DASD•20h ago•109 comments

H.R. 6028 would fundamentally change the U.S. Copyright Office

https://www.eff.org/deeplinks/2026/06/congress-just-rushed-through-disastrous-copyright-office-ov...
260•Cider9986•2d ago•104 comments

Sam Bankman-Fried loses bid to appeal against fraud conviction in FTX case

https://www.theguardian.com/business/2026/jun/12/sam-bankman-fried-loses-appeal
60•pseudolus•4h ago•41 comments