Inference cost at scale with napkin math

https://injuly.in/blog/napkin-inference-cost/index.html
44•gmays•4d ago

Comments

smalltorch•2h ago
> This largely depends on whether you own or rent your hardware. At $40,000 per B200, your lifetime cost per user is 40_000/num_users. In the 100% duty cycle case (worst for cost), that's $6k per user. Realistically, serving 300 users per GPU you'll spend a lifetime cost of about $133 per user, plus the datacenter/upkeep bill. If you rent the GPU, the cost is more straightforward. At an hourly rate of $4, your hourly cost per user is 4/num_users. For num_users=300 you get an hourly rate of about $0.013 per user, or $9.36 per month.

This leads me to believe you can buy a GPU but leave it at a data center?

Do people do this? I don't understand. Or are you equating the upkeep bill with electricity on premises?
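For anyone who wants to poke at the quoted numbers, here is a minimal sketch of the own-versus-rent arithmetic. It assumes the $4 hourly rental rate implied by the quoted `4/num_users` and a 30-day month; the function and constant names are illustrative and do not come from the article.

```python
# Napkin math for the quoted own-vs-rent figures.
# All names here are illustrative, not taken from the article.

GPU_PRICE_USD = 40_000       # purchase price per B200, as quoted
RENT_USD_PER_HOUR = 4.0      # assumption: rate implied by "4/num_users"
HOURS_PER_MONTH = 24 * 30    # assumption: 30-day month


def owned_cost_per_user(num_users: int) -> float:
    """Lifetime hardware cost per user, excluding datacenter/upkeep."""
    return GPU_PRICE_USD / num_users


def rented_cost_per_user_hour(num_users: int) -> float:
    """Hourly rental cost split evenly across users."""
    return RENT_USD_PER_HOUR / num_users


def rented_cost_per_user_month(num_users: int) -> float:
    """Monthly rental cost per user, without rounding the hourly figure."""
    return rented_cost_per_user_hour(num_users) * HOURS_PER_MONTH


if __name__ == "__main__":
    users = 300
    print(f"owned:  ${owned_cost_per_user(users):.2f} per user (hardware only)")
    print(f"rented: ${rented_cost_per_user_hour(users):.4f} per user per hour")
    print(f"rented: ${rented_cost_per_user_month(users):.2f} per user per month")
```

Without intermediate rounding the monthly rental figure comes to about $9.60 per user; the quoted $9.36 is what you get if you round the hourly cost to $0.013 first.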

__s•2h ago
You can, people do. https://www.linkedin.com/posts/activity-7409593739138060288-...
smalltorch•1h ago
So what's the cost separating them from placing this box on their own premises?

Network throughput?

namibj•1h ago
Plus power and cooling.
BadBadJellyBean•17m ago
Plus space, manpower and security.
BadBadJellyBean•43m ago
I'd like to see a bit of the running costs inside the napkin math. Power, cooling, maintenance, rent, etc. are probably significant factors as well.
JBAnderson5•37m ago
> Realistically, serving 300 users per GPU you'll spend a lifetime cost of about $133 per user, plus the datacenter/upkeep bill.

What is the operational cost and when does it become more expensive than the upfront capex?

The B200 tops out at 1000 W, idles around 140 W, and averages around 600 W. https://www.lightly.ai/blog/nvidia-b200-vs-h100 The U.S. average electricity price was $0.14 per kWh in March. https://www.eia.gov/electricity/monthly/epm_table_grapher.ph...

600 W / 1000 × $0.14 per kWh = $0.084 per hour, or about $2.02 per day and $60.48 per 30-day month. With 300 users, that is about $0.20 per user per month. Seems fairly cheap for the electricity.

Does anyone know how to estimate colo/data center rent costs? Where did I screw up my estimates?
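As a rough check on the electricity estimate and the capex question, here is a small sketch using only the figures in this comment and the quoted $40,000 GPU price. It assumes a constant 600 W average draw and a 30-day month, and it covers electricity only: cooling, colo rent, and staffing are not modeled. The names are illustrative.

```python
# Electricity-only napkin math; cooling, rent, and staff are NOT included.
# All names are illustrative.

AVG_POWER_W = 600            # average draw from the comment above
PRICE_USD_PER_KWH = 0.14     # U.S. average price cited in the comment
GPU_PRICE_USD = 40_000       # purchase price quoted from the article
NUM_USERS = 300
HOURS_PER_MONTH = 24 * 30    # assumption: 30-day month


def electricity_cost_per_hour(avg_power_w: float, price_per_kwh: float) -> float:
    """Convert watts to kilowatts, then price one hour of use."""
    return avg_power_w / 1000 * price_per_kwh


def months_until_power_matches_capex(gpu_price: float, monthly_power: float) -> float:
    """Months of electricity spend needed to equal the purchase price."""
    return gpu_price / monthly_power


hourly = electricity_cost_per_hour(AVG_POWER_W, PRICE_USD_PER_KWH)
monthly = hourly * HOURS_PER_MONTH
per_user_month = monthly / NUM_USERS
breakeven_months = months_until_power_matches_capex(GPU_PRICE_USD, monthly)

if __name__ == "__main__":
    print(f"${hourly:.3f} per hour, ${monthly:.2f} per month")
    print(f"${per_user_month:.2f} per user per month")
    print(f"{breakeven_months:.0f} months for electricity to equal the GPU price")
```

Under these assumptions electricity alone would take roughly 661 months (about 55 years) to match the purchase price, so any realistic crossover has to come from the costs this sketch leaves out.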

BadBadJellyBean•19m ago
I wonder what the power costs are when you put jet turbines in front of your DC to power it.
breput•32m ago
> We'll assume a 32B dense model, as they've gotten quite good for production use and a B200 can comfortably serve them. This could be a Gemma, Qwen, DeepSeek, whatever.

That seems like a very consequential point to include halfway through the post. They aren't wrong that Qwen 3.6 26B or Gemma 4 31B are quite good, depending on the use case, but if we're doing napkin math, I'd want some more headroom in the assumptions.

They really ought to have Qwen parameterize their post's calculations and add sliders so a reader could play around with the values.

Edit: And since they specifically mentioned DeepSeek (or whatever), as far as I know, none of their current generation of models is dense, and even the smallest of the mixture-of-experts (MoE) models has 284B parameters (13B activated). That will completely incinerate their napkin.
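To put a rough number on that, here is a sketch of the weight-memory footprint alone. It assumes 192 GB of HBM per B200 and 2 bytes per parameter (BF16) or 1 byte (FP8); both are my assumptions, not figures from the article. It also assumes all MoE experts stay resident in GPU memory even though only 13B parameters are active per token, and it ignores KV cache and activations.

```python
import math

# Weight-memory napkin math. The capacity and precision values are
# assumptions for illustration, not figures from the article.

GPU_MEMORY_GB = 192  # assumption: HBM capacity of one B200


def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """1e9 params times bytes per param, divided by 1e9 bytes per GB."""
    return params_billions * bytes_per_param


def gpus_for_weights(params_billions: float, bytes_per_param: float,
                     gpu_memory_gb: float = GPU_MEMORY_GB) -> int:
    """Minimum GPU count that fits the weights alone (no KV cache)."""
    return math.ceil(weight_memory_gb(params_billions, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    for name, params in [("32B dense", 32), ("284B MoE", 284)]:
        for label, nbytes in [("BF16", 2), ("FP8", 1)]:
            gb = weight_memory_gb(params, nbytes)
            n = gpus_for_weights(params, nbytes)
            print(f"{name} @ {label}: {gb:.0f} GB of weights, at least {n} GPU(s)")
```

Under these assumptions the 32B dense model fits on one GPU with room to spare, while the 284B MoE needs at least two or three GPUs before any per-user memory is counted, which changes the per-user cost math considerably.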

Pre-2022 Books

https://notes.lorenzogravina.com/musings/pre-2022-books
94•trms•1h ago•45 comments

Not just books: renting a sewing machine from the library can improve democracy

https://www.bbc.com/future/article/20260618-the-weird-and-wonderful-libraries-of-finland
38•sohkamyung•1h ago•14 comments

Epoll vs. Io_uring in Linux

https://sibexi.co/posts/epoll-vs-io_uring/
19•Sibexico•47m ago•1 comment

SMPTE Makes Its Standards Freely Accessible

https://www.smpte.org/blog/smpte-makes-its-standards-freely-accessible-openingstandards-library-t...
212•zdw•6h ago•58 comments

Alice is impatient

https://brooker.co.za/blog/2026/06/19/waiting.html
36•birdculture•3h ago•6 comments

UHF X11: X11 Built for VisionOS and Apple Vision Pro

https://www.lispm.net/apps/uhf-x11/
147•zdw•6h ago•20 comments

PostgresBench: A Reproducible Benchmark for Postgres Services

https://clickhouse.com/blog/postgresbench
70•saisrirampur•4h ago•18 comments

Show HN: TownSquare, a tiny presence layer for websites

https://townsquare.cauenapier.com/
11•cauenapier•11h ago•2 comments

Semiconductor Lifeline Keeps Fighter Jets in the Air

https://spectrum.ieee.org/phoenix-semiconductors-legacychips-oems
20•rbanffy•4d ago•2 comments

DOS Game "F-15 Strike Eagle II" reversing project needs DOS test pilots

https://neuviemeporte.github.io/f15-se2/2026/06/20/needyou.html
186•LowLevelMahn•8h ago•55 comments

Slow breathing modulates brain function and risk behavior

https://www.cell.com/neuron/fulltext/S0896-6273(26)00339-9
8•croes•1h ago•0 comments

Inference cost at scale with napkin math

https://injuly.in/blog/napkin-inference-cost/index.html
46•gmays•4d ago•10 comments

CSSQuake

https://cssquake.com/
440•msalsas•13h ago•92 comments

Turns Out, There Is a Cabal of Elite Crazies Trying to Control the World

https://www.esquire.com/news-politics/politics/a71619211/peter-thiel-dialog-club-wired-report/
102•throwaway81523•1h ago•33 comments

Unauthorized alert sent to cell phones across Brazil

https://www.cnn.com/2026/06/20/americas/brazil-hackers-unauthorized-alert-latam
71•zdw•3h ago•46 comments

Show HN: StartupWiki – A Free Alternative to Crunchbase

https://startupwiki.tech/
141•shpran•7h ago•45 comments

The Wholesale Plagiarism of Obscure Sorrows

https://waxy.org/2026/06/the-wholesale-plagiarism-of-obscure-sorrows/
301•ridesisapis•5h ago•129 comments

Show HN: Make PDFs look scanned (CLI or in the browser via WASM)

https://github.com/overflowy/make-look-scanned
71•overflowy•5h ago•37 comments

The rise of South Korea’s weapons business

https://www.politico.com/news/magazine/2026/06/20/south-korea-weapons-dealer-trump-00959559
98•JumpCrisscross•12h ago•35 comments

Whole cross-sectional human ultrasound tomography

https://www.nature.com/articles/s41551-026-01660-4
4•lnyan•2d ago•0 comments

Supermarket giant Tesco sues VMware for breach of contract

https://www.theregister.com/software/2025/09/03/supermarket-giant-tesco-sues-vmware-for-breach-of...
62•wglb•2h ago•15 comments

Loupe – An iOS app that raises awareness about what native apps can see

https://github.com/mysk-research/loupe
10•Cider9986•11h ago•2 comments

Temporary Cloudflare accounts for AI agents

https://blog.cloudflare.com/temporary-accounts/
153•farhadhf•12h ago•89 comments

Bun has an open PR adding shared-memory threads to JavaScriptCore

https://github.com/oven-sh/WebKit/pull/249
103•gr4vityWall•6h ago•177 comments

Show HN: We post-trained a model that pen tests instead of refusing

https://www.argusred.com/cli
66•dk189•10h ago•29 comments

A Love Story

https://pudding.cool/2026/06/love-story/
37•simonebrunozzi•3h ago•5 comments

Why has the pointe shoe been so resistant to change?

https://dancemagazine.com/pointe-shoe-innovation/
43•onemind•22h ago•44 comments

Show HN: Tiny – An interpreted dynamic language with inline Go native functions

https://github.com/confh/Tiny
29•confis•5h ago•5 comments

Show HN: My Windows XP portfolio with working Game Boy and iPod

https://mitchivin.com/
44•mitchivin•4h ago•19 comments

Linux Eliminates the Strncpy API After Six Years of Work, 360 Patches

https://www.phoronix.com/news/Linux-7.2-Drops-strncpy
69•simonpure•2h ago•29 comments