A GPU Calculator That Helps Calculate What GPU to Use

https://calculator.inference.ai/
103•chlobunnee•1d ago

Comments

chlobunnee•1d ago
I built a calculator to help researchers and engineers pick the right GPUs for training and inference workloads!

It helps compare GPU options by taking in simple parameters (# of transformer layers, token size, etc) and letting users know which GPUs are compatible + their efficiency for training vs inferencing.

The idea came from talking with ML researchers frustrated by slow cluster queues or wasting money on overkill GPUs.

I'd love feedback on what you feel is missing/confusing!

Some things I'm thinking about incorporating next are:

- Allowing users to directly compare 2 GPUs and their specs

- Allowing users to see whether a fraction of the GPU can complete their workload

I would really appreciate your thoughts/feedback! Thanks!

quotemstr•1d ago
No sharding? At all?
LorenDB•1d ago
Where's AMD support? I have a 9070 XT and would love to see it listed on here.
funfunfunction•1d ago
This is a cheap marketing ploy for a GPU reseller with billboards on highway 101 into SF.
ChadNauseam•21h ago
Hate those ads. "Inference isn't just a buzzword". Who thought it was? (No comment on whether the linked post is a useful tool, I haven't played with it enough to know)
zargon•1d ago
The best VRAM calculator I have found is https://apxml.com/tools/vram-calculator. It is much more thorough than this one. For example, it understands different models' attention schemes for correct KV cache size calculation, and supports quantization of both the model and the KV cache. Also, fine-tuning. It has its own limitations, such as only supporting specific models. In practice though, the generic calculators are not very useful because model architectures vary (mainly the KV cache) and end up being way off. (Not sure whether or not it would be better to discuss it separately, but I submitted it at https://news.ycombinator.com/item?id=44677409)
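To illustrate why the attention scheme matters so much for the KV cache, here is a minimal sketch of the usual size formula. The model shapes below are hypothetical and do not describe any specific model or either calculator's internals; the 8-bit case also ignores the small per-block scale overhead that real quantized caches carry.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   bytes_per_elem, batch=1):
    """Bytes needed to cache keys and values for every layer and token."""
    # Factor of 2: one tensor for keys and one for values.
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * bytes_per_elem * batch)


# Hypothetical 32-layer model with head dimension 128 and a 32K context.
mha_fp16 = kv_cache_bytes(32, 32, 128, 32768, 2)  # every head keeps its own K/V
gqa_fp16 = kv_cache_bytes(32, 8, 128, 32768, 2)   # grouped-query: 8 shared KV heads
gqa_int8 = kv_cache_bytes(32, 8, 128, 32768, 1)   # same, with an 8-bit cache

for name, size in [("MHA, FP16", mha_fp16),
                   ("GQA, FP16", gqa_fp16),
                   ("GQA, 8-bit", gqa_int8)]:
    print(f"{name}: {size / GIB:.1f} GiB")
```

With identical layer count and context length, the cache differs by 8x between the first and last configuration, which is the kind of gap a generic calculator cannot see without knowing the number of KV heads.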
zeroq•22h ago
This one is indeed much better, and it addresses the feedback I wanted to leave for the one originally posted: instead of calculating an artificial scenario, I would like to see what I can run on the hardware I actually have at hand. Thanks!
jwrallie•15h ago
Nice! I could have saved so much time downloading models to do trial and error with this.
oktoberpaard•12h ago
It gives weird results for me. I’m using Qwen3-32B with 32K context length at Q4_K_M, with 8 bit KV cache fully offloaded to 24GB VRAM. According to this calculator this should be impossible by a large margin, yet it’s working for me.

Edit: this might be because I’ve got flash attention enabled in Ollama.

yepyip•7h ago
Somehow you have to log in now to use it. It wasn't like this a few weeks ago...
mdaniel•4h ago
That is not my experience, maybe your IP is flagged as hammering their site?
amanzi•1d ago
I would have liked to see the RTX 5060 Ti with 16GB mentioned. I can't tell if it's omitted because it won't work, or if it's excluded for some other reason?
amatecha•23h ago
Yeah, weird miss, but maybe just because it came out more recently. It can be used for ~anything a 5070 could be used for, no? Maybe slower, but still.
snvzz•1d ago
Rather than GPU calculator, this is an NVIDIA calculator.
nodesocket•23h ago
In case you’ve been living in a cave, Nvidia is the de facto standard for LLM compute.
jakogut•20h ago
Llama.cpp supports Vulkan, which is supported by all GPU vendors that care about standards and interoperability.

The default should be open and portable APIs, not needlessly furthering a hegemony that is detrimental to us all.

timothyduong•1d ago
Where's 3090? Or should that fall in the 4090 (24GB VRAM) category?
jjmarr•21h ago
AMD support?
mdaniel•21h ago
> 0 Model Available

Who in the world is expected to populate 11 select/text fields with their favorite model data points they just happen to have lying around, only to see an absolutely meaningless "295% Inference" outcome?

What a dumpster

kouteiheika•15h ago
The training memory breakdown is wildly inaccurate.

- No one trains big models in FP32 anymore.

- Gradients can also often be in BF16, and they don't actually have to be stored if you're not using gradient accumulation or if you're accumulating them directly in the optimizer's state.

- 32-bit Adam is silly; if you don't have infinite VRAM there's no reason why you wouldn't want to use 8-bit Adam (or you can go even lower with quantized Muon)

- Activations? They take up memory too, but are not mentioned.

It shows that to train a 3.77B parameter model I need 62GB of VRAM; just to give you some perspective for how overestimated this is: a few weeks back I was training (full fine-tuning, not LoRA) a 14B parameter model on 24GB of VRAM using every trick in the book to lower VRAM usage (to be fair, not all of those tricks are available in publicly available training harnesses, but the point still stands that even with an off-the-shelf training harness you can do a lot better than what this calculator suggests).
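The per-parameter arithmetic behind the bullets above can be sketched as follows. The byte counts are simplifying assumptions for illustration, not the calculator's actual formula: activations, framework overhead, and the small bookkeeping cost of quantized optimizer states are all left out.

```python
def training_state_gb(n_params, weight_bytes, grad_bytes,
                      optim_bytes_per_state, n_optim_states=2):
    """GB (10^9 bytes) for weights, gradients, and optimizer state only."""
    # Adam keeps two states per parameter (first and second moment).
    per_param = (weight_bytes + grad_bytes
                 + optim_bytes_per_state * n_optim_states)
    return n_params * per_param / 1e9


N = 3.77e9  # parameter count from the example above

fp32_everything = training_state_gb(N, 4, 4, 4)  # 16 bytes per parameter
bf16_8bit_adam = training_state_gb(N, 2, 2, 1)   # 6 bytes per parameter
bf16_no_grads = training_state_gb(N, 2, 0, 1)    # gradients never stored

print(f"FP32 weights/grads, 32-bit Adam: {fp32_everything:.1f} GB")
print(f"BF16 weights/grads, 8-bit Adam:  {bf16_8bit_adam:.1f} GB")
print(f"BF16 weights, no stored grads:   {bf16_no_grads:.1f} GB")
```

Under these assumptions the all-FP32 setup needs about 60 GB for 3.77B parameters, while the BF16 plus 8-bit Adam setup needs under 23 GB before activations.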

fooker•11h ago
Fine tuning and training are very different beasts.
kouteiheika•9h ago
No they're not? The process is essentially exactly the same, just with a much lower total FLOPs budget, since if you're not training from scratch then you don't need to train for as long. I can use *exactly* the same code that I used to fine-tune a model to train a new model from scratch; literally the only difference is whether I initialize the initial weights randomly or with an existing model, a couple of hyperparameters (e.g. for training from scratch you want to start at a higher LR), and training for longer.
fooker•7h ago
No, if you try to train an LLM like you're suggesting:

- you'll get something similar to gpt2.

- To approach the scale of modern LLMs, you'll need about 10x more than all the GPUs in the world.

It's a neat abstraction to consider these the same, but do you think Meta is paying 100M for writing a 15 line script?

kouteiheika•5h ago
I still don't understand what exactly you are disagreeing with.

Meta is paying the big bucks because to train a big LLM in a reasonable time you need *scale*. But the process itself is the same as full fine-tuning, just scaled up across many GPUs. If I were patient enough to wait a few years/decades for my single GPU to chug through 15 trillion tokens, then I too could train a Llama from scratch (assuming I feed it the same training data).
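For a rough sense of the single-GPU timescale, here is a back-of-envelope sketch using the common approximation that training a dense transformer costs about 6 FLOPs per parameter per token. The model sizes and the sustained throughput of 100 TFLOP/s are hypothetical inputs, and the sketch ignores whether the training state fits in memory at all.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def training_years(n_params, n_tokens, sustained_flops_per_s):
    """Wall-clock years to train, using the ~6 * N * D FLOPs approximation."""
    total_flops = 6 * n_params * n_tokens
    return total_flops / sustained_flops_per_s / SECONDS_PER_YEAR


TOKENS = 15e12       # 15 trillion tokens, as in the comment above
THROUGHPUT = 1e14    # assumed sustained 100 TFLOP/s on one GPU

years_1b = training_years(1e9, TOKENS, THROUGHPUT)
years_8b = training_years(8e9, TOKENS, THROUGHPUT)

print(f"1B parameters: about {years_1b:.0f} years")
print(f"8B parameters: about {years_8b:.0f} years")
```

The estimate scales linearly with both parameter count and token count, so the result is very sensitive to the model size chosen and to the throughput actually sustained.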

ethan_smith•10h ago
Great points about training optimizations. For inference, similar dramatic memory reductions are possible through quantization (INT4/INT8), which can reduce the VRAM needed for weights by 2-4x compared to FP16 (up to 8x compared to FP32), allowing much larger models on consumer GPUs.
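The weight-memory side of this is simple arithmetic, sketched below for a hypothetical 70B-parameter model. It counts weights only; the KV cache, activations, and the per-block scale overhead of real quantization formats are assumptions left out of the sketch.

```python
def weight_gb(n_params, bits_per_weight):
    """GB (10^9 bytes) needed to hold the weights alone."""
    return n_params * bits_per_weight / 8 / 1e9


N = 70e9  # hypothetical 70B-parameter model

sizes = {name: weight_gb(N, bits)
         for name, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("INT4", 4)]}

for name, gb in sizes.items():
    ratio = sizes["FP16"] / gb  # reduction factor relative to FP16
    print(f"{name}: {gb:.0f} GB ({ratio:g}x vs FP16)")
```

The reduction factor is just the ratio of bit widths, so real formats that store extra scales per block land slightly below these ideal numbers.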
daft_pink•14h ago
It would be really nice if you could import the standard models so we could see what kind of GPU we would need for popular models in the news and on Hugging Face.
amelius•13h ago
I selected Llama 3 70B, and then it said all the GPUs are insufficient for training :(
nottorp•12h ago
What GPU to use for what? Witcher 4? Death Stranding?
alkonaut•11h ago
Save you a click: it’s about AI.
amstan•2h ago
You're missing any AMD stuff. I can run a quantized DeepSeek R1 671B on 4 Framework Desktops, yet it's "insufficient" for 10 Nvidia GPUs.