
Show HN: Find the best local LLM for your hardware, ranked by benchmarks

https://github.com/Andyyyy64/whichllm
102•andyyyy64•2h ago

Comments

Jasssss•1h ago
The plan command is clever. How do you handle the VRAM estimation for models with sliding window attention vs full context? Something like Mistral at 32k context uses way less KV cache than Llama at the same context length, but from the README it looks like the estimation is based on a fixed context size. Does it account for that?
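A back-of-the-envelope sketch of why this matters (my own illustrative numbers, not code from whichllm): with sliding-window attention, each layer only ever caches up to the window size, so the KV cache stops growing past that point.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   bytes_per_elem=2, sliding_window=None):
    # 2 tensors (K and V) per layer, one entry per cached token.
    # With sliding-window attention, a layer never caches more than
    # `sliding_window` tokens, whatever the requested context length.
    cached = context_len if sliding_window is None else min(context_len, sliding_window)
    return 2 * n_layers * n_kv_heads * head_dim * cached * bytes_per_elem

# Mistral-7B-style: 32 layers, 8 KV heads, head_dim 128, 4k window
sw = kv_cache_bytes(32, 8, 128, 32768, sliding_window=4096)
# Full-attention model with the same shape at the same 32k context
full = kv_cache_bytes(32, 8, 128, 32768)
print(sw / 2**30, full / 2**30)  # 0.5 GiB vs 4.0 GiB at fp16
```

An estimator that assumes full attention at a fixed context would overstate the sliding-window model's KV footprint by 8x in this example.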
Bigsy•1h ago
Brew install is broken.

It seems pretty rubbish, I have to say. It's recommending me loads of Qwen 2.5 models, which are really old, and I'm easily running Qwen 3.5 and 3.6 models on this Mac at decent quants.

vachina•4m ago
AI slop quality software for ya
kramit1288•1h ago
Accurate memory estimation is key here. It will crash if that isn't accurate, and it can't be generic for all local LLMs; each local LLM has different context estimates.
llagerlof•1h ago
What’s new compared to llmfit?

https://github.com/AlexsJones/llmfit

rvz•58m ago
Other than it (whichllm) being written in Python, nothing else.

I just use llmfit.

macwhisperer•1h ago
Can you add the other quants, like IQ3_M?

Also, my personal simple rule of thumb for local AI sizing is:

max model size (GB) = RAM (GB) / 1.65
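The rule of thumb spelled out as code (same formula; the ~40% headroom interpretation is my reading of the 1.65 divisor, not the commenter's):

```python
def max_model_size_gb(ram_gb: float) -> float:
    # Dividing by 1.65 leaves roughly 40% of RAM free for the
    # KV cache, the OS, and inference-engine overhead.
    return ram_gb / 1.65

print(round(max_model_size_gb(32), 1))  # 19.4 GB on a 32 GB machine
```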

pornel•52m ago
It looks nice. I've been searching for something like this recently, and was frustrated with rankings that lack latest models or don't clearly distinguish quantizations.

Showing quality loss per quantization is nice.

I'd prefer this as a website, since I'd handle running the model with a dedicated inference server anyway.

It would be nice to see the maximum context length that can fit on top of the baseline.

I was surprised how much token generation speed tanks when using very long context. 30/s can drop down to 2/s. A single speed metric didn't prepare me for that.

I was also positively surprised that some models scale well with batch parallelism. I can get 4x speed improvement by running 8 requests in parallel. But this affects memory requirements, and doesn't apply to all models and inference engines. It would be nice to show that. Some sites fold it into "what's your workflow", but that's too opaque.

KV cache quantization also makes a difference for speed, VRAM usage and max usable context.

On Apple Silicon, MLX-compatible model builds make a difference, so I'd like benchmarks to confirm they're based on the fastest implementation.

Multi-token-prediction is another aspect that may substantially change speed.

sleepyeldrazi•50m ago
I love this community. I started building a simple website for exactly this a couple of hours ago, and you've already made an even more advanced version. Hats off to you, sir.

If I ever decide to actually publish the site, is it alright if I mention you somewhere, as in "If you want a more accurate estimation, check out this project: <your repo>"? I think there's value in having a simple website estimate this information for you and give you instructions/common flags on how to start it yourself (plus a prompt crafted for you to optionally give to an LLM to set it up for you), but I'm going off a simple "choose an OS, GPU/VRAM, here's a list of options" flow rather than actually scanning (which is a lot more accurate).

jordiburgos•41m ago
This is very helpful too: https://www.canirun.ai/
pbronez•37m ago
Cool, but it looks like it doesn't actually test anything on your machine? It does hardware detection and then some lookups. Maybe I missed it, but I really want a tool like this to actually run a model on my machine to get the speed numbers.

I’ve been using RapidMLX for this. The integrated speed tests matter because the quality of the backend is a moving target, and the quantization / MLX format conversion also matter. It’s not enough to say “oh, use this model family with X parameters”; you have to add the architecture-specific quantization too.

https://github.com/raullenchai/Rapid-MLX

cyanydeez•11m ago
This doesn't correctly detect the unified memory architecture for:

GPU 0: STRXLGEN — 8.0 GB (ROCm 6.19.8-200.fc43.x86_64) — BW: N/A CPU: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S — 16 cores (AVX2, AVX-512)

The 8GB is the reserved memory, but it's not the total available memory to the GPU.

On Linux, the unified memory split is set like this: https://www.jeffgeerling.com/blog/2025/increasing-vram-alloc...

Don't feel bad though, nvtop doesn't do it correctly either.
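One way a detection tool could handle this on amdgpu is to read the GTT size alongside the VRAM carve-out (a sketch assuming the standard amdgpu sysfs nodes; the card index and node availability vary by kernel and GPU):

```python
from pathlib import Path

def amdgpu_memory_gib(dev=Path("/sys/class/drm/card0/device")):
    # On amdgpu, mem_info_vram_total reports only the reserved
    # carve-out (the 8 GB above), while mem_info_gtt_total is the
    # system memory the GPU can additionally map -- the number that
    # matters on a unified-memory APU like Strix Halo.
    vram = int((dev / "mem_info_vram_total").read_text())
    gtt = int((dev / "mem_info_gtt_total").read_text())
    return vram / 2**30, gtt / 2**30
```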

O(x)Caml in Space

https://gazagnaire.org/blog/2026-05-14-borealis.html
38•yminsky•58m ago•1 comments

Explore Wikipedia Like a Windows XP Desktop

https://explorer.samismith.com/
179•smusamashah•3h ago•42 comments

Steve Jobs' NeXT Computer: His Forgotten Exile Years

https://spectrum.ieee.org/steve-jobs-next-computer
23•rbanffy•1h ago•7 comments

Removing the modem and GPS from my 2024 RAV4 hybrid

https://arkadiyt.com/2026/05/13/removing-the-modem-and-gps-from-my-rav4/
913•arkadiyt•18h ago•471 comments

Show HN: GlycemicGPT – Open-source AI-powered diabetes management

https://github.com/GlycemicGPT/GlycemicGPT
44•jlengelbrecht•7h ago•28 comments

UK government replaces Palantir software with internally-built refugee system

https://www.bbc.com/news/articles/c2l2j1lxdk5o
290•cdrnsf•13h ago•98 comments

A few words on DS4

https://antirez.com/news/165
340•caust1c•13h ago•142 comments

Building ML framework with Rust and Category Theory

https://hghalebi.github.io/category_theory_transformer_rs/
50•adamnemecek•19h ago•13 comments

UK sovereign LLM inference

https://relax.ai/docs
81•benjamintnorris•2h ago•74 comments

Details of the Daring Airdrop at Tristan Da Cunha

https://www.tristandc.com/government/news-2026-05-11-airdrop.php
166•kspacewalk2•7h ago•51 comments

RTX 5090 and M4 MacBook Air: Can It Game?

https://scottjg.com/posts/2026-05-05-egpu-mac-gaming/
614•allenleee•20h ago•145 comments

First public macOS kernel memory corruption exploit on Apple M5

https://blog.calif.io/p/first-public-kernel-memory-corruption
382•quadrige•17h ago•91 comments

Gyroflow: Video stabilization using gyroscope data

https://github.com/gyroflow/gyroflow
105•nateb2022•2d ago•18 comments

Where's Ed: Anthropic Told Court $5B but Public $19B

https://www.flyingpenguin.com/wheres-ed-anthropic-told-court-5-billion-but-public-19-billion/
15•jorisw•3h ago•9 comments

New Nginx Exploit

https://github.com/DepthFirstDisclosures/Nginx-Rift
395•hetsaraiya•18h ago•88 comments

Codex is now in the ChatGPT mobile app

https://openai.com/index/work-with-codex-from-anywhere/
357•mikeevans•15h ago•176 comments

Mullvad exit IPs are surprisingly identifying

https://tmctmt.com/posts/mullvad-exit-ips-as-a-fingerprinting-vector/
425•RGBCube•9h ago•251 comments

Solar-based sleep patterns compared to modern norms

https://dylan.gr/1775146616
82•James72689•7h ago•70 comments

Tesla Wall Connector bootloader bypasses the firmware downgrade ratchet

https://www.synacktiv.com/en/publications/exploiting-the-tesla-wall-connector-from-its-charge-por...
105•p_stuart82•15h ago•46 comments

Claude for Legal

https://github.com/anthropics/claude-for-legal
108•Einenlum•14h ago•97 comments

HDD Firmware Hacking

https://icode4.coffee/?p=1465
196•jsploit•19h ago•26 comments

Access to frontier AI will soon be limited by economic and security constraints

https://writing.antonleicht.me/p/cut-off
175•thoughtpeddler•10h ago•164 comments

RISC-V Router

https://router.start9.com/
126•janandonly•15h ago•74 comments

Porting 3D Movie Maker to Linux

https://benstoneonline.com/posts/porting-3d-movie-maker-to-linux/
136•speckx•3d ago•28 comments

What's in a GGUF, besides the weights – and what's still missing?

https://nobodywho.ooo/posts/whats-in-a-gguf/
157•bashbjorn•18h ago•46 comments

New arXiv policy: 1-year ban for hallucinated references

https://twitter.com/tdietterich/status/2055000956144935055
531•gjuggler•15h ago•185 comments

Geography is four-dimensional

https://sive.rs/4d
24•galfarragem•2h ago•10 comments

How Claude Code works in large codebases

https://claude.com/blog/how-claude-code-works-in-large-codebases-best-practices-and-where-to-start
187•shenli3514•7h ago•131 comments

Infracost (YC W21) Is Hiring Sr Dev Advocate to make agents cloud cost-aware

https://www.ycombinator.com/companies/infracost/jobs/NzwUQ7c-senior-developer-advocate
1•akh•14h ago