Show HN: Find the best local LLM for your hardware, ranked by benchmarks

https://github.com/Andyyyy64/whichllm
48•andyyyy64•1h ago

Comments

Jasssss•45m ago
The plan command is clever. How do you handle the VRAM estimation for models with sliding window attention vs full context? Something like Mistral at 32k context uses way less KV cache than Llama at the same context length, but from the README it looks like the estimation is based on a fixed context size. Does it account for that?
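
To make the sliding-window point concrete, here is a rough sketch of a KV cache estimate, not whichllm's actual method. The function name `kv_cache_bytes` and the layer/head counts are illustrative assumptions (a 32-layer model with 8 KV heads and a 4096-token window versus a 32-layer model with 32 KV heads and full attention). It also assumes the inference engine really discards tokens outside the window, which not every engine does.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   sliding_window=None, bytes_per_elem=2):
    """Approximate KV cache size in bytes for a single sequence.

    Assumes every layer caches keys and values, and that a sliding-window
    model keeps only the last `sliding_window` tokens (an assumption about
    the inference engine, not a guarantee).
    """
    cached_tokens = context_len
    if sliding_window is not None:
        cached_tokens = min(context_len, sliding_window)
    # The factor 2 covers keys plus values.
    return 2 * n_layers * n_kv_heads * head_dim * cached_tokens * bytes_per_elem


GIB = 1024 ** 3

# Illustrative configs at 32k context with an fp16 cache (2 bytes per element).
swa = kv_cache_bytes(32, 8, 128, 32768, sliding_window=4096)
full = kv_cache_bytes(32, 32, 128, 32768)

print(f"sliding window, 8 KV heads:  {swa / GIB:.2f} GiB")   # 0.50 GiB
print(f"full attention, 32 KV heads: {full / GIB:.2f} GiB")  # 16.00 GiB
```

Under these assumptions the gap comes from two separate factors: the window caps the number of cached tokens, and fewer KV heads shrink the per-token cost. An estimate based on a fixed context size would miss the first factor.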
Bigsy•31m ago
Brew install is broken

It seems pretty rubbish, I have to say. It's recommending me loads of Qwen 2.5 models, which are really old, and I'm easily running Qwen3.5 and 3.6 models on this Mac at decent quants.

kramit1288•27m ago
Accurate memory estimation is key here. It will crash if that isn't accurate, and it can't be generic for all local LLMs: each local LLM has different context estimates.
llagerlof•20m ago
What's new compared to llmfit?

https://github.com/AlexsJones/llmfit

rvz•17m ago
Other than it (whichllm) being written in Python, nothing else.

I just use llmfit.

macwhisperer•19m ago
Can you add the other quants, like IQ3_M?

Also, my personal rule of thumb for local AI sizing is:

max model size (GB) = ram (GB) / 1.65
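
As a quick worked example of that rule of thumb (a sketch only: the function name `max_model_size_gb` is made up here, and the 1.65 divisor is the commenter's personal heuristic, not a measured constant):

```python
def max_model_size_gb(ram_gb, divisor=1.65):
    """Rule-of-thumb ceiling on model file size for a given amount of RAM.

    The default divisor of 1.65 is the heuristic from the comment above;
    it leaves headroom for the KV cache, the OS, and other processes.
    """
    return ram_gb / divisor


for ram in (16, 32, 64):
    print(f"{ram} GB RAM -> ~{max_model_size_gb(ram):.1f} GB model")
# 16 GB RAM -> ~9.7 GB model
# 32 GB RAM -> ~19.4 GB model
# 64 GB RAM -> ~38.8 GB model
```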

pornel•11m ago
It looks nice. I've been searching for something like this recently, and was frustrated with rankings that lack latest models or don't clearly distinguish quantizations.

Showing quality loss per quantization is nice.

I'd prefer this as a website, since I'd run the model with a dedicated inference server anyway.

It would be nice to see the maximum context length that can fit on top of the baseline.

I was surprised how much token generation speed tanks when using very long context. 30 tokens/s can drop to 2 tokens/s. A single speed metric didn't prepare me for that.

I was also positively surprised that some models scale well with batch parallelism. I can get 4x speed improvement by running 8 requests in parallel. But this affects memory requirements, and doesn't apply to all models and inference engines. It would be nice to show that. Some sites fold it into "what's your workflow", but that's too opaque.

KV cache quantization also makes a difference for speed, VRAM usage and max usable context.

On Apple Silicon, MLX-compatible model builds make a difference, so I'd like the benchmarks to reassure me that they're based on the fastest implementation.

Multi-token-prediction is another aspect that may substantially change speed.
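
The points above about maximum context, parallel requests, and KV cache quantization can be tied together with a back-of-the-envelope sketch. Everything here is an assumption for illustration: the function name `max_context_tokens`, the 32-layer / 8-KV-head / 128-dim model shape, the 4 GiB of memory left after loading the weights, and the simplification that each parallel request gets its own full-length cache. An 8-bit cache is treated as 1 byte per element, ignoring per-block scale overhead.

```python
def max_context_tokens(free_bytes, n_layers, n_kv_heads, head_dim,
                       bytes_per_elem=2.0, n_parallel=1):
    """Tokens of context that fit in the memory left after the weights.

    Assumes full attention (no sliding window) and that every parallel
    request holds its own cache of the same length.
    """
    # Keys plus values, for every layer, for one token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return int(free_bytes // (per_token * n_parallel))


free = 4 * 1024 ** 3  # hypothetical 4 GiB left over after the model weights

print(max_context_tokens(free, 32, 8, 128))                      # fp16 cache: 32768
print(max_context_tokens(free, 32, 8, 128, bytes_per_elem=1.0))  # ~8-bit cache: 65536
print(max_context_tokens(free, 32, 8, 128, n_parallel=8))        # 8 parallel requests: 4096
```

In this simplified model, halving the cache precision doubles the usable context, while running 8 requests in parallel divides the per-request context by 8, which is why a single "fits / doesn't fit" answer hides a lot.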

sleepyeldrazi•10m ago
I love this community. I started building a simple website for exactly this a couple of hours ago, and you've already made an even more advanced version. Hats off to you, sir.

If I ever decide to actually publish the site, is it alright if I mention you somewhere, along the lines of "If you want a more accurate estimation, check out this project: <your repo>"? I think there is value in having a simple website estimate this information for you and give you instructions and common flags for starting the model yourself (plus a prompt crafted for you to optionally give to an LLM to set it up for you). But I'm going off a simple "choose an OS, GPU/VRAM, here's a list of options" flow and not actually scanning the hardware, which is a lot more accurate.
