Meta Platforms: Lobbying, dark money, and the App Store Accountability Act

https://github.com/upper-up/meta-lobbying-and-other-findings
995•shaicoleman•7h ago•433 comments

Show HN: Channel Surfer – Watch YouTube like it’s cable TV

https://channelsurfer.tv
69•kilroy123•2d ago•48 comments

Can I run AI locally?

https://www.canirun.ai/
250•ricardbejarano•5h ago•65 comments

TUI Studio – visual terminal UI design tool

https://tui.studio/
400•mipselaer•7h ago•228 comments

Launch HN: Captain (YC W26) – Automated RAG for Files

https://www.runcaptain.com/
24•CMLewis•2h ago•7 comments

Launch HN: Spine Swarm (YC S23) – AI agents that collaborate on a visual canvas

https://www.getspine.ai/
62•a24venka•4h ago•51 comments

Bucketsquatting is (finally) dead

https://onecloudplease.com/blog/bucketsquatting-is-finally-dead
258•boyter•9h ago•136 comments

The Accidental Room (2018)

https://99percentinvisible.org/episode/the-accidental-room/
3•blewboarwastake•8m ago•0 comments

Willingness to look stupid

https://sharif.io/looking-stupid
657•Samin100•4d ago•226 comments

The Wyden Siren Goes Off Again: We'll Be "Stunned" by NSA Under Section 702

https://www.techdirt.com/2026/03/12/the-wyden-siren-goes-off-again-well-be-stunned-by-what-the-ns...
71•cf100clunk•1h ago•16 comments

Lost Doctor Who Episodes Found

https://www.bbc.co.uk/news/articles/c4g7kwq1k11o
88•edent•12h ago•23 comments

E2E encrypted messaging on Instagram will no longer be supported after 8 May

https://help.instagram.com/491565145294150
261•mindracer•4h ago•146 comments

Okmain: How to pick an OK main colour of an image

https://dgroshev.com/blog/okmain/
177•dgroshev•4d ago•41 comments

The Mrs Fractal: Mirror, Rotate, Scale (2025)

https://www.4rknova.com//blog/2025/06/22/mrs-fractal
27•ibobev•4d ago•3 comments

Gvisor on Raspbian

https://nubificus.co.uk/blog/gvisor-rpi5/
43•_ananos_•7h ago•8 comments

The Bovadium Fragments: Together with The Origin of Bovadium

https://kirkcenter.org/reviews/monster-is-the-machine/
36•freediver•4d ago•13 comments

Executing programs inside transformers with exponentially faster inference

https://www.percepta.ai/blog/can-llms-be-computers
249•u1hcw9nx•1d ago•92 comments

Why the militaries are scrambling to create their own Starlink

https://www.newscientist.com/article/2517766-why-the-worlds-militaries-are-scrambling-to-create-t...
14•mooreds•30m ago•1 comment

Show HN: What was the world listening to? Music charts, 20 countries (1940–2025)

https://88mph.fm/
81•matteocantiello•3d ago•36 comments

Dijkstra's Crisis: The End of Algol and Beginning of Software Engineering (2010) [pdf]

https://www.tomandmaria.com/Tom/Writing/DijkstrasCrisis_LeidenDRAFT.pdf
49•ipnon•4d ago•13 comments

Revealed: Face of 75,000-year-old female Neanderthal from cave

https://www.cam.ac.uk/stories/shanidar-z-face-revealed
18•thunderbong•55m ago•5 comments

“This is not the computer for you”

https://samhenri.gold/blog/20260312-this-is-not-the-computer-for-you/
851•MBCook•16h ago•318 comments

Run NanoClaw in Docker Sandboxes

https://nanoclaw.dev/blog/nanoclaw-docker-sandboxes/
106•outofdistro•4h ago•47 comments

What we learned from a 22-Day storage bug (and how we fixed it)

https://www.mux.com/blog/22-day-storage-bug
34•mmcclure•4d ago•5 comments

OVH forgot they donated documentation hosting to Pandas

https://github.com/pandas-dev/pandas/issues/64584
109•nwalters512•1h ago•33 comments

NASA targets Artemis II crewed moon mission for April 1 launch

https://www.npr.org/2026/03/12/nx-s1-5746128/nasa-artemis-ii-april-launch
41•Brajeshwar•2h ago•25 comments

ATMs didn’t kill bank teller jobs, but the iPhone did

https://davidoks.blog/p/why-the-atm-didnt-kill-bank-teller
500•colinprince•1d ago•525 comments

Removing recursion via explicit callstack simulation

https://jnkr.tech/blog/removing-recursion
4•todsacerdoti•4d ago•2 comments

Ceno, browse the web without internet access

https://ceno.app/en/index.html?
104•mohsen1•11h ago•29 comments

IMG_0416 (2024)

https://ben-mini.com/2024/img-0416
179•TigerUniversity•4d ago•42 comments

Can I run AI locally?

https://www.canirun.ai/
242•ricardbejarano•5h ago

Comments

John23832•2h ago
RTX Pro 6000 is a glaring omission.
schaefer•1h ago
The Nvidia Spark workstation is another omission.
embedding-shape•1h ago
Yeah, that's weird: it seems to have both later and earlier models, but specifically not the Pro 6000? Also, based on my experience, the given numbers seem to be at least an order of magnitude off, which seems like a lot, when I plug in the approximate values for a Pro 6000 (96GB VRAM + 1792 GB/s).
sxates•1h ago
Cool thing!

A couple suggestions:

1. I have an M3 Ultra with 256GB of memory, but the options list only goes up to 192GB. The M3 Ultra supports up to 512GB.

2. It'd be great if I could flip this around and choose a model, and then see the performance for all the different processors. Would help with buying decisions!

GrayShade•1h ago
This feels a bit pessimistic. Qwen 3.5 35B-A3B runs at 38 t/s tg with llama.cpp (mmap enabled) on my Radeon 6800 XT.
Aurornis•44m ago
At what quantization and with what size context window?
phelm•1h ago
This is awesome. It would be great to cross-reference some intelligence benchmarks so that I can understand the trade-off between RAM consumption, token rate, and how good the model is.
S4phyre•1h ago
Oh how cool. Always wanted to have a tool like this.
adithyassekhar•1h ago
This just reminded me of https://www.systemrequirementslab.com/cyri.

Not sure if it still works.

twampss•1h ago
Is this just llmfit but a web version of it?

https://github.com/AlexsJones/llmfit

deanc•1h ago
Yes. But llmfit is far more useful as it detects your system resources.
dgrin91•1h ago
Honestly I was surprised by this. It accurately got my GPU and specs without asking for any permissions. I didn't realize I was exposing this info.
dekhn•1h ago
How could it not? That information is always available to userspace.
bityard•20m ago
"Available to userspace" is a much different thing than "available to every website that wants it, even in private mode".

I too was a little surprised by this. My browser (Vivaldi) makes a big deal about how privacy-conscious they are, but apparently browser fingerprinting is not on their radar.

swiftcoder•15m ago
It's pretty hard to avoid GPU fingerprinting if you have WebGL/WebGPU enabled.
dekhn•15m ago
We switched to talking about llmfit in this subthread; it runs as native code.
rithdmc•47m ago
Do you mean the OP's website? Mine's way off.

> Estimates based on browser APIs. Actual specs may vary

spudlyo•11m ago
I run LibreWolf, which is configured to ask me before a site can use WebGL, which is commonly used for fingerprinting. I got the popup on this site, so I assume that's how they're doing it.
mrdependable•1h ago
This is great, I've been trying to figure this stuff out recently.

One thing I do wonder is what sort of solutions there are for running your own model, but using it from a different machine. I don't necessarily want to run the model on the machine I'm also working from.

cortesoft•1h ago
Ollama runs a web server that you use to interact with the models: https://docs.ollama.com/quickstart

You can also use the kubernetes operator to run them on a cluster: https://ollama-operator.ayaka.io/pages/en/
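
A minimal sketch of what querying that server from another machine could look like, assuming Ollama was started with OLLAMA_HOST=0.0.0.0 so it is reachable over the network, listens on its default port 11434, and already has a model pulled; the "gpu-box" hostname and "llama3" model tag are placeholders:

    # Minimal sketch: querying a remote Ollama server over HTTP.
    # Assumes the server was started with OLLAMA_HOST=0.0.0.0 (so it is
    # reachable from other machines), listens on the default port 11434,
    # and has already pulled the model; "gpu-box" and "llama3" are placeholders.
    import requests

    resp = requests.post(
        "http://gpu-box:11434/api/generate",
        json={
            "model": "llama3",
            "prompt": "Explain why memory bandwidth limits local inference speed.",
            "stream": False,  # return one JSON object instead of a token stream
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["response"])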

rebolek•31m ago
ssh?
g_br_l•1h ago
Could you add the Raspberry Pi to the list, to see which ridiculously small models it can run?
vova_hn2•1h ago
It says "RAM - unknown", but doesn't give me an option to specify how much RAM I have. Why?
charcircuit•1h ago
On mobile it does not show the name of the model in favor of the other stats.
debatem1•1h ago
For me the "can run" filter says "S/A/B" but lists S, A, B, and C and the "tight fit" filter says "C/D" but lists F.

Just FYI.

metalliqaz•1h ago
Hugging Face can already do this for you (with a much more up-to-date list of available models), and so can LM Studio. However, they don't attempt to estimate tok/sec, so that's a cool feature. That said, I don't really trust those numbers much, because the estimate doesn't incorporate information about the CPU, etc. True GPU offload often isn't possible on consumer PC hardware. Also, there are different quants available that make a big difference.
havaloc•1h ago
Missing the A18 Neo! :)
arjie•1h ago
Cool website. The one that I'd really like to see there is the RTX 6000 Pro Blackwell 96 GB, though.
ge96•1h ago
Raspberry Pi? Say a 4B with 4GB of RAM.

I also want to run vision like Yocto and basic LLM with TTS/STT

boutell•33m ago
I've been trying to get speech-to-text to work with a reasonable vocabulary on Pis for a while. It's tough. All the modern models just need more GPU than is available.
ge96•28m ago
Whisper?

For wake words I have used Picovoice Rhino.

I want to use these I2S breakout mics

meatmanek•6m ago
For ASR/STT on a budget, you want https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3 - it works great on CPU.

I haven't tried on a raspberry pi, but on Intel it uses a little less than 1s of CPU time per second of audio. Using https://github.com/NVIDIA-NeMo/NeMo/blob/main/examples/asr/a... for chunked streaming inference, it takes 6 cores to process audio ~5x faster than realtime. I expect with all cores on a Pi 4 or 5, you'd probably be able to at least keep up with realtime.

(Batch inference, where you give it the whole audio file up front, is slightly more efficient, since chunked streaming inference is basically running batch inference on overlapping windows of audio.)

EDIT: there are also the multitalker-parakeet-streaming-0.6b-v1 and nemotron-speech-streaming-en-0.6b models, which have similar resource requirements but are built for true streaming inference instead of chunked inference. In my tests, these are slightly less accurate. In particular, they seem to completely omit any sentence at the beginning or end of a stream that was partially cut off.
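
For reference, a minimal batch (non-streaming) sketch of the above with the NeMo toolkit; this assumes nemo_toolkit[asr] is installed and a short 16 kHz mono WAV file, and the chunked-streaming script linked above remains the better fit for long audio:

    # Minimal sketch: batch transcription with Parakeet via NVIDIA NeMo.
    # Assumes `pip install nemo_toolkit[asr]` and a short 16 kHz mono WAV;
    # for long audio, prefer the chunked-streaming example linked above.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")
    results = model.transcribe(["recording.wav"])
    # Depending on the NeMo version, entries are plain strings or Hypothesis
    # objects with a .text attribute.
    first = results[0]
    print(getattr(first, "text", first))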

LeifCarrotson•1h ago
This lacks a whole lot of mobile GPUs. It also does not understand that you can share CPU memory with the GPU, or perform various KV cache offloading strategies to work around memory limits.

It says I have an Arc 750 with 2 GB of shared RAM, because that's the GPU that renders my browser...but I actually have an RTX1000 Ada with 6 GB of GDDR6. It's kind of like an RTX 4050 (not listed in the dropdowns) with lower thermal limits. I also have 64 GB of LPDDR5 main memory.

It works - Qwen3 Coder Next, Devstral Small, Qwen3.5 4B, and others can run locally on my laptop in near real-time. They're not quite as good as the latest models, and I've tried some bigger ones (up to 24GB, it produces tokens about half as fast as I can type...which is disappointingly slow) that are slower but smarter.

But I don't run out of tokens.

Felixbot•1h ago
The RAM/VRAM cutoff matters more than the parameter count alone. A 13B model in Q4_K_M quantization fits in 8GB VRAM with reasonable throughput, but the same model in fp16 needs 26GB. Most calculators treat quantization as a footnote when it is actually the primary variable. The question is not "can I run 13B" but "what quantization level gives acceptable quality at my hardware ceiling".
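
A rough back-of-the-envelope check of those figures (a sketch only; the bits-per-weight values are approximate, and real runs also need headroom for the KV cache and runtime overhead):

    # Rough sketch: estimated weight memory for a 13B model at different quants.
    # Bits-per-weight values are approximate effective sizes; actual GGUF files
    # add metadata and you still need headroom for the KV cache.
    BITS_PER_WEIGHT = {"fp16": 16.0, "q8_0": 8.5, "q4_k_m": 4.8}

    def weight_gb(params_billion: float, quant: str) -> float:
        bits = BITS_PER_WEIGHT[quant]
        return params_billion * 1e9 * bits / 8 / 1e9

    for quant in ("fp16", "q8_0", "q4_k_m"):
        print(f"13B @ {quant}: ~{weight_gb(13, quant):.1f} GB")
    # fp16 lands around 26 GB and Q4_K_M around 8 GB, in line with the
    # figures in the comment above.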
sshagent•59m ago
I don't see my beloved 5060 Ti. Looks great though.
carra•49m ago
Having the rating of how well a model will run for you is cool. I'd also like some rating of the model's capabilities (even if this is tricky). There are way too many to choose from, and just looking at the parameter count or the memory used is not always a good indication of actual performance.
jrmg•48m ago
Is there a reliable guide somewhere to setting up local AI for coding? (Please don’t say ‘just Google it’; that just results in a morass of AI slop/SEO pages with out-of-date, non-self-consistent, incorrect, or impossible instructions.)

I’d like to be able to use a local model (which one?) to power Copilot in vscode, and run coding agent(s) (not general purpose OpenClaw-like agents) on my M2 MacBook. I know it’ll be slow.

I suspect this is actually fairly easy to set up - if you know how.

AstroBen•38m ago
Ollama or LM Studio are very simple to set up.

You're probably not going to get anything working well as an agent on an M2 MacBook, but smaller models do surprisingly well for focused autocomplete. Maybe the Qwen3.5 9B model would run decently on your system?

jrmg•7m ago
Right, setting up LM Studio is not hard. But how do I connect LM Studio to Copilot, or set up an agent?
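
One part of that is standard: LM Studio exposes an OpenAI-compatible server (by default at http://localhost:1234/v1), so anything that accepts a custom OpenAI endpoint, including most agent frameworks, can point at it; wiring Copilot itself to a local model depends on the editor's own model-provider settings. A minimal sketch of the API side, where "local-model" is a placeholder for whatever model LM Studio has loaded:

    # Minimal sketch: using LM Studio's local OpenAI-compatible server.
    # Assumes the server is enabled (default http://localhost:1234/v1) and a
    # model is loaded; "local-model" is a placeholder for its reported name,
    # and the api_key value is ignored by the local server.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
    reply = client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    )
    print(reply.choices[0].message.content)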
AstroBen•47m ago
This doesn't look accurate to me. I have an RX 9070 and I've been messing around with Qwen 3.5 35B-A3B. According to this site I can't even run it, yet I'm getting 32 tok/s ^.-
misnome•32m ago
It seems to be missing a whole load of the quantized Qwen models. Qwen3.5:122b works fine on the 96GB GH200 (a machine that is also missing here...).
unfirehose•46m ago
If you do, would you still want to collect data in a single pane of glass? See my open source repo for aggregating harness data from multiple machine learning model harnesses & models into a single place, to discover what you are working on & where you're spending time & money. There are plans for a scrobble feature like last.fm, but for agent research & code development & execution.

https://github.com/russellballestrini/unfirehose-nextjs-logg...

Thanks, I'll check back for comments. Feel free to fork, but if you want to contribute you'll have to find me off of GitHub; I develop privately on my own self-hosted GitLab server. Good luck & God bless.

varispeed•41m ago
Does it make any sense? I tried a few models at 128GB and it's all pretty much rubbish. Yes, they do give coherent answers, and sometimes they are even correct, but most of the time it is just plain wrong. I find it a massive waste of time.
boutell•32m ago
I'm not sure how long ago you tried it, but look at Qwen 3.5 32b on a fast machine. Usually best to shut off thinking if you're not doing tool use.
orthoxerox•38m ago
For some reason it doesn't react to changing the RAM amount in the combo box at the top. If I open this on my Ryzen AI Max 395+ with 32 GB of unified memory, it thinks nothing will fit because I've set it up to reserve 512MB of RAM for the GPU.
bityard•12m ago
Yeah, this site is iffy at best. I didn't even see Strix Halo on the list, but I selected 128GB and bumped up the memory bandwidth. It says gpt-oss-120b "barely runs" at ~2 t/s.

In reality, gpt-oss-120b fits great on the machine with plenty of room to spare and easily runs inference north of 50 t/s depending on context.

kylehotchkiss•34m ago
My Mac mini rocks Qwen2.5 14B at a lightning-fast 11 tokens a second, which is actually good enough for the long-term data processing I make it spend all day doing. It doesn't lock up the machine or prevent its primary purpose as a webserver from being fulfilled.
freediddy•33m ago
I think perplexity is more important than tokens per second; tokens per second is relatively useless in my opinion. There is nothing worse than getting bad results returned to you very quickly and confidently.

I've been working with quite a few open-weight models for the last year, and especially for things like images, models from 6 months ago would return garbage data quickly, but these days Qwen 3.5 is incredible, even the 9B model.

sroussey•28m ago
No, getting bad results slowly is much worse. Bad results quickly and you can make adjustments.

But yes, if there is a choice I want quality over speed. At the same quality, I definitely want speed.

meatmanek•32m ago
This seems to be estimating based on memory bandwidth / size of model, which is a really good estimate for dense models, but MoE models like GPT-OSS-20b don't involve the entire model for every token, so they can produce more tokens/second on the same hardware. GPT-OSS-20B has 3.6B active parameters, so it should perform similarly to a 3-4B dense model, while requiring enough VRAM to fit the whole 20B model.

(In terms of intelligence, they tend to score similarly to a dense model that's as big as the geometric mean of the full model size and the active parameters, i.e. for GPT-OSS-20B, it's roughly as smart as a sqrt(20b*3.6b) ≈ 8.5b dense model, but produces tokens 2x faster.)
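
The estimate being described is essentially memory bandwidth divided by the bytes read per token, which for an MoE model should count only the active parameters. A rough sketch of that arithmetic (ignoring KV-cache traffic, prompt processing, and compute limits; the bandwidth figure below is just an example value):

    # Rough sketch: bandwidth-bound upper limit on decode speed.
    # Ignores KV-cache reads, prompt processing, and compute limits; the
    # 450 GB/s bandwidth figure is only an example value.
    def est_tokens_per_sec(bandwidth_gb_s: float, active_params_b: float,
                           bits_per_weight: float = 4.8) -> float:
        bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
        return bandwidth_gb_s * 1e9 / bytes_per_token

    # Dense 20B vs. a GPT-OSS-20B-style MoE (~3.6B active parameters):
    print(f"dense 20B:       ~{est_tokens_per_sec(450, 20.0):.0f} tok/s")
    print(f"MoE 3.6B active: ~{est_tokens_per_sec(450, 3.6):.0f} tok/s")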

lambda•13m ago
Yeah, I looked up some models I have actually run locally on my Strix Halo laptop, and it's saying I should have much lower performance than I actually get on models I've tested.

For MoE models, it should be using the active parameters in memory bandwidth computation, not the total parameters.

littlestymaar•11m ago
While your remark is valid, there are two small inaccuracies here:

> GPT-OSS-20B has 3.6B active parameters, so it should perform similarly to a 3-4B dense model, while requiring enough VRAM to fit the whole 20B model.

First, the token generation speed is going to be comparable, but not the prefill speed (context processing is going to be much slower on a big MoE than on a small dense model).

Second, without speculative decoding, it is correct to say that a small dense model and a bigger MoE with the same number of active parameters are going to be roughly as fast. But with a small dense model you will see token-generation improvements from speculative decoding (up to 3x the speed), whereas you probably won't gain much from speculative decoding on an MoE model (because two consecutive tokens won't trigger the same “experts”, so you'd need to load more weights into the compute units, using more bandwidth).

pbronez•7m ago
The docs page addresses this:

> A Mixture of Experts model splits its parameters into groups called "experts." On each token, only a few experts are active — for example, Mixtral 8x7B has 46.7B total parameters but only activates ~12.9B per token. This means you get the quality of a larger model with the speed of a smaller one. The tradeoff: the full model still needs to fit in memory, even though only part of it runs at inference time.

> A dense model activates all its parameters for every token — what you see is what you get. A MoE model has more total parameters but only uses a subset per token. Dense models are simpler and more predictable in terms of memory/speed. MoE models can punch above their weight in quality but need more VRAM than their active parameter count suggests.

https://www.canirun.ai/docs

nilslindemann•28m ago
1. More title attributes please ("S 16 A 7 B 7 C 0 D 4 F 34", huh?)

2. Add a 150% size bonus to your site.

Otherwise, cool site, bookmarked.

amelius•26m ago
Why isn't there some kind of benchmark score in the list?
amelius•25m ago
What is this S/A/B/C/etc. ranking? Is anyone else using it?
vikramkr•24m ago
Just a tier list I think
relaxing•20m ago
Apparently S being a level above A comes from Japanese grading. I’ve been confused by that, too.
swiftcoder•12m ago
It's very common in Japanese-developed video games as well
tcbrah•17m ago
Tbh I stopped caring about "can I run X locally" a while ago. For anything where quality matters (scripting, code, complex reasoning), the local models are just not there yet compared to the API. Where local shines is specific narrow tasks: TTS, embeddings, Whisper for STT, stuff like that. Trying to run a 70B model at 3 tok/s on your gaming GPU when you could just hit an API for like $0.002/req feels like a weird flex IMO.
sdingi•16m ago
When running models on my phone - either through the web browser or via an app - is there any chance it uses the phone's NPU, or will these be GPU only?

I don't really understand how the interface to the NPU chip looks from the perspective of a non-system caller, if it exists at all. This is a Samsung device but I am wondering about the general principle.

amelius•14m ago
It would be great if something like this was built into ollama, so you could easily list available models based on your current hardware setup, from the CLI.
am17an•13m ago
You can still run larger MoE models by off-loading the expert weights to the CPU for token generation. They are by and large usable; I get ~50 tok/s on a Kimi Linear 48B (3B active) model on a potato PC + a 3090.
brcmthrowaway•7m ago
If anyone hasn't tried Qwen3.5 on Apple Silicon, I highly suggest you do! Claude-level performance on local hardware. If the Qwen team didn't get fired, I would be bullish on local LLMs.