frontpage.

DeepSeek 4 Flash local inference engine for Metal

https://github.com/antirez/ds4
85•tamnd•2h ago

Comments

maherbeg•2h ago
This is so sick. I'm really curious to see what focused effort on optimizing a single open source model can look like over many months: not only on the inference-serving side, but also on the harness-optimization side, building custom workflows to narrow the gap between what frontier models can infer and deduce and what open source models natively lack due to size, training, etc.
dakolli•1h ago
There will always be a huge gap between frontier models and open source models (unless you're very rich). This whole industry makes no sense; everyone is ignoring the unit economics. It costs $20k a month to run Kimi 2.6 at a decent tok/s, and to sell those tokens at a profit you'd need your hardware costs to be less than $1k a month.

Everyone who's betting their competency on the generosity of billionaires selling tokens at 1/10th-1/20th of cost, or on a delusional future where capable OS models fit on consumer-grade hardware, is actually cooked.
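(Back-of-envelope check on the unit-economics claim above. The $20k/month hardware cost is the commenter's figure; the sustained throughput is an assumption I'm plugging in, not a measurement.)

```python
# Break-even token price for self-hosted inference.
# hardware cost is the comment's claim; throughput is an assumed figure.
hardware_cost_per_month = 20_000          # USD/month, claimed for Kimi 2.6
tokens_per_second = 50                    # assumed sustained decode throughput
seconds_per_month = 30 * 24 * 3600

tokens_per_month = tokens_per_second * seconds_per_month   # ~129.6M tokens
cost_per_million_tokens = hardware_cost_per_month / (tokens_per_month / 1e6)

print(f"{tokens_per_month / 1e6:.0f}M tokens/month")
print(f"break-even price: ${cost_per_million_tokens:.2f} per 1M tokens")
```

Under those assumptions you'd need to charge on the order of $150 per million tokens just to cover hardware, which is the gap the commenter is pointing at, since API tokens sell for a small fraction of that.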

bensyverson•59m ago
If you look at a graph of GPU power in consumer hardware and of model capability per billion parameters over time, it seems inevitable that in the next few years a "good enough" model will run on entry-level hardware.

Of course there will always be larger flagship models, but if you can count on decent on-device inference, it materially changes what you can build.

physicsguy•57m ago
It also massively changes the value economics of the frontier models. In a lot of cases, you really don't need a general-purpose intelligence model either.
bensyverson•9m ago
Exactly… as HN readers, we sometimes forget that a lot of people are using these tools to search for the best sunscreen, or rewrite an email.
dakolli•50m ago
No offense, this is a crazy delusional statement.
afro88•45m ago
No offense, this is a crazy worthless contribution to the discussion.

Why?

otabdeveloper4•57m ago
> a delusional future where capable OS models fit on consumer grade hardware

48 GB is enough for a capable LLM.

Doing that on consumer grade hardware is entirely possible. The bottleneck is CUDA and other intellectual property moats.
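(The memory arithmetic behind "48 GB is enough" is simple to sketch. The parameter counts, quantization levels, and overhead factor below are illustrative assumptions, not any specific model's real footprint.)

```python
# Rough memory to hold model weights: params * bits / 8,
# plus an assumed 20% overhead for KV cache and activations.
def weight_memory_gb(params_billions, bits_per_weight, overhead=0.2):
    bytes_needed = params_billions * 1e9 * bits_per_weight / 8
    return bytes_needed * (1 + overhead) / 1e9

# Illustrative sizes only:
print(f"70B @ 4-bit: {weight_memory_gb(70, 4):.0f} GB")   # ~42 GB
print(f"70B @ 8-bit: {weight_memory_gb(70, 8):.0f} GB")   # ~84 GB
print(f"30B @ 4-bit: {weight_memory_gb(30, 4):.0f} GB")   # ~18 GB
```

On those assumptions a 4-bit 70B-class model squeezes into 48 GB with room to spare, which is the kind of configuration the commenter is gesturing at.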

liuliu•56m ago
I am not sure where this comment is coming from (possibly without looking at this project?). This project runs a quasi-frontier model at reasonable tps (~30) with reasonable prefill performance (~500 tps) on a high-end laptop. People are simply projecting what they see in this project onto what you can optimistically expect.

You can argue whether the projection is too optimistic or not, but this project definitely made me a little bit optimistic on that end.
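(The ~500 tok/s prefill and ~30 tok/s decode figures from the comment above translate directly into request latency. The request sizes below are assumptions chosen to represent a typical coding-agent turn.)

```python
# Time-to-first-token is dominated by prefill; total time adds decode.
def request_latency_s(prompt_tokens, output_tokens,
                      prefill_tps=500, decode_tps=30):
    ttft = prompt_tokens / prefill_tps         # time to first token
    total = ttft + output_tokens / decode_tps  # full response time
    return ttft, total

# Assumed sizes: an 8k-token context producing a 600-token answer.
ttft, total = request_latency_s(prompt_tokens=8000, output_tokens=600)
print(f"TTFT ~{ttft:.0f}s, full response ~{total:.0f}s")
```

Roughly 16 s to first token and ~36 s end to end at those rates, which is slow next to a hosted API but genuinely usable for local agentic work.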

amunozo•38m ago
Most tasks do not require frontier models, so as long as open models cover 95-99 per cent of tasks, closed frontier models can be left for the niche and specialized cases that are harder.
amunozo•40m ago
I am curious about it producing fewer tokens except in max mode. I love DeepSeek V4 Flash and I use it extensively; it's so cheap I can use it all day and still not exhaust my $10 OpenCode Go subscription. I always use it in max mode because of this, but now I wonder whether I should use high instead.
unshavedyak•21m ago
What do you use it for? I tend to just stick to SOTA (Claude 4.7 Max thinking) and put up with the slow req/response. I'm not sure what type of work I'd trust to a less capable thinking model, as my intuition is built around what Claude vSOTA Max can handle.

Nonetheless eventually i want to build an at-home system. I imagine some smaller local model could handle metadata assignment quite well.

edit: Though TIL Mac Studio doesn't offer 512GB anymore... DRAM shortage lol. Rough.

syntaxing•13m ago
How has OpenCode Go been for you? Worth switching over from Claude Pro?
antirez•33m ago
A random, funny, interesting and telling data point: my MacBook M3 Max, while DS4 is generating tokens at full speed, peaks at 50 W of power draw...
bertili•27m ago
That equals 2 or 3 human brains in power usage. Amazing work!
antirez•20m ago
True quantitatively, not qualitatively. DeepSeek V4 is not capable of doing what a human brain can do, of course, but for the tasks it can do, it does them at a speed that is completely impossible for a human, so comparing the two requires some normalization for speed.
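(The normalization antirez mentions is easy to make concrete. The 50 W draw is from his comment; the ~30 tok/s decode rate is borrowed from the sibling thread as an assumption; the ~20 W human-brain figure is the usual textbook estimate.)

```python
# Energy per generated token at the reported laptop draw.
power_watts = 50     # MacBook M3 Max while generating (from the comment)
decode_tps = 30      # tokens/sec, assumed from the sibling thread
brain_watts = 20     # common textbook estimate for a human brain

joules_per_token = power_watts / decode_tps
print(f"{joules_per_token:.2f} J per token")
print(f"= {power_watts / brain_watts:.1f} human brains of draw")
```

About 1.7 J per token at 2.5 "brain-equivalents" of power, but delivered at a token rate no human writer approaches, which is the speed normalization being asked for.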
minimaxir•27m ago
"Data centers for LLMs are technically more energy efficient per-user than self-hosting LLM models due to economies-of-scale" is a data point the internet isn't ready for.
Onavo•21m ago
There are a bunch of companies doing garage GPU datacenters now. They can probably act as a heat source during winter too if you have a heat pump.
Hamuko•24m ago
I think I've seen about 60 watts total system draw whenever I've used a local model on a MacBook Pro or a Mac Studio. Baseline for the Mac Studio is like 10 W and like 6 W for the MacBook Pro.
happyPersonR•23m ago
So I'm just gonna ask a question; it will probably get downvoted.

I know this is Flash, but….

Other than this guy, did our whole society seriously never flamegraph this stuff before we started requesting nuclear reactors colocated at data centers, and something like more than 10% of GDP?

Someone needs to answer, because this isn't even an M4 or M5… WHAT THE FUCK

Sidenote: shout out antirez, love my Redis :)

liuliu•22m ago
DSv4 generates much faster on NVIDIA-class hardware. It is just a very efficient model.
AlotOfReading•15m ago
This is built atop a tower of stuff people built with profiling and performance-oriented design.

That said, I've found that most corporate environments are unintentionally hostile to this kind of optimization work. It's hard to justify until the work is already done. That means you often need people with the skills, means, and motivation to do this that are outside normal corporate constraints. There aren't many of those.

happyPersonR•4m ago
Building this into agentic dev workflows (subject to token/time constraints) is something I spent a lot of time doing at work. I'm actually kind of proud of that hahah

But you’re right I agree

In the corporate world they sadly don’t take kindly to performance profiling as a first class citizen

Granted I will say optimization without requirements may not be beneficial but at least profiling itself seems worthy if you have use cases.

A lot of us have been working in the network packet-pushing, distributed systems, and distributed storage space

I’m happy to see more stuff like this :)

TL;DR: I've not seen a lot of flamegraphs of LLM inference end to end… idk if anyone else has?
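(For what it's worth, the workflow being asked about is just ordinary sampling/deterministic profiling pointed at an inference loop. A minimal sketch with Python's stdlib profiler, using a toy stand-in for the expensive kernel; a real stack would use perf or py-spy to render actual flamegraphs, but the sample-then-inspect loop is the same.)

```python
import cProfile
import io
import pstats

def matmul_stub(n):                 # toy stand-in for the hot GEMM kernel
    return sum(i * i for i in range(n))

def decode_loop(tokens=50):         # toy stand-in for token-by-token decode
    return [matmul_stub(10_000) for _ in range(tokens)]

profiler = cProfile.Profile()
profiler.enable()
decode_loop()
profiler.disable()

# Print the top functions by cumulative time; in a flamegraph these
# would be the widest frames.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

The cumulative-time ranking answers the same question a flamegraph does, namely where the watts are going, just without the picture.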

fgfarben•9m ago
The world is not China.
wmf•3m ago
Every lab has a bunch of people doing nothing but optimizing.
nazgulsenpai•7m ago
I keep seeing DS4 and, in order, my brain interprets it as Dark Souls 4 (sadface), DualShock 4, DeepSeek 4.

The map that keeps Burning Man honest

https://www.not-ship.com/burning-man-moop/
356•speckx•4h ago•151 comments

AlphaEvolve: Gemini-powered coding agent scaling impact across fields

https://deepmind.google/blog/alphaevolve-impact/
153•berlianta•3h ago•46 comments

Agents need control flow, not more prompts

https://bsuh.bearblog.dev/agents-need-control-flow/
63•bsuh•1h ago•30 comments

DeepSeek 4 Flash local inference engine for Metal

https://github.com/antirez/ds4
85•tamnd•2h ago•29 comments

Child marriages plunged when girls stayed in school in Nigeria

https://www.nature.com/articles/d41586-026-00796-2
235•surprisetalk•4h ago•153 comments

Chrome removes claim of On-device AI not sending data to Google Servers

https://old.reddit.com/r/chrome/comments/1t5qayz/chrome_removes_claim_of_ondevice_al_not_sending/
170•newsoftheday•2h ago•42 comments

Natural Language Autoencoders: Turning Claude's Thoughts into Text

https://www.anthropic.com/research/natural-language-autoencoders
9•instagraham•32m ago•0 comments

I switched from Mac to a Lenovo Chromebook

https://blog.johnozbay.com/i-left-apples-ecosystem-for-a-lenovo-chromebook-and-you-can-too.html
59•speckx•2h ago•80 comments

PySimpleGUI 6

https://github.com/PySimpleGUI/PySimpleGUI
39•geophph•2d ago•13 comments

The Self-Cancelling Subscription

https://predr.ag/blog/the-self-cancelling-subscription/
95•surprisetalk•4h ago•39 comments

OpenBSD Stories: The closest thing to cute kittens (OpenBSD/zaurus)

http://miod.online.fr/software/openbsd/stories/zaurus1.html
29•zdw•23h ago•4 comments

RaTeX: KaTeX-compatible LaTeX rendering engine in pure Rust

https://ratex.lites.dev/
120•atilimcetin•3d ago•73 comments

MPEG-2 Transport Stream Packaging for Media over QUIC Transport

https://www.ietf.org/archive/id/draft-gregoire-moq-msfts-00.html
35•mondainx•3h ago•11 comments

Motherboard sales 'collapse' amid unprecedented shortages fueled by AI

https://www.tomshardware.com/pc-components/motherboards/motherboard-sales-collapse-by-more-than-2...
138•speckx•3h ago•120 comments

SQLite Is a Library of Congress Recommended Storage Format

https://sqlite.org/locrsf.html
546•whatisabcdefgh•20h ago•165 comments

I want to live like Costco people

https://tastecooking.com/i-want-to-live-like-costco-people/
49•speckx•3h ago•106 comments

Appearing productive in the workplace

https://nooneshappy.com/article/appearing-productive-in-the-workplace/
1491•diebillionaires•1d ago•601 comments

How Cloudflare responded to the “Copy Fail” Linux vulnerability

https://blog.cloudflare.com/copy-fail-linux-vulnerability-mitigation/
52•mobeigi•5h ago•53 comments

GovernGPT (YC W24) Is Hiring Engineers to Build Thinking Systems in Montreal

https://www.ycombinator.com/companies/governgpt/jobs/hRyltS0-backend-engineer-thinking-systems
1•owalerys•6h ago

Speedup in Lattice Boltzmann Cylinder Flow

https://github.com/alikamp/Parks-KPBM-Scaling
39•kauai1•3d ago•3 comments

Boris Cherny: TI-83 Plus Basic Programming Tutorial (2004)

https://www.ticalc.org/programming/columns/83plus-bas/cherny/
152•suoken•2d ago•67 comments

Printing Blogs

https://fi-le.net/print/
11•fi-le•1d ago•1 comments

Indian matchbox labels as a visual archive

https://www.itsnicethat.com/features/the-view-from-mumbai-matchbook-graphic-design-130426
129•sahar_builds•3d ago•31 comments

Show HN: Stage CLI – an easier way of reading your AI generated changes locally

https://github.com/ReviewStage/stage-cli
8•cpan22•2h ago•2 comments

Agent-harness-kit scaffolding for multi-agent workflows (MCP, provider-agnostic)

https://ahk.cardor.dev
61•enmanuelmag•7h ago•18 comments

Brazil's Pix Payment System Faces Pressure from Visa and Mastercard

https://www.elciudadano.com/en/brazils-pix-payment-system-faces-pressure-from-visa-and-mastercard...
12•wslh•43m ago•1 comments

Diskless Linux boot using ZFS, iSCSI and PXE

https://aniket.foo/posts/20260505-netboot/
174•stereo-highway•15h ago•91 comments

RSS feeds send me more traffic than Google

https://shkspr.mobi/blog/2026/05/rss-feeds-send-me-more-traffic-than-google/
238•SpyCoder77•17h ago•53 comments

Valve releases Steam Controller CAD files under Creative Commons license

https://www.digitalfoundry.net/news/2026/05/valve-releases-steam-controller-cad-files-under-creat...
1665•haunter•1d ago•564 comments

Vibe coding and agentic engineering are getting closer than I'd like

https://simonwillison.net/2026/May/6/vibe-coding-and-agentic-engineering/
729•e12e•1d ago•823 comments