KV Cache Is Becoming the Memory Hierarchy of Inference

https://touchdown-labs.com/blog/kv-cache-memory-hierarchy-inference.html
28•matt_d•2d ago

Comments

htk•29m ago
Hard-to-read article. The writing is curiously more robotic and repetitive than articles actually written by AI.
chuzz•24m ago
i like how even if i can parse most of it, it still sounds like technically accurate technobabble; could be inspiration for a tv show :D
tptacek•11m ago
There's like an interesting systems article here, but at this point I'd rather they just gave me the prompt they used to generate it, so I can read it interactively in my own GPT5.5 session.
cyanydeez•7m ago
ok, so for anyone who's not played with local models and watched what's going on with the KV cache:

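1. You send your prompt, and nowadays whatever harness you're using sends a whole mess of context along with it: available skills, tools, guardrails, etc. The text is split into tokens, and then the GPU/inference engine runs all of those tokens through the model, computing a key and a value vector for every token at every layer. This is the "Prompt Processing" (prefill) phase, and it's the fastest part of inference per token because the whole prompt can be processed in parallel. Those keys and values are what gets cached; that's the KV cache.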

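2. The model then generates the next tokens, more slowly, one at a time, since each one depends on the last. Each new token's key and value are appended to the same cache, so nothing for the earlier tokens has to be recomputed.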
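To make steps 1 and 2 concrete, here is a toy Python sketch of a single attention head with a KV cache. It assumes one layer, one head, made-up 2-dimensional embeddings, and identity projection weights (`wq`, `wk`, `wv`, all hypothetical); real models stack many heads and layers and sample each next token from the output, which this skips. What it shows: prefill computes and stores a key and value for every prompt token, and each decode step computes them only for the one new token, reusing everything already cached.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(w, x):
    # w is a list of rows, so this computes w @ x
    return [dot(row, x) for row in w]

class KVCache:
    """Keys and values for ONE head of ONE layer (a real cache has one per head per layer)."""
    def __init__(self):
        self.keys = []
        self.values = []

def attend(q, cache):
    # Scaled dot-product attention of one query over every cached position.
    scale = math.sqrt(len(q))
    scores = [dot(q, k) / scale for k in cache.keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [0.0] * len(cache.values[0])
    for w, v in zip(weights, cache.values):
        for i, vi in enumerate(v):
            out[i] += (w / total) * vi
    return out

def step(x, cache, wq, wk, wv):
    # Compute this token's key/value ONCE and append them, then attend over
    # all positions so far (causal: later tokens are not in the cache yet).
    cache.keys.append(matvec(wk, x))
    cache.values.append(matvec(wv, x))
    return attend(matvec(wq, x), cache)

def prefill(prompt, cache, wq, wk, wv):
    # Step 1: fill the cache for every prompt token. Real engines batch this
    # into large matrix multiplies, which is why it is fast per token.
    out = None
    for x in prompt:
        out = step(x, cache, wq, wk, wv)
    return out

# Toy 2-dim head with identity projections; the embeddings are made up.
I2 = [[1.0, 0.0], [0.0, 1.0]]
cache = KVCache()
prefill([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], cache, I2, I2, I2)
# Step 2: a decode step only computes K/V for the single new token.
decoded = step([0.5, 0.5], cache, I2, I2, I2)
print(len(cache.keys))  # 4
```

Recomputing all four tokens from an empty cache gives the same output as the cached decode step: the cache only saves work, it doesn't change the result.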

Crucially: the KV cache lives in GPU memory (VRAM). It isn't a hardware cache like a CPU's L1/L2; it's memory the inference engine manages, and it gets big fast because it holds keys and values for _all_ the tokens in a conversation. Engines can spill it to CPU RAM or disk, but every tier further from the GPU is slower to bring back. So, like any cache, entries have to be evicted to free up the VRAM.
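Why it has to be evicted is easier to see with some arithmetic. A common estimate of KV cache size is 2 (key and value) × layers × KV heads × head dimension × bytes per element × tokens. The sketch below uses a hypothetical configuration (32 layers, 8 KV heads, head dimension 128, 16-bit values) and a hypothetical 30 GiB budget, paired with a simple least-recently-used eviction policy. `kv_bytes` and `LRUKVStore` are illustrative names, not any engine's API, and many engines manage the cache in fixed-size blocks rather than whole conversations.

```python
from collections import OrderedDict

def kv_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    # 2x because each layer stores both a key and a value per KV head per token.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens

class LRUKVStore:
    """Toy VRAM budget: evicts the least-recently-used conversation's cache (hypothetical)."""
    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        self.entries = OrderedDict()  # conversation id -> cache size in bytes, oldest first

    def touch(self, conv_id, size_bytes):
        """Mark conv_id as just used with size_bytes of cache; return evicted ids."""
        if size_bytes > self.budget:
            raise ValueError("cache is larger than the whole budget")
        if conv_id in self.entries:
            self.used -= self.entries.pop(conv_id)  # re-inserted below as most recent
        evicted = []
        while self.used + size_bytes > self.budget:
            old_id, old_size = self.entries.popitem(last=False)  # least recently used
            self.used -= old_size
            evicted.append(old_id)
        self.entries[conv_id] = size_bytes
        self.used += size_bytes
        return evicted

print(kv_bytes(1, num_layers=32, num_kv_heads=8, head_dim=128))  # 131072 (128 KiB per token)
conv = kv_bytes(100_000, num_layers=32, num_kv_heads=8, head_dim=128)
print(round(conv / 2**30, 1))  # 12.2 (GiB for a 100k-token conversation)

store = LRUKVStore(30 * 2**30)
store.touch("a", conv)
store.touch("b", conv)
store.touch("a", conv)          # "a" is used again, so "b" is now the oldest
print(store.touch("c", conv))   # ['b']
```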

So if you had a conversation an hour ago, in the cloud, it's doubtful its KV cache still exists. If you got up to 500k tokens, you're going through step #1 again for all of them. If you're going turn by turn without long pauses, the earlier tokens are still cached: only your new message needs step #1, and then it's straight on to #2.
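The "skip to #2" case is prefix caching: only the part of the new request that exactly matches what's already cached can be reused. A minimal sketch, assuming token ids are plain integers and we know which token sequence is still resident; `tokens_to_prefill` is a hypothetical helper, and production engines often match whole fixed-size blocks of tokens rather than single tokens.

```python
def tokens_to_prefill(cached, request):
    """How many tokens of `request` must go through prefill (step #1).

    `cached` is the token sequence whose KV cache is still resident, or None
    if it was evicted. Only a shared *prefix* can be reused, because the
    cached keys/values at each position depend on everything before it.
    """
    if cached is None:
        return len(request)  # cache miss: reprocess everything
    shared = 0
    for a, b in zip(cached, request):
        if a != b:
            break
        shared += 1
    return len(request) - shared

history = list(range(500_000))   # hypothetical token ids for a 500k-token chat
next_turn = history + [7] * 200  # the new message adds 200 tokens
print(tokens_to_prefill(history, next_turn))  # 200
print(tokens_to_prefill(None, next_turn))     # 500200
```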

So some of the reports in March about the token allowance suddenly disappearing within hours were likely a KV cache/billing issue: people were charged as if the entire context had to be processed again on every back-and-forth. Whether it was a bug in billing or a bug in the serving code, who knows.
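A rough illustration of why cache misses would eat an allowance so fast: if every turn reprocesses the whole context, total work grows quadratically with the number of turns. The numbers (50 turns, 2,000 new tokens per turn) are made up, the model's own replies are ignored even though they also join the context, and nothing here says how any provider actually bills.

```python
def input_tokens_processed(turns, tokens_per_turn, cache_hits):
    """Total input tokens run through prefill over a whole conversation.

    Each turn appends `tokens_per_turn` tokens to the context. With cache hits
    only the new tokens are processed; on a miss the whole context is processed
    again, so the total grows quadratically with the number of turns.
    """
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn
        total += tokens_per_turn if cache_hits else context
    return total

hit = input_tokens_processed(50, 2_000, cache_hits=True)
miss = input_tokens_processed(50, 2_000, cache_hits=False)
print(hit, miss, miss / hit)  # 100000 2550000 25.5
```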

The trouble is that the traditional web-server tricks, proxy caching and load balancing, that helped scale the web don't work here! Your conversation with 100k tokens of context has to return to the same cluster, maybe even the same GPU, to get the extraordinarily fast KV cache reuse.
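One standard way to get that stickiness is to hash the conversation id to a worker instead of round-robining requests. The sketch below uses rendezvous (highest-random-weight) hashing; the worker names and `pick_worker` are hypothetical, and a real LLM router would likely also weigh load and which worker already holds the cache.

```python
import hashlib

def pick_worker(conversation_id, workers):
    """Rendezvous hashing: every turn of a conversation maps to the same worker.

    Each worker gets a pseudo-random score for this conversation; the highest
    score wins. Removing a worker only moves the conversations it was hosting.
    """
    def score(worker):
        digest = hashlib.sha256(f"{worker}|{conversation_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(workers, key=score)

workers = ["gpu-0", "gpu-1", "gpu-2", "gpu-3"]
home = pick_worker("conv-42", workers)
print(home, pick_worker("conv-42", workers) == home)  # same worker on every turn
```

Compared with hashing modulo the number of workers, rendezvous hashing means losing one GPU only moves the conversations that lived on it, so everyone else keeps their warm cache.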

I’ve built a virtual museum with nearly every operating system you can think of

https://virtualosmuseum.org/
332•andreww591•3h ago•68 comments

Apple unveils new accessibility features

https://www.apple.com/newsroom/2026/05/apple-unveils-new-accessibility-features-and-updates-with-...
470•interpol_p•7h ago•247 comments

I’ve joined Anthropic

https://twitter.com/karpathy/status/2056753169888334312
823•dmarcos•4h ago•324 comments

KV Cache Is Becoming the Memory Hierarchy of Inference

https://touchdown-labs.com/blog/kv-cache-memory-hierarchy-inference.html
32•matt_d•2d ago•4 comments

Gaussian Splat of a Strawberry

https://superspl.at/scene/84df8849
407•danybittel•8h ago•156 comments

Gentoo News: Copy Fail, Dirty Frag, and Fragnesia Kernel Vulnerabilities

https://www.gentoo.org/news/2026/05/19/copy-fail-fragnesia-vulnerabilities.html
64•akhuettel•3h ago•17 comments

Gemini 3.5 Flash: frontier intelligence with action

https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/
165•meetpateltech•1h ago•85 comments

Show HN: Superlog (YC P26) – Observability that installs itself and fixes bugs

https://superlog.sh/
31•Magnanten•3h ago•27 comments

CISA Admin Leaked AWS GovCloud Keys on GitHub

https://krebsonsecurity.com/2026/05/cisa-admin-leaked-aws-govcloud-keys-on-github/
293•LelouBil•11h ago•133 comments

Intro to TLA+ for the LLM Era: Prompt Your Way to Victory

https://emptysqua.re/blog/intro-to-tla-plus-for-the-llm-era/
71•zdw•2d ago•17 comments

Hanoi’s humble beer glass and the memory of a nation

https://sundaylongread.com/2026/05/15/hanois-humble-beer-glass-and-the-memory-of-a-nation/
87•NaOH•1d ago•15 comments

Gemini Omni

https://deepmind.google/models/gemini-omni/
56•meetpateltech•1h ago•15 comments

I Found Ultra-Pure Quantum Crystals in an Abandoned Mine in the Atacama Desert

https://medium.com/@breid.at/ultra-pure-quantum-crystals-from-an-abandoned-mine-in-a-mysterious-d...
226•vi_sextus_vi•2d ago•92 comments

The last six months in LLMs in five minutes

https://simonwillison.net/2026/May/19/5-minute-llms/
672•yakkomajuri•17h ago•528 comments

Mini Shai-Hulud Strikes Again: 314 npm Packages Compromised

https://safedep.io/mini-shai-hulud-strikes-again-314-npm-packages-compromised/
315•theanonymousone•14h ago•244 comments

KV Sharing, MHC, and Compressed Attention

https://magazine.sebastianraschka.com/p/recent-developments-in-llm-architectures
12•gmays•2h ago•0 comments

Peter Neumann has died

https://www.tuhs.org/pipermail/tuhs/2026-May/033748.html
285•pabs3•15h ago•23 comments

Google Search as you know it is over

https://techcrunch.com/2026/05/19/google-search-as-you-know-it-is-over/
85•evo_9•1h ago•100 comments

Show HN: I made a 3D pose maker for artists

https://setpose.com/
56•augustvdv•5h ago•27 comments

An Apple (II) for Teacher

https://technicshistory.com/2026/05/19/an-apple-ii-for-teacher/
46•cfmcdonald•18h ago•14 comments

Show HN: Haystack – Review the PRs that need human attention

https://haystackeditor.com/
11•akshaysg•1d ago•5 comments

Deciphering the Hashihara Castle Town Map

https://www.obayashi.co.jp/en/kikan_obayashi/detail/kikan_64_project.html
3•1970-01-01•1h ago•0 comments

Google I/O

https://io.google/2026/
153•thanhhaimai•2h ago•195 comments

Polypad

https://polypad.amplify.com/
192•ivank•2d ago•22 comments

OpenBSD 7.9

https://www.openbsd.org/79.html
299•bradley_taunt•5h ago•209 comments

Cursor Introduces Composer 2.5

https://cursor.com/blog/composer-2-5
268•asar•1d ago•196 comments

Kv4p HT – A homebrew 1W radio (VHF or UHF) that plugs into an Android phone

https://www.kv4p.com/
155•krupan•3d ago•65 comments

AI, "Humanity", and Dr. Manhattan Syndrome: A Communications Intervention

https://www.personfamiliar.com/p/ai-humanity-and-dr-manhattan-syndrome
4•stalfosknight•1h ago•0 comments

Click (2016)

https://clickclickclick.click/
358•andrewzeno•20h ago•91 comments

Nim-Presto – REST API Framework for Nim Language (2024)

https://github.com/status-im/nim-presto
55•TheWiggles•2d ago•10 comments