The 29th International Obfuscated C Code Contest (IOCCC) 2025 Winners

https://www.ioccc.org/2025/

149•matt_d•3h ago•35 comments

I design with Claude more than Figma now

https://blog.janestreet.com/i-design-with-claude-code-more-than-figma-now-index/

154•MrBuddyCasino•4h ago•115 comments

Valve P2P networking broken for more than 2 months

https://github.com/ValveSoftware/GameNetworkingSockets/issues/398

158•babuskov•6h ago•76 comments

Speculative KV coding: losslessly compressing KV cache by up to ~4×

https://fergusfinn.com/blog/kv-entropy-coder/

28•kkm•2d ago•5 comments

Field of clones: How horse replicas came to dominate polo

https://knowablemagazine.org/content/article/technology/2026/cloned-polo-horses

84•gscott•6h ago•41 comments

Win16 Memory Management

http://www.os2museum.com/wp/win16-memory-management/

16•supermatou•1d ago•1 comments

Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering

https://arxiv.org/abs/2601.14470

103•Anon84•8h ago•35 comments

Symbolica 2.0: Programmable Symbols for Python and Rust

https://symbolica.io/posts/symbolica_2_0_release/

83•mmastrac•1d ago•6 comments

My Software North Star

https://kristoff.it/blog/north-star/

66•kristoff_it•3d ago•33 comments

Ntsc-rs – open-source video emulation of analog TV and VHS artifacts

https://ntsc.rs/

345•gregsadetsky•14h ago•92 comments

Public Domain Image Archive

https://pdimagearchive.org/

119•davidbarker•9h ago•17 comments

Harness engineering: Leveraging Codex in an agent-first world

https://openai.com/index/harness-engineering/

195•pramodbiligiri•1d ago•123 comments

How Liminalism Became the Defining Aesthetic of Our Time

https://hyperallergic.com/how-liminalism-became-the-defining-aesthetic-of-our-time/

66•zeech•7h ago•37 comments

Biohub releases a world model of protein biology

https://biohub.org/news/world-model-of-protein-biology/

73•gmays•3d ago•4 comments

Introducing Boron Buckyballs: Theory that B80 cages can’t be made is disproved

https://cen.acs.org/materials/nanomaterials/buckyballs-boron-buckminster-fullerene-nanomaterials/...

84•crescit_eundo•2d ago•21 comments

Moving beyond fork() + exec()

https://lwn.net/SubscriberLink/1076018/16f01bbbb8e0d1f0/

300•jwilk•19h ago•292 comments

Arithmetic Without Numbers – How LLMs Do Math

https://alvaro-videla.com/llm-arithmetic-internals/article_interactive/article.html

22•old_sound•1d ago•7 comments

Show HN: Oproxy – inspect and modify network traffic from the browser

https://github.com/sauravrao637/oproxy

47•sauravrao637•7h ago•6 comments

Human-Like Neural Nets by Catapulting

https://gwern.net/llm-catapult

38•telotortium•9h ago•8 comments

Nvidia is proposing a beast of a CPU system for Windows PCs

https://twitter.com/lemire/status/2062880075117113739

282•tosh•20h ago•474 comments

Meta confirms 1000s of Instagram accounts were hacked by abusing its AI chatbot

https://this.weekinsecurity.com/meta-confirms-thousands-of-instagram-accounts-were-hacked-by-abus...

592•speckx•15h ago•209 comments

Zeroserve: A zero-config web server you can script with eBPF

https://su3.io/posts/introducing-zeroserve

233•losfair•18h ago•55 comments

Google to pay SpaceX $920M a month for compute capacity at xAI data centers

https://www.cnbc.com/2026/06/05/google-to-pay-spacex-920-million-a-month-for-xai-compute-capacity...

243•toephu2•1d ago•824 comments

Show HN: Free animated icon library for Vue

https://respeak-io.github.io/lucide-motion-vue/

29•evolabs•3d ago•6 comments

Sem: New primitive for code understanding – not LSPs, but entities on top of Git

https://ataraxy-labs.github.io/sem/

118•rohanucla•13h ago•47 comments

Motorola effectively bricked its entire line of WiFi routers without explanation

https://mashable.com/tech/motorola-wifi-routers-stop-working-motosync-plus-app-down

153•thisislife2•19h ago•70 comments

Ask HN: What was your "oh shit" moment with GenAI?

619•andrehacker•2d ago•1013 comments

Pokemon Emerald Ported to WebAssembly (100k FPS)

https://pokeemerald.com/

319•tripplyons•22h ago•94 comments

Show HN: Infinite canvas notes in the non-Euclidean Poincaré disk

https://uonr.github.io/poincake/

165•uonr•4d ago•29 comments

Unicode Fonts and Tools for X11

https://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html

38•kristianp•2d ago•8 comments

Speculative KV coding: losslessly compressing KV cache by up to ~4×

https://fergusfinn.com/blog/kv-entropy-coder/

28•kkm•2d ago

Comments

hypfer•1h ago

TL;DR (and please correct me if I got it wrong):

A tiny deterministic model predicts the K/V cache, the prediction is compared with the real values, and only the delta is stored in VRAM. Reading it back runs the same prediction again and applies the delta, so you recover the exact original values while only ever storing the delta.

And this works because you're never looking at the whole KV cache at once, only a slice of it. So you just need a memory buffer the size of that slice.
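A minimal sketch of the predict-then-store-the-delta round trip described above. It is not the article's implementation: `toy_predictor` is a hypothetical stand-in for the tiny model, the values are plain integers standing in for quantized K/V entries, and the entropy-coding step that would actually shrink the small deltas is left out.

```python
from typing import List


def toy_predictor(token_ids: List[int]) -> List[int]:
    """Hypothetical stand-in for the tiny deterministic predictor.

    Any function works here as long as it is deterministic, so that the
    encoder and the decoder compute exactly the same prediction.
    """
    return [(t * 7 + 3) % 256 - 128 for t in token_ids]


def encode(token_ids: List[int], actual: List[int]) -> List[int]:
    """Keep only the difference between the real values and the prediction."""
    predicted = toy_predictor(token_ids)
    return [a - p for a, p in zip(actual, predicted)]


def decode(token_ids: List[int], deltas: List[int]) -> List[int]:
    """Re-run the same prediction and add the stored deltas back."""
    predicted = toy_predictor(token_ids)
    return [p + d for p, d in zip(predicted, deltas)]


token_ids = [5, 17, 42, 99]
# Pretend the real cache values are close to the prediction (assumed errors).
errors = [0, 1, -1, 0]
actual = [p + e for p, e in zip(toy_predictor(token_ids), errors)]

deltas = encode(token_ids, actual)
restored = decode(token_ids, deltas)
print(deltas, restored == actual)
```

The round trip is exact for any predictor. A good predictor only matters for size: the closer the deltas sit to zero, the fewer bits a downstream entropy coder would need for them.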

___

If this works out and I've understood correctly, that _I think_ would mean that a 24GB RTX 4090 could fit 256k q8 context next to Qwen3.6-27B at IQ4_NL.

Or, alternatively, something like 208k context (matching claude api limits of 200k in some plans) with a slightly larger quant like UD-Q4_K_XL.

That would be massive. Especially since the thing has so much compute to spare.

Though that all depends on the size of the predictor model, I guess?
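For anyone wanting to sanity-check estimates like the one above, here is a rough sketch of the usual KV cache size arithmetic for a standard attention stack. The layer count, KV head count and head dimension below are made-up placeholders, not the real configuration of Qwen3.6-27B or any other model, and the one-byte-per-value figure ignores the per-block scale overhead that q8-style formats carry.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: float) -> float:
    """Size of an uncompressed KV cache for a plain attention stack."""
    # Factor 2: each layer stores one K tensor and one V tensor.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


GIB = 1024 ** 3

# Hypothetical model shape, chosen only to illustrate the arithmetic.
raw = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128,
                     context_len=256 * 1024, bytes_per_value=1)

# Best case from the article title ("up to ~4x"), treated here as an assumption.
compressed = raw / 4

print(f"raw: {raw / GIB:.1f} GiB, at 4x: {compressed / GIB:.1f} GiB")
```

With these placeholder numbers the raw cache alone would fill a 24GB card, while the 4x case leaves room for weights. Whether the real figures work out depends on the actual model shape and on how much memory the predictor itself needs.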

porridgeraisin•1h ago

I have yet to do a "deep dive" into the results, but what a well-written article. An LLM could _never_ write so crisply.

mirekrusin•1h ago

If the “speculative” approach works so well in different contexts, why not make it first-class and use it everywhere, possibly recursively?

0-_-0•43m ago

You can use the original model to compress the kv cache and get ∞x compression, since the prediction is perfect. The cost is time, and I don't see how this could be worth it.

zozbot234•41m ago

The problem with this approach is that even recomputing a "draft" of the KV cache is still quadratic in context length. Maybe you can get some constant savings by always recomputing the earliest tokens, but it's not a good tradeoff as context sizes grow.
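A back-of-the-envelope sketch of the quadratic term mentioned above. It counts only query/key pairs under causal attention and ignores everything that scales linearly with context (projections, MLPs), so treat it as an illustration of the growth rate rather than a cost model for any specific system.

```python
def causal_attention_pairs(n_tokens: int) -> int:
    """Number of (query, key) pairs when each token attends to itself
    and to every earlier token: 1 + 2 + ... + n = n * (n + 1) / 2."""
    return n_tokens * (n_tokens + 1) // 2


for n in (1_000, 2_000, 4_000):
    print(n, causal_attention_pairs(n))

# Doubling the context roughly quadruples the pair count.
ratio = causal_attention_pairs(2_000) / causal_attention_pairs(1_000)
print(round(ratio, 2))
```

Under this simplification, recomputing a cache for twice the context costs about four times as much attention work, which is the scaling concern raised in the comment.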