Show HN: KVBoost – chunk-level KV cache reuse for HuggingFace, 5–48x faster TTFT

https://pythongiant.github.io/KVBoost/

9•pythongiant•1h ago

Comments

pythongiant•1h ago

KVBoost is a chunk-level KV cache reuse library for HuggingFace models (pip install kvboost). It supports two recompute strategies (selective boundary and CacheBlend), int8/int4 KV quantization for 2–4x RAM reduction, disk-backed cold storage, and 11 architectures including Llama, Qwen, Gemma, Mistral, and Phi. On Qwen2.5-3B we measured 47.9x TTFT speedup on an 8-turn conversation, 21x on code context reuse, 100–743x faster than MLX, and 3–41x faster than vLLM-MLX — including interior chunk reuse where vLLM gets zero hits. Outputs are token-for-token identical to baseline under greedy decoding. Works best on 3B+ models with 500+ token shared context. GitHub: https://github.com/pythongiant/KVBoost
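To make the "interior chunk reuse" claim above concrete, here is a minimal sketch of the general idea, not KVBoost's actual API or implementation. It assumes a hypothetical fixed chunk size and uses plain token IDs in place of real KV tensors; every name in it (`CHUNK_SIZE`, `chunk_key`, `split_chunks`, `prefix_hits`, `content_hits`) is invented for illustration. It compares prefix-only matching, which stops at the first differing chunk, with content-addressed matching, which can find a chunk anywhere in the prompt.

```python
import hashlib
from typing import List, Sequence, Set, Tuple

CHUNK_SIZE = 4  # hypothetical value for the demo; real systems use larger chunks


def chunk_key(tokens: Sequence[int]) -> str:
    """Content hash of one chunk, independent of its position in the prompt."""
    return hashlib.sha256(",".join(map(str, tokens)).encode()).hexdigest()


def split_chunks(tokens: Sequence[int], size: int = CHUNK_SIZE) -> List[Tuple[int, ...]]:
    """Split into full fixed-size chunks; a trailing partial chunk is dropped."""
    full = len(tokens) - len(tokens) % size
    return [tuple(tokens[i:i + size]) for i in range(0, full, size)]


def prefix_hits(cached: Sequence[int], prompt: Sequence[int], size: int = CHUNK_SIZE) -> int:
    """Chunks reusable under prefix-only caching: stop at the first mismatch."""
    hits = 0
    for old, new in zip(split_chunks(cached, size), split_chunks(prompt, size)):
        if old != new:
            break
        hits += 1
    return hits


def content_hits(store: Set[str], prompt: Sequence[int], size: int = CHUNK_SIZE) -> int:
    """Chunks reusable when lookup is by content hash, wherever they appear."""
    return sum(chunk_key(chunk) in store for chunk in split_chunks(prompt, size))


# A previously seen prompt, and a new prompt that differs only in its first chunk.
cached_prompt = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
new_prompt = [99, 98, 97, 96, 5, 6, 7, 8, 9, 10, 11, 12]

store = {chunk_key(chunk) for chunk in split_chunks(cached_prompt)}
print("prefix-only hits:", prefix_hits(cached_prompt, new_prompt))
print("content-addressed hits:", content_hits(store, new_prompt))
```

In this toy case the changed first chunk defeats prefix matching entirely, while the two later chunks are still found by hash. A real system cannot simply paste those cached entries in, because each token's keys and values depend on the context that preceded it; that is presumably the gap the recompute strategies named in the post are meant to close.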

pferdone•26m ago

slop

snovv_crash•26m ago

Even the things that should be normal dashes are em-dashes

arjie•23m ago

I don't get it. The output of the CacheBlend paper is in LMCache. Did you compare against vLLM with LMCache? This is confusing.

hexnuts•37m ago

Bad site design. If I can't scroll to see the next slide, that's just broken.

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

https://arxiv.org/abs/2605.19269

30•matt_d•1h ago•1 comment

Project Hail Mary – Stellar Navigation Chart

https://valhovey.github.io/gaia-mary/

794•speleo•14h ago•175 comments

Slumber a TUI HTTP Client

https://slumber.lucaspickering.me

20•jicea•2h ago•1 comment

The surprising story behind the first British person in space

https://www.bbc.com/culture/article/20260518-helen-sharman-the-story-behind-the-first-british-per...

21•xoxxala•1d ago•0 comments

Blog ran on Ubuntu 16.04 for 10 years. I migrated it to FreeBSD

https://crocidb.com/post/this-blog-ran-on-ubuntu-16-04-for-10-years-i-migrated-it-to-freebsd/

242•speckx•11h ago•129 comments

Was my $48K GPU server worth it?

https://rosmine.ai/2026/05/13/was-my-48k-gpu-worth-it/

390•apwheele•3d ago•271 comments

Cleve Moler (Matlab, MathWorks) passed away on May 20, 2026

https://www.mathworks.com/company/aboutus/founders/clevemoler.html

33•mychele•3h ago•2 comments

Uv is fantastic, but its package management UX is a mess

https://www.loopwerk.io/articles/2026/uv-ux-mess/

160•nchagnet•9h ago•89 comments

Using Kagi Search with Low Vision

https://veroniiiica.com/using-kagi-search-with-low-vision/

178•speckx•11h ago•52 comments

The memory shortage is causing a repricing of consumer electronics

https://davidoks.blog/p/ai-is-killing-the-cheap-smartphone

129•d0ks•8h ago•127 comments

The death of the brick and mortar toy store

https://brainbaking.com/post/2026/05/the-death-of-the-brick-and-mortar-toy-store/

60•speckx•2d ago•48 comments

Show HN: Freenet, a peer-to-peer platform for decentralized apps

https://freenet.org/

260•sanity•16h ago•149 comments

Indexing a year of video locally on a 2021 MacBook with Gemma4-31B (50GB swap)

https://blog.simbastack.com/indexed-a-year-of-video-locally/

354•asenna•16h ago•103 comments

Mycorrhizal Fungi, Nature's Key to Plant Survival and Success

https://pacifichorticulture.org/articles/mycorrhizal-fungi-natures-key-to-plant-survival-and-succ...

75•mooreds•1d ago•11 comments

Tristan Davey's Punch Card Archive

https://punchcards.tristandavey.com/

23•ohjeez•2d ago•3 comments

The Hardware Lottery

https://hardwarelottery.github.io/

6•intelkishan•1d ago•0 comments

Python 3.15: features that didn't make the headlines

https://blog.changs.co.uk/python-315-features-that-didnt-make-the-headlines.html

361•rbanffy•19h ago•173 comments

Lost Images from the 1945 Trinity Nuclear Test Restored

https://spectrum.ieee.org/trinity-nuclear-test

329•pseudolus•19h ago•98 comments

Flipper One – we need your help

https://blog.flipper.net/flipper-one-we-need-your-help/

1115•sandebert•19h ago•436 comments

Launch HN: Runtime (YC P26) – Sandboxed coding agents for everyone on a team

https://www.runtm.com/

82•gustrigos•14h ago•22 comments

Spotify will start reserving concert tickets for fans

https://www.hollywoodreporter.com/music/music-industry-news/spotify-will-start-reserving-concert-...

133•elffjs•14h ago•271 comments

Waymo pauses Atlanta service as its robotaxis keep driving into floods

https://techcrunch.com/2026/05/21/waymo-pauses-atlanta-service-as-its-robotaxis-keep-driving-into...

296•mattas•14h ago•368 comments

Multi-Stream LLMs: new paper on parallelizing/separating prompts, thinking, I/O

https://arxiv.org/abs/2605.12460

93•atomicthumbs•10h ago•10 comments

Deciphering the Hashihara Castle Town Map

https://www.obayashi.co.jp/en/kikan_obayashi/detail/kikan_64_project.html

37•1970-01-01•2d ago•0 comments

Seattle Shield, an intelligence-sharing network operated by the Seattle police

https://prismreports.org/2026/05/20/seattle-shield-private-companies-surveillance/

450•root-parent•12h ago•178 comments

Google's Antigravity bait and switch

https://www.0xsid.com/blog/antigravity-bait-n-switch

635•ssiddharth•16h ago•287 comments

Throwing AI-generated walls of text into conversations

https://noslopgrenade.com/

565•napolux•21h ago•341 comments

News outlets are limiting the Internet Archive’s access to their journalism

https://www.niemanlab.org/2026/05/more-than-340-local-news-outlets-are-limiting-the-internet-arch...

258•jaredwiener•13h ago•89 comments

We're testing new ad formats in Search and expanding our Direct Offers pilot

https://blog.google/products/ads-commerce/google-marketing-live-search-ads/

591•sofumel•20h ago•529 comments