Reuse non-prefix KV Cache and speed up RAG by 3X with LMCache

https://github.com/LMCache/LMCache-Examples/blob/main/demo-rag-blending/README.md
5•lihanc111•7h ago

Comments

lihanc111•7h ago
Hey HN Community!

A while back, we shared our open-source project LMCache here and were blown away by the incredible support and feedback. Today, our team is thrilled to share more about one of our core components: CacheBlend. Recognized with a Best Paper Award at ACM EuroSys 2025, this technique is a painkiller for efficient RAG applications.

The Problem: Your KV Cache is Wasting Potential

In modern LLM applications like RAG and Agents, we constantly feed the model new context. For example, in RAG, we retrieve relevant documents and stuff them into the prompt.

The issue is that this dynamically retrieved context doesn't always appear at the beginning of the input sequence. Traditional KV caching only reuses a "common prefix," so when a retrieved document lands anywhere other than the very start of the prompt, the cache hit rate plummets, and your GPU ends up recomputing the same things over and over.

The Solution: CacheBlend - 100% Hit Rate, No Compromises

CacheBlend changes the game by allowing for the reuse of pre-computed KV caches regardless of their position in the input sequence.
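To make the difference concrete, here is a toy sketch (not LMCache's actual code) that counts reusable chunks under prefix-only matching versus position-independent matching. The chunk names and both helper functions are hypothetical, and real systems match on hashed token blocks rather than string labels.

```python
def prefix_reuse(prompt_chunks, cached_chunks):
    """Count chunks reusable when only a shared prefix can hit the cache."""
    reused = 0
    for new, old in zip(prompt_chunks, cached_chunks):
        if new != old:
            break  # the first mismatch invalidates everything after it
        reused += 1
    return reused


def chunk_reuse(prompt_chunks, cache):
    """Count chunks reusable when any cached chunk can hit, whatever its position."""
    return sum(1 for chunk in prompt_chunks if chunk in cache)


# A previous request cached KV entries for chunks in this order.
previous = ["system", "doc_A", "doc_B", "question_1"]
# The next request retrieves the same documents in a different order.
current = ["system", "doc_B", "doc_A", "question_2"]

prefix_hits = prefix_reuse(current, previous)
chunk_hits = chunk_reuse(current, set(previous))
print(prefix_hits, chunk_hits)
```

In this example the prefix-only scheme reuses just the system prompt, while position-independent lookup also reuses both documents. Only the new question still needs a fresh prefill.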

This means we can finally achieve a 100% KV Cache hit rate in applications like RAG. The performance gains are significant:

Faster Time-To-First-Token (TTFT): Get your initial response much quicker.

More Throughput: Serve significantly more users with the same hardware.

Almost Lossless Output Quality: All of this is achieved with little degradation in the model's generation quality.

How does it work?

CacheBlend handles the two main challenges of reusing non-prefix caches:

Positional Encoding Update: It efficiently updates positional encodings so the model always knows the correct position of each token, even when we're stitching together cached and new data.

Selective Attention Recalculation: Instead of recomputing everything, it recalculates only the minimal cross-attention needed between the new and cached chunks, which keeps generation quality close to that of a full recompute.
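As a rough illustration of the positional update, the sketch below assumes the model uses rotary position embeddings (RoPE). The post does not say so, and this is not taken from the CacheBlend code. Under that assumption, a cached key can be moved to a new position by applying one extra rotation for the offset, without recomputing the key. The function name `rope_rotate` and the vector values are made up for the example.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles (RoPE style).

    The pair starting at index i is rotated by position * base ** (-i / d).
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


key = [0.3, -1.2, 0.7, 0.5]        # un-rotated key vector (made-up values)
cached = rope_rotate(key, 5)        # cached when the token sat at position 5
shifted = rope_rotate(cached, 37)   # the chunk now sits 37 positions later
fresh = rope_rotate(key, 42)        # what a full recompute at position 42 gives

# Rotations compose, so shifting the cached key matches the fresh one
# up to floating-point error.
max_error = max(abs(a - b) for a, b in zip(shifted, fresh))
print(max_error)
```

This only corrects the positional part of a cached key. Cached entries in deeper layers still lack attention to whatever text now precedes the chunk, which is the gap the selective recalculation step is meant to close.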

For detailed analysis, please refer to the official paper: https://dl.acm.org/doi/10.1145/3689031.3696098

Where can I try it?

Our official repo is at: https://github.com/LMCache/LMCache

The newest interactive CacheBlend demo is at: https://github.com/LMCache/LMCache-Examples/tree/main/demo-r...

Ask us anything!
