
Efficient AI: KV Caching and KV Sharing

https://blog.gaurav.ai/2025/08/05/kv-caching-kv-sharing/
1•gauravm•6mo ago

Comments

gauravm•6mo ago
New blog post on Efficient AI Techniques: KV Caching and KV Sharing.

Efficient training and inference are table stakes for LLMs these days, and these two algorithmic efficiency techniques work really well for reducing LLM latency and memory usage while retaining model performance. Feel free to give it a read, and drop a note if I missed something.
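The core idea of KV caching is that in autoregressive decoding, the keys and values of earlier tokens never change, so they can be computed once and reused at every later step instead of recomputing attention over the full prefix. A minimal numpy sketch of this for a single attention head follows; it is not from the linked post, and the weight names, head size, and cache layout are all illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8  # hypothetical head dimension
# Hypothetical fixed projection weights for one attention head.
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend_full(x_seq):
    """Uncached causal attention: recomputes K and V for every
    token of the prefix on each call."""
    Q, K, V = x_seq @ Wq, x_seq @ Wk, x_seq @ Wv
    scores = Q @ K.T / np.sqrt(d)
    # Causal mask: token i may only attend to tokens 0..i.
    scores[np.triu(np.ones_like(scores, dtype=bool), k=1)] = -np.inf
    return softmax(scores) @ V

def attend_step(x_new, cache):
    """Cached decoding step: compute K/V only for the new token,
    reuse the cached K/V of all earlier tokens."""
    q = x_new @ Wq
    cache["K"].append(x_new @ Wk)
    cache["V"].append(x_new @ Wv)
    K, V = np.stack(cache["K"]), np.stack(cache["V"])
    return softmax(q @ K.T / np.sqrt(d)) @ V

# Decoding one token at a time with the cache reproduces the
# full causal attention output.
x = rng.standard_normal((5, d))
cache = {"K": [], "V": []}
stepped = np.stack([attend_step(tok, cache) for tok in x])
print(np.allclose(attend_full(x), stepped))
```

KV sharing builds on the same cache: variants like multi-query or grouped-query attention store one K/V set for several query heads, shrinking the cache (and its memory traffic) by the sharing factor.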