news newest ask show jobs

Made with ♥ by @iamnishanth

Open Source @Github

Open in hackernews

Lossless LLM 3x Throughput Increase by LMCache

https://github.com/LMCache/LMCache

2•lihanc111•4h ago

Comments

lihanc111•4h ago

Our team has built this open source project, LMCache, to reduce repetitive computation in LLM inference and make systems serve more people (3x more throughput in chat applications) and it has been used in IBM's open source LLM inference stack.

In LLM serving, the input is computed into intermediate states called KV cache to further provide answers. These data are relatively large (~1-2GB for long context) and are often evicted when GPU memory is not enough. In these cases, when users ask a follow up question, the software needs to recompute for the same KV Cache. LMCache is designed to combat that by efficiently offloading and loading these KV cache to and from DRAM and disk.

Ask us anything!

0xjunhao•4h ago

Hi, I had a quick question. Would it be correct to say the following?

1. For long inputs and short outputs, the inference can be arbitrarily number of times faster, as it avoids repeated KV computation.

2. Conversely, for short inputs and long outputs, it might be slightly slower, since loading and storing the KV cache are on the critical path of the execution.

The Discovery Engine (automated system for scientific discovery)

https://zenodo.org/records/15693353

1•talos•1m ago•0 comments

Show HN: Vybetr – Hire AI app developers using tools like Lovable, Bolt and more

https://vybetr.com

3•zicxor•2m ago•0 comments

Using Lxcfs Together with Podman

https://www.die-welt.net/2025/06/using-lxcfs-together-with-podman/

1•todsacerdoti•3m ago•0 comments

Lessons from LangChain and Slack and MCP Integration

https://medium.com/@valliappanr/what-i-learned-integrating-langchain-with-slack-via-mcp-and-why-ai-code-isnt-enough-3e72248b96b1

1•valliappanr•5m ago•1 comments

Use of ch unit considered inappropriate (in certain circumstances)

https://clagnut.com/blog/2432

1•mikehall314•7m ago•0 comments

Brit Watchdog Cracks Down on Data Collection by Smart TVs, Speakers, Air Fryers

https://www.theguardian.com/technology/2025/jun/16/air-fryers-smart-tv-speakers-user-data-privacy-ico

2•m463•7m ago•0 comments

Thoughts on the AI 2027 Discourse

https://dynomight.substack.com/p/ai2027

2•paulpauper•8m ago•0 comments

Childhood and Education #10: Behaviors

https://thezvi.substack.com/p/childhood-and-education-10-behaviors

1•paulpauper•9m ago•0 comments

When Can I Stop Listening to My Enemy's Points?

https://substack.com/home/post/p-166684398

1•paulpauper•12m ago•0 comments

Show HN: Letter Lockbox – A word game I built over the weekend with Claude Code

https://www.letterlockbox.com

1•christensen143•12m ago•0 comments

Programmers and Their Monospace Blogs

https://lambdaland.org/posts/2025-06-24_reading_blogs/

1•ashton314•12m ago•0 comments

Ask HN: What's your fastest conversion from cold outreach to prepaid client?

1•iamarsibragimov•12m ago•0 comments

Namespaced Pundit Policies Without the Repetition Racket

https://alec-c4.com/posts/2025-06-24-pundit-namespaced-policies/

2•alec-c4•15m ago•1 comments

The Legacy of "The Gastronomical Me"

https://lithub.com/fidelity-to-both-pleasure-and-humiliation-on-m-f-k-fishers-feminist-realism/

2•spewil•16m ago•0 comments

Show HN: How Usage Works

https://www.usage.ai/blog/how-usage-works

4•kavehkhorram•17m ago•0 comments

Why Your Car's Touchscreen Is More Dangerous Than Your Phone

https://www.carsandhorsepower.com/featured/your-fancy-car-s-touchscreen-is-worse-than-buttons-and-studies-prove-it

2•m463•17m ago•0 comments

Dr. Dobb's

https://drdobbs.com/

2•johnnyApplePRNG•18m ago•0 comments

Joining CNCF as Executive Director: Let's Build What's Next

https://www.cncf.io/blog/2025/06/24/joining-cncf-as-executive-director-lets-build-whats-next/

3•bretpiatt•19m ago•0 comments

Elisa: A Comprehensive Guide to Enzyme-Linked Immunosorbent Assay

https://www.clyte.tech/post/mastering-elisa-a-comprehensive-guide-to-enzyme-linked-immunosorbent-assay

2•mw2taba88•24m ago•1 comments

Secure your Express application APIs in 5 minutes with Cedar

https://aws.amazon.com/blogs/opensource/secure-your-application-apis-in-5-minutes-with-cedar/

1•idm_guru•26m ago•0 comments

Why Paris's Centre Pompidou, not even 50 years old, must close for five years

https://www.lemonde.fr/en/opinion/article/2025/06/19/why-the-centre-pompidou-not-even-50-years-old-must-close-for-five-years_6742490_23.html

1•PaulHoule•28m ago•1 comments

Curated realities: An AI film festival and the future of human expression

https://arstechnica.com/culture/2025/06/curated-realities-an-ai-film-festival-and-the-future-of-human-expression/

2•rntn•29m ago•0 comments

Scientists can now target the cells at the center of ALS

https://alleninstitute.org/news/scientists-can-now-target-the-cells-at-the-center-of-als/

1•gmays•30m ago•0 comments

Haflang: Hardware Acceleration of Functional Languages

https://haflang.github.io/

1•fanf2•34m ago•0 comments

Waldo – Geoip Lookups

https://geoip.dpdns.org/

1•metalshanked•36m ago•0 comments

David Friedberg: it is important for America that Mamdani get elected

https://twitter.com/friedberg/status/1937593902456099315

1•donsupreme•40m ago•3 comments

Portable Network Graphics (PNG) Specification (Third Edition)

https://www.w3.org/TR/png-3/

1•trothamel•42m ago•0 comments

EU lawmakers vote to bar carry-on luggage fees on planes

https://www.france24.com/en/live-news/20250624-eu-lawmakers-vote-to-bar-carry-on-luggage-fees-on-planes

3•gnabgib•44m ago•1 comments

I Designed UX for an AI Product Last Year. Are Those Lessons Still Valid?

https://uxdesign.cc/ai-ux-design-for-intelligent-interfaces-bc966e96107d

1•antarabasu•45m ago•1 comments

The Sun is twisting Mercury's crust in unexpected ways

https://bgr.com/science/the-sun-is-twisting-mercurys-crust-in-unexpected-ways/

2•Bluestein•46m ago•0 comments