CRISPR tech selectively shreds cancer cells, including "undruggable" cancers

https://innovativegenomics.org/news/crispr-technique-selectively-shreds-cancer-cells/
619•gmays•7h ago•164 comments

Swift at Apple: Migrating the TrueType hinting interpreter

https://www.swift.org/blog/migrating-truetype-hinting-to-swift/
87•DASD•3h ago•35 comments

Renault: Electric motors with no rare earths

https://www.renaultgroup.com/en/magazine/energy-and-powertrains/all-about-electric-motors-with-no...
11•bestouff•52m ago•0 comments

How to setup a local coding agent on macOS

https://ikyle.me/blog/2026/how-to-setup-a-local-coding-agent-on-macos
199•kkm•5h ago•64 comments

Malware developers added nuclear and biological weapons text to their spyware

https://twitter.com/jsrailton/status/2064661778978533571
250•marc__1•1d ago•171 comments

Pirates, a naval warfare game inspired by Sid Meier's Pirates

https://piwodlaiwo.github.io/pirates/
162•iweczek•5h ago•69 comments

Congress Just Rushed Through a Disastrous Copyright Office Overhaul

https://www.eff.org/deeplinks/2026/06/congress-just-rushed-through-disastrous-copyright-office-ov...
63•Cider9986•1d ago•4 comments

Palantir loses legal challenge against Swiss investigative magazine

https://www.ft.com/content/7ffcace7-9dc0-4e7e-9912-895ac073f979
112•sschueller•2h ago•20 comments

Twenty One Zero-Days in FFmpeg

https://depthfirst.com/research/21-zero-days-in-ffmpeg
10•redbell•47m ago•3 comments

Launch HN: BitBoard (YC P25) – Analytics Workspace for Agents

https://bitboard.work/
30•arcb•6h ago•17 comments

Can I Buy Your KV Cache?

https://arxiv.org/abs/2606.13361
25•MediaSquirrel•2h ago•20 comments

Slightly reducing the sloppiness of AI generated front end

https://envs.net/~volpe/blog/posts/reduce-slop.html
154•FergusArgyll•8h ago•107 comments

"Don't You Just Upload It to ChatGPT?"

https://correresmidestino.com/dont-you-just-upload-it-to-chatgpt/
244•speckx•5h ago•210 comments

Mmorpg World of ClaudeCraft, vibe coded with Fable 5

https://worldofclaudecraft.com/
68•beatthatflight•2h ago•61 comments

Introduction to UEFI HTTP(s) Boot with QEMU/OVMF

https://blog.yadutaf.fr/2026/06/12/introduction-to-uefi-https-boot-qemu-ovmf/
66•jtlebigot•8h ago•24 comments

EV demand up 50% in France and Germany since Iran war

https://www.reuters.com/business/renault-electric-vehicle-orders-have-surged-since-start-iran-war...
94•a_paddy•4h ago•41 comments

A PDF that changes based on how it's read

https://sgaud.com/texts/pdf
111•SarthakGaud•6h ago•57 comments

Where Did Earth Get Its Oceans? Maybe It Made Them Itself

https://www.quantamagazine.org/where-did-earth-get-its-oceans-maybe-it-made-them-itself-20260612/
93•ibobev•7h ago•56 comments

Cosmodial Sky Atlas

https://killedbyapixel.github.io/Cosmodial/
23•memalign•4h ago•4 comments

Show HN: Turn your name into a tree in an infinite procedural shanshui landscape

https://landscape.bairui.dev/
5•subairui•2d ago•2 comments

If you are asking for human attention, demonstrate human effort

https://tombedor.dev/human-attention-and-human-effort/
1474•jjfoooo4•23h ago•455 comments

Maxproof

https://arxiv.org/abs/2606.13473
123•ilreb•11h ago•10 comments

There Is Life Before Main in Rust

https://grack.com/blog/2026/06/11/life-before-main/
64•mmastrac•1d ago•17 comments

Hazel (YC W24) Is Hiring a Full Stack Engineer

https://www.ycombinator.com/companies/hazel-2/jobs/3epPWgu-full-stack-engineer-ts-sci
1•augustschen•9h ago

I Am Not a Reverse Centaur

https://blog.miguelgrinberg.com/post/i-am-not-a-reverse-centaur
234•ibobev•5h ago•168 comments

Most Beautiful Will Ever Made (1936)

https://paperspast.natlib.govt.nz/newspapers/DOM19360307.2.43
26•cf100clunk•4h ago•12 comments

/architect: Reduce Fable tokens by 80%, Fable orchestrates/reviews, Codex builds

https://github.com/DanMcInerney/architect-loop
5•DanMcInerney•2h ago•0 comments

A dumpster arrived behind my university's library

https://yalereview.org/article/sheila-liming-the-end-of-books
146•mooreds•8h ago•135 comments

Show HN: StackScope – I crawled over 40k indie launches to see what they ship

https://stackscope.dev/
39•datafreak_•7h ago•12 comments

WASI 0.3

https://bytecodealliance.org/articles/WASI-0.3
223•mavdol04•9h ago•86 comments

Can I Buy Your KV Cache?

https://arxiv.org/abs/2606.13361
25•MediaSquirrel•2h ago

Comments

root-parent•2h ago
Lambda computing for prompts?
sghiassy•1h ago
A truly global singleton
lumost•1h ago
The KV cache is order-dependent: each entry depends on the tokens that precede it in the context.

There are some transformation approaches to reuse the KV cache across inferences, but none are in wide use due to accuracy concerns following the transformation.

dgellow•1h ago
Just curious, do you have links to read more about transformations or other techniques for KV cache reuse?
evrydayhustling•1h ago
All major model providers offer prefix caching, which is this.
lumost•54m ago
No, reusing segments of the KV cache for different purposes in an order-independent manner is an active research area.
dgellow•44m ago
Any keyword or paper I can search for?
dvmazur•16m ago
AsyncReasoning[1] does a trick of that sort to give agents concurrent cache views.

You basically have two agents look at the same cache under different views. Say agent_0 gets [a_1, a_0] and agent_1 gets [a_0, a_1]. They also write to this cache concurrently while decoding. To solve positional embedding inconsistencies they rotate the query projections for each block (a_0 and a_1) separately.

The computations you get that way do not exactly match the setup where you would naively prefill on every step, but are close enough.

Same trick could be used for the setup discussed here, I guess: prefill the document cache separately (p), prepend the system prompt (s) and get a cache view [s, p] from which you can then decode.

1. https://arxiv.org/abs/2512.10931
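
A minimal sketch of why rotating the query can stand in for re-encoding a cached block, assuming the positional embeddings are rotary (RoPE-style) and using a single 2-D pair. The names `rotate`, `score`, and `THETA` are hypothetical and are not taken from the linked paper. With rotary encodings the attention logit depends only on the relative position of query and key, so shifting a cached key forward by some offset gives the same logit as shifting the query back by that offset.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]
THETA = 0.1  # hypothetical rotation frequency for this single 2-D pair


def rotate(vec: Vec2, position: int) -> Vec2:
    """Apply a rotary position encoding: rotate by position * THETA radians."""
    angle = position * THETA
    x, y = vec
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))


def score(q: Vec2, q_pos: int, k: Vec2, k_pos: int) -> float:
    """Attention logit between a query and a key after rotary encoding."""
    rq, rk = rotate(q, q_pos), rotate(k, k_pos)
    return rq[0] * rk[0] + rq[1] * rk[1]


q, k = (0.3, -1.2), (0.8, 0.5)

# The key was cached at position 3, but in this agent's view its block
# starts 7 positions later, so the key logically sits at position 10.
cached_pos, shift, query_pos = 3, 7, 12

wanted = score(q, query_pos, k, cached_pos + shift)  # re-encode the key
reused = score(q, query_pos - shift, k, cached_pos)  # counter-rotate the query

print(math.isclose(wanted, reused, abs_tol=1e-12))
```

This only covers the positional part. It does not address the fact that cached values in deeper layers were computed under a different context, which is why the results are close to, but not the same as, a fresh prefill.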

Eridrus•1h ago
The paper has a section on "Reusing precomputed KV across queries" which talks about how other papers have tried to address this problem, but yeah, this paper adds nothing on its own besides a catchy title.
TZubiri•13m ago
Absolute slop paper. Replace document with text and you'll get it.

"People are asking the same questions and an answer is generated every time, what if we could like cache the questions and their answers..."

Sounds like someone was using chatgpt to understand how chatgpt works and then asked it to generate a paper based on his proposal to improve it.

tonetegeatinst•1h ago
Does anyone have a good recommendation for an explainer or primer on the KV cache?
plutomeetsyou•59m ago
convert this question to KV cache and give it to your agent
wren6991•10m ago
If you want the actual maths instead of handwaving, I recommend: https://arxiv.org/abs/2207.09238

For something gentler, 3Blue1Brown: https://www.youtube.com/watch?v=eMlx5fFNoYc (this is part of a series)

mistercow•1h ago
> Then the part that matters: where the KV lives

When your abstract was clearly generated by an LLM and not curated to at least make it sound human, it does not make me want to read your paper.

numeri•13m ago
especially because this is the most painfully glaring flaw in their plan. Their solution is for an inference provider to... store the KV cache (which they can compute!) on-premise, on their own disks, but pay some third party for it?
TuringNYC•34m ago
Seems Cloudflare is now doing this for scraping, so makes sense to continue down the pipeline!
refulgentis•33m ago
This paper doesn't make any sense. For background, I've maintained an AI client that's cross-platform, cross-provider, and integrates llama.cpp since 2022. I don't know why they think "agents" don't share prefill work: paid providers cache on the prefill text, llama.cpp does the same, and I specifically hooked up llama.cpp so it can do this for subsets as well, i.e. all the agents would reuse the cache.

It reads like it started from an underspecification of "agents" x a strain of pop-wisdom about "KV cache" that I've seen enter mainstream discourse over the past 3 months that is Not Even Wrong, then, they solved a non-existent problem.

EDIT: Flagging, since the rest of the comments either request a primer on terms or point out that it makes errors in even more obvious ways.

christianqchung•25m ago
I don't think Luoyuan Zhang is necessarily doing this, but I'm pretty sure lots of people are using arxiv as a glorified blog and hoping no one notices.
wren6991•19m ago
Prefix caching is already widely deployed by all providers, right? llama.cpp does it. vLLM does it. I'm sure everyone hosting LLMs for a living does it. This paper seems to focus entirely on prefixes (i.e. the prefilled content is rooted at 0). This is... nothing.
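
For readers who have not seen it, here is a toy sketch of prefix caching rooted at position 0. It is an illustration only, not how llama.cpp or vLLM implement it: `PrefixCache` and its methods are hypothetical names, the "KV" entries are placeholder strings, and real servers typically match at block granularity rather than per token.

```python
from typing import Dict, List, Tuple


class PrefixCache:
    """Toy prefix cache: maps a token prefix to its simulated KV entries."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, ...], List[str]] = {}
        self.tokens_computed = 0  # counts simulated prefill work

    def _extend(self, prefix_kv: List[str], new_tokens: List[int],
                start: int) -> List[str]:
        # Stand-in for a forward pass over only the uncached tokens.
        kv = list(prefix_kv)
        for offset, tok in enumerate(new_tokens):
            kv.append(f"kv(tok={tok},pos={start + offset})")
            self.tokens_computed += 1
        return kv

    def prefill(self, tokens: List[int]) -> List[str]:
        # Find the longest cached prefix that starts at position 0.
        best = 0
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self._store:
                best = n
                break
        cached = self._store.get(tuple(tokens[:best]), [])
        kv = self._extend(cached, tokens[best:], best)
        self._store[tuple(tokens)] = kv
        return kv


cache = PrefixCache()
cache.prefill([1, 2, 3, 4])        # cold start: 4 tokens computed
cache.prefill([1, 2, 3, 4, 5, 6])  # shares a 4-token prefix: 2 more computed
cache.prefill([9, 1, 2, 3, 4])     # same tokens shifted by one: no hit, 5 computed
print(cache.tokens_computed)
```

The third call is the interesting one: the document tokens are identical, but because they no longer start at position 0 behind the same prefix, nothing is reused.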

The referenced CacheBlend paper (https://arxiv.org/pdf/2405.16444) which tries to stitch together multiple independent prefills looks more interesting and is new to me. The problem it's trying to solve is:

* KV projections for a given token at a given layer are a function of the residual at that layer,

* which is a function of the attention contribution of the previous layer,

* which is a (nonlinear) function of all earlier tokens' KV values at the previous layer.

This is what stops you from just pasting KV blocks together. Intuitively it might feel like you could do the equivalent of an MPEG deblocking filter to fix up the edges, but there's no guarantee the tokens that need fixing up are at the beginning of the KV block, so they have to be sneaky about it.
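
The dependency chain above can be seen in a toy model. This is a sketch under heavy simplifying assumptions: scalar embeddings, identity Q/K/V projections, no positional encoding, and one causal attention layer feeding a residual. The function names are hypothetical. It compares the layer-2 inputs (the values that layer-2 KV would be projected from) for a chunk `b` prefilled alone versus prefilled after a chunk `a`.

```python
import math
from typing import List


def causal_attention(xs: List[float]) -> List[float]:
    """Single-head causal softmax attention with identity Q/K/V projections."""
    out = []
    for i, q in enumerate(xs):
        keys = xs[: i + 1]
        scores = [q * k for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, keys)) / total)
    return out


def layer2_inputs(embeddings: List[float]) -> List[float]:
    """Residual stream after layer 1; layer-2 K and V are projected from this."""
    attn = causal_attention(embeddings)
    return [x + a for x, a in zip(embeddings, attn)]


a = [0.5, -1.0]  # earlier context
b = [2.0, 0.3]   # chunk whose KV we would like to reuse

alone = layer2_inputs(b)                # b prefilled on its own
joint = layer2_inputs(a + b)[len(a):]   # b prefilled after a

for standalone, in_context in zip(alone, joint):
    print(f"{standalone:.4f} vs {in_context:.4f}")
```

In this toy the layer-1 KV for `b` is the same either way (it is just the embeddings), but the layer-2 inputs already differ, so any KV computed from them differs too.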

Unfortunately while that paper is quite verbose it's lacking in detail in the most important part: how they perform the approximate KV recomputation. It looks like the rough idea is that they fully recompute the KV for the first layer, and use the deviation between the recompute and the original cached KV as a heuristic for how important it is to recompute the full KV values (i.e. all remaining layers) for that token. They use that to derive a mask for the tokens which most strongly attend to the earlier context, then do a sparse update of those tokens.
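
A rough sketch of the selection step as read above, not the paper's actual algorithm: compare cached and freshly recomputed first-layer KV per token, then pick the tokens with the largest deviation for full recomputation. The function names, the squared-distance metric, and the `budget` parameter are all assumptions made for illustration.

```python
from typing import List, Sequence


def squared_deviation(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two KV vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_tokens_to_recompute(
    cached_l1: List[Sequence[float]],
    fresh_l1: List[Sequence[float]],
    budget: int,
) -> List[int]:
    """Return indices of the `budget` tokens whose layer-1 KV moved the most."""
    deviations = [squared_deviation(c, f) for c, f in zip(cached_l1, fresh_l1)]
    ranked = sorted(range(len(deviations)),
                    key=lambda i: deviations[i], reverse=True)
    return sorted(ranked[:budget])


# Hypothetical layer-1 KV vectors for four tokens of a cached chunk.
cached = [[1.0, 0.0], [0.5, 0.5], [0.2, 0.9], [0.0, 1.0]]
fresh = [[1.0, 0.0], [0.5, 0.4], [0.9, 0.1], [0.3, 1.0]]

mask = select_tokens_to_recompute(cached, fresh, budget=2)
print(mask)
```

Only the tokens in `mask` would get their remaining layers recomputed; everything else keeps its cached KV, which is where the approximation comes from.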

What's still unclear is how this actually ends up being a performance win, given that the sparse update for each token still requires the exact KV for all the prior tokens in order to actually arrive at the correct value. It just kind of recurses the problem instead of fixing it. Maybe they just use the precomputed KV for the other tokens as input, and live with the approximation?

I think this is already somewhat pragmatically solved: just don't pull huge documents into context. Give the LLM tools to search them and retrieve the fragments that are actually relevant.

dannyw•4m ago
Yeah, I'm really not sure what the point of this paper is. Every non-toy environment does prefix caching.