frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Prompt-caching – auto-injects Anthropic cache breakpoints (90% token savings)

https://prompt-caching.ai/
40•ermis•2h ago

Comments

spiderfarmer•1h ago
Will this work for Cowork as well?
stingraycharles•1h ago
This is not at all an MCP server you want to use with a regular tool, as this is about low level context window management. Tbh it’s really trivial to do this, and I have no idea why OP decided to make an MCP server for this as it’s completely useless for that.

As a matter of fact, i think this is not a problem at all as Anthropic makes it extremely easy to cache stuff; you just set your preferred cache level on the last message, and Anthropic will automatically cache it under the hood. Every distinct message is another “cache” point, eg they first compute the hash of all messages, if not found, compute the hash of all messages - 1, etc.

It’s really a non problem.

ermis•1h ago
No. Claude.ai is a consumer product — you have no access to the API layer underneath it. cache_control is an API-level feature only. This plugin works exclusively when you're making direct Anthropic API calls, either through the SDK in your own code or through MCP-compatible clients like Claude Code, Cursor, Windsurf, etc.
stingraycharles•15m ago
How would it work when you’re making Anthropic API calls? Wouldn’t an LLM have to invoke this, and as such, somehow the LLM needs to invoke this MCP tool (which is done using a tool call ie an answer from the LLM) before sending the request to Anthropic?

I am so confused why you chose an MCP server to solve this, wouldn’t a regular API at least have some merit in how it could be used (in that it doesnt require an LLM to invoke it) ?

somesnm•1h ago
Hasn't this been largely solved by auto-caching introduced recently by Anthropic, where you pass "cache_control": {"type": "ephemeral"} in your request and it puts breakpoints automatically? https://platform.claude.com/docs/en/build-with-claude/prompt...
stingraycharles•1h ago
Yes, it has, this is a non-problem, and even if it was a problem, an MCP server would most definitely be one of the worst ways to fix it.
philipp-gayret•1h ago
Looking at my own usage with claude code out of the box and nothing special around caching set up. For this month according to ccusage I have in tokens 0.2M input, 0.6M output, 10M cache create, 311M cache read for 322M total tokens. Seems to me that it caches out of the box quite heavily, but if I can trim my usage somehow with these kind of tools I'd love to know.
stingraycharles•1h ago
This is not about caching things for stuff that others built, it’s solely to modify code that you’re writing that will use Anthropic’s API endpoints.
gostsamo•1h ago
It is answered in the FAQ.
mijoharas•1h ago
I don't understand, Claude code already has automatic prompt caching built in.[0] How does this change things?

[0] https://code.claude.com/docs/en/costs

katspaugh•1h ago
> This plugin is built for developers building their own applications with the Anthropic API.

> Important note for Claude Code users: Claude Code already handles prompt caching automatically for its own API calls — system prompts, tool definitions, and conversation history are cached out of the box.

Source: their GitHub

jasonlotito•7m ago
Does anyone actually read anymore?

From the FAQ:

You're right, and it's a fair question. Claude Code does handle prompt caching automatically for its own API calls — system prompts, tool definitions, and conversation history are cached out of the box. You don't need this plugin for that.

This plugin is for a different layer: when you build your own apps or agents with the Anthropic SDK. Raw SDK calls don't get automatic caching unless you place cache_control breakpoints yourself. This plugin does that automatically, plus gives you visibility into what's being cached, hit rates, and real savings — which Claude Code doesn't expose.

> Claude Code already handles prompt caching automatically for its own API calls

Claude Code is an app. The API layer is different.

When did people start thinking that the Claude Code app and the API are the same thing?

Are these just all confused vibe coders?

fschuett•1h ago
Slightly off-topic, but I recently tested some tool and it turns out Opus is far cheaper than Sonnet, because it produces way less output tokens and those are what's expensive. It's also much slower than Opus (I did 9 runs to compare Haiku, Sonnet and Opus on the same problem). I also thought "oh, Sonnet is more light-weight and cheaper than Opus", no, that's actually just marketing.
CGamesPlay•46m ago
Claude subscriptions (strangely) have a Sonnet limit which is lower than the general model limit. Using Sonnet counts against both limits, using Opus only the general limit. So the subscriptions are discouraging Sonnet use as well.
adi_pradhan•55m ago
This is applicable only to the API from what i understand. Since claude code already caches quite aggressively (try npx ccusage)

Also the anthropic API did already introduce prompt-caching https://platform.claude.com/docs/en/build-with-claude/prompt...

What is new here?

numlocked•54m ago
As per its own FAQ this plugin is out of date and doesn’t actually do anything incremental re:caching:

> "Hasn't Anthropic's new auto-caching feature solved this?"

> Largely, yes — Anthropic's automatic caching (passing "cache_control": {"type": "ephemeral"} at the top level) handles breakpoint placement automatically now. This plugin predates that feature and originally filled that gap.

orphea•51m ago
I don't understand and I'm curious, why a dead on arrival open source tool needs a separate domain?

  Domain Name: prompt-caching.ai
  Updated Date: 2026-03-12T20:31:44Z
  Creation Date: 2026-03-12T20:27:35Z
  Registry Expiry Date: 2028-03-12T20:27:35Z
derrida•39m ago
Is it perhaps because this is for claude code but there's other tools that use anthropics api like custom agents? (some i prefer to use than claude code - e.g sketch.dev what is now called shelley at exe.dev) perhaps?
stingraycharles•22m ago
No, because this doesn’t actually “fix” any existing code. It’s only useful for helping an LLM to modify your code to adjust the caching parameters in the right place, but it doesn’t have the correct API for that.
Slav_fixflex•38m ago
Interesting – I've been using Claude heavily for building projects without writing code myself. Token costs add up fast, anything that reduces that is welcome. Has anyone tested this in production workflows?

TUI Studio – visual terminal UI design tool

https://tui.studio/
114•mipselaer•3h ago•53 comments

301M Records Exposed: The HIPAA Breach Epidemic

https://ciphercue.com/blog/hipaa-breach-epidemic-301-million-records
48•adulion•51m ago•22 comments

Bucketsquatting is (finally) dead

https://onecloudplease.com/blog/bucketsquatting-is-finally-dead
171•boyter•5h ago•83 comments

I traced $2B in grants and 45 states' lobbying behind age‑verification bills

https://old.reddit.com/r/linux/comments/1rshc1f/i_traced_2_billion_in_nonprofit_grants_and_45/
439•shaicoleman•3h ago•163 comments

Willingness to look stupid

https://sharif.io/looking-stupid
480•Samin100•4d ago•164 comments

Show HN: Algorithms and Data Structures in TypeScript – Free Book (~400 Pages)

http://amoilanen.github.io/Algorithms-with-Typescript/
21•jsontwikkeling•1h ago•3 comments

Launch HN: Spine Swarm (YC S23) – AI agents that collaborate on a visual canvas

https://www.getspine.ai/
3•a24venka•29m ago•0 comments

Okmain: How to pick an OK main colour of an image

https://dgroshev.com/blog/okmain/
85•dgroshev•3d ago•14 comments

Executing programs inside transformers with exponentially faster inference

https://www.percepta.ai/blog/can-llms-be-computers
174•u1hcw9nx•1d ago•47 comments

Qatar helium shutdown puts chip supply chain on a two-week clock

https://www.tomshardware.com/tech-industry/qatar-helium-shutdown-puts-chip-supply-chain-on-a-two-...
45•johnbarron•1h ago•33 comments

Malus – Clean Room as a Service

https://malus.sh
1340•microflash•1d ago•492 comments

E2E encrypted messaging on Instagram will no longer be supported after 8 May

https://help.instagram.com/491565145294150
26•mindracer•48m ago•1 comments

Show HN: What was the world listening to? Music charts, 20 countries (1940–2025)

https://88mph.fm/
39•matteocantiello•2d ago•10 comments

Ceno, browse the web without internet access

https://ceno.app/en/index.html?
73•mohsen1•7h ago•21 comments

What we learned from a 22-Day storage bug (and how we fixed it)

https://www.mux.com/blog/22-day-storage-bug
11•mmcclure•3d ago•0 comments

Nanny state discovers Linux, demands it check kids' IDs before booting

https://www.theregister.com/2026/03/13/opinion_os_verification/
20•jjgreen•41m ago•4 comments

“This is not the computer for you”

https://samhenri.gold/blog/20260312-this-is-not-the-computer-for-you/
644•MBCook•12h ago•257 comments

Source code of Swedish e-government services has been leaked

https://darkwebinformer.com/full-source-code-of-swedens-e-government-platform-leaked-from-comprom...
119•tavro•4h ago•103 comments

Gvisor on Raspbian

https://nubificus.co.uk/blog/gvisor-rpi5/
19•_ananos_•3h ago•3 comments

Dijkstra's Crisis: The End of Algol and Beginning of Software Engineering (2010) [pdf]

https://www.tomandmaria.com/Tom/Writing/DijkstrasCrisis_LeidenDRAFT.pdf
8•ipnon•4d ago•1 comments

Show HN: fftool – A Terminal UI for FFmpeg – Shows Command Before It Runs

https://bensantora.com/posts/fftool-ffmpeg-tui-go/
32•taskset•3h ago•21 comments

ATMs didn’t kill bank teller jobs, but the iPhone did

https://davidoks.blog/p/why-the-atm-didnt-kill-bank-teller
457•colinprince•23h ago•474 comments

Prompt-caching – auto-injects Anthropic cache breakpoints (90% token savings)

https://prompt-caching.ai/
40•ermis•2h ago•21 comments

Vite 8.0 Is Out

https://vite.dev/blog/announcing-vite8
395•kothariji•9h ago•121 comments

Bubble Sorted Amen Break

https://parametricavocado.itch.io/amen-sorting
361•eieio•20h ago•107 comments

An old photo of a large BBS (2022)

https://rachelbythebay.com/w/2022/01/26/swcbbs/
231•xbryanx•18h ago•140 comments

IMG_0416 (2024)

https://ben-mini.com/2024/img-0416
134•TigerUniversity•4d ago•26 comments

Enhancing gut-brain communication reversed cognitive decline in aging mice

https://med.stanford.edu/news/all-news/2026/03/gut-brain-cognitive-decline.html
338•mustaphah•21h ago•153 comments

Shall I implement it? No

https://gist.github.com/bretonium/291f4388e2de89a43b25c135b44e41f0
1384•breton•16h ago•503 comments

Prefix sums at gigabytes per second with ARM NEON

https://lemire.me/blog/2026/03/08/prefix-sums-at-tens-of-gigabytes-per-second-with-arm-neon/
54•mfiguiere•4d ago•8 comments