Hi HN! I built Autocache, an intelligent proxy for the Anthropic Claude API that automatically reduces costs by up to 90% and latency by up to 85%.
**The Impact:**
If you're spending $100/day on Claude API calls with system prompts and tools, Autocache can reduce that to ~$10/day with zero code changes. For a 1000-token system prompt reused across requests, you pay 1.25× once to write the cache, then 0.1× on every subsequent request.
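Rough back-of-the-envelope, assuming Anthropic's published multipliers (1.25× for cache writes, 0.1× for cache reads) and an illustrative $3/MTok input price; this is just the arithmetic behind the claim, not Autocache's internal accounting:

```go
package main

import "fmt"

func main() {
	const (
		promptTokens = 1000.0
		basePerTok   = 3.0 / 1_000_000 // illustrative: ~$3 per million input tokens
		writeMult    = 1.25            // cache write surcharge
		readMult     = 0.10            // cache read discount
		requests     = 100.0
	)

	uncached := promptTokens * basePerTok * requests
	cached := promptTokens*basePerTok*writeMult + // one cache write on the first request
		promptTokens*basePerTok*readMult*(requests-1) // cache reads on the rest

	fmt.Printf("uncached: $%.4f  cached: $%.4f  savings: %.0f%%\n",
		uncached, cached, 100*(1-cached/uncached))
	// The 0.25x write surcharge is recovered on the very first cache hit,
	// since each hit saves 0.9x of the prompt's input cost.
}
```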
**The Problem:**
Anthropic's Prompt Caching requires manually placing cache breakpoints in your API requests. For applications like n8n workflows, Flowise chatbots, or any complex integration with system prompts, tools, and conversation history, you either can't access the request structure to optimize it, or doing so manually is extremely tedious.
**How Autocache Works:**
It's a transparent drop-in proxy. For each request, it:
1. Analyzes token counts across system prompts, tools, and message content
2. Calculates ROI scores for potential cache breakpoints (write costs vs. read savings; rough sketch after this list)
3. Automatically injects cache-control fields at optimal positions
4. Returns X-Autocache-* headers showing projected savings and break-even points
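To make step 2 concrete, here's a simplified sketch of what an ROI score for a candidate breakpoint can look like; the multipliers are Anthropic's published ones, but the function, threshold, and reuse estimate are illustrative rather than Autocache's actual code:

```go
package main

import "fmt"

const (
	writeMult = 1.25 // cache write costs 1.25x base input price
	readMult  = 0.10 // cache read costs 0.1x base input price
	minPrefix = 1024 // minimum cacheable prefix for most Claude models (varies by model)
)

// roi scores a candidate breakpoint covering a prefix of `tokens` tokens,
// in token-equivalents of base input price: expected read savings minus
// the one-time write surcharge. Positive => worth injecting cache_control.
func roi(tokens int, expectedReuses float64) float64 {
	if tokens < minPrefix {
		return 0 // too short to be cacheable
	}
	writePenalty := float64(tokens) * (writeMult - 1.0)                // one-time surcharge
	readSavings := float64(tokens) * (1.0 - readMult) * expectedReuses // saved across cache hits
	return readSavings - writePenalty
}

func main() {
	fmt.Printf("ROI for a 1,500-token prefix reused 50 times: %.0f token-equivalents\n", roi(1500, 50))
}
```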
**Perfect for:**
- n8n AI workflows (change base URL in Claude node)
- Flowise chatbots (configure HTTP endpoint)
- LangChain/LlamaIndex apps
- Custom Claude integrations
- Any app where you can't manually optimize prompts
**Try it in 30 seconds:**
```bash
docker run -d -p 8080:8080 -e ANTHROPIC_API_KEY=sk-ant-... ghcr.io/montevive/autocache:latest
```

Point your app to http://localhost:8080/v1/messages and check the response headers for actual savings metrics on your workload.
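If you'd rather sanity-check it before wiring up a framework, a plain request with Go's standard library is enough to see the injected headers (the model, body, and whether you need to send x-api-key yourself depend on your setup):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
	"strings"
)

func main() {
	// Standard Anthropic Messages API request, sent through the local proxy.
	body := `{
	  "model": "claude-3-5-sonnet-latest",
	  "max_tokens": 64,
	  "system": "You are a helpful assistant...",
	  "messages": [{"role": "user", "content": "ping"}]
	}`

	req, _ := http.NewRequest("POST", "http://localhost:8080/v1/messages", bytes.NewBufferString(body))
	req.Header.Set("content-type", "application/json")
	req.Header.Set("anthropic-version", "2023-06-01")
	// May be unnecessary if the proxy injects the key it was started with.
	req.Header.Set("x-api-key", os.Getenv("ANTHROPIC_API_KEY"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Print only the savings/break-even headers Autocache adds.
	for name, vals := range resp.Header {
		if strings.HasPrefix(strings.ToLower(name), "x-autocache-") {
			fmt.Printf("%s: %s\n", name, strings.Join(vals, ", "))
		}
	}
}
```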
GitHub: https://github.com/montevive/autocache
I've tested this with n8n workflows and seen $200→$25/day cost reductions on production workloads. The ROI algorithm uses conservative estimates, but I'd love feedback on edge cases or strategies I haven't considered.
Tech: Go, ~29MB Docker image, multi-arch, MIT licensed.