Show HN: TokenShield – Local proxy that dedupes Claude Code conversation traffic

https://www.npmjs.com/package/@curatedmcp/tokenshield

1•curatedmcp•34m ago

I built a local proxy that dedupes Claude Code traffic. TokenShield — cuts your Claude Code bill 40-70%

Comments

curatedmcp•33m ago

I run Claude Code most days and my Anthropic bill kept creeping up without me understanding which conversations were the expensive ones. A 25-turn agentic session re-reads `auth.ts` five times and re-runs `gh pr list` three times — every duplicate ships as a fresh tool_result content block to the model, every time. The model already saw identical bytes two turns ago, but it doesn't matter; you pay for them again.

TokenShield is a small Node 22 proxy that sits in front of api.anthropic.com. Set `ANTHROPIC_BASE_URL=http://127.0.0.1:7777` and Claude Code (or Cursor, Windsurf, Zed, Aider — anything that uses the Anthropic SDK) routes through it. The proxy hashes every tool_result content block per-conversation; subsequent occurrences of the same hash within the same conversation are replaced with a deterministic stub:

  { "type": "tool_result",
    "tool_use_id": "...",
    "content": "[tokenshield: identical content seen at message 2, sha:1f6063fe]" }

The deterministic part matters — same input always produces the same output, so Anthropic's prompt_cache (cache_control) stays valid.

The accounting side was the bug-hunt I didn't expect. First version captured 0 tokens on successful responses. Turned out Anthropic returns Content-Encoding: gzip and I was running JSON.parse on the compressed buffer. The silent try/catch swallowed the error and I shipped "200 status, $0.00 spent" for a week before noticing. Fix: `accept-encoding: identity` on upstream — bandwidth on localhost doesn't matter and reliable measurement does.

Real numbers from my own usage yesterday: one Opus-4 turn paid $0.1719, saved $0.0747 — 30.3% on that turn. Bench numbers across three recorded fixtures (light Q&A, medium coding, heavy 25-turn agentic): 0%, 27.7%, 62.1%. The light case correctly does nothing — there's nothing to dedup in a 5-turn conversation with no tool use.

What's deliberately not here:

- It only helps API-billed clients. Claude Desktop and claude.ai web are flat-subscription / hardcoded endpoint — no metered bill to compress. - OpenAI / Gemini are on a waitlist. The provider adapter interface is in the code, but I'd rather ship one provider correctly than three half-baked. - Compression is conservative (>= 256 byte payloads, content-hash keyed). It does not touch streaming SSE bytes — passthrough is byte-faithful so the prompt cache stays valid.

The piece I'd most appreciate feedback on: I'm staring at `response-cache` as the next processor (LRU+TTL on (model, system_prompt_hash, last_user_msg_hash)), but the correctness story scares me — `temperature > 0` and tool-using conversations both make caching risky. Would love thoughts from anyone who has done semantic caching on LLM responses about where the landmines are.

`npm i -g @curatedmcp/tokenshield` to try it. MIT, self-hosted, your API key never leaves the machine, telemetry is opt-in and aggregate-only.

More at: https://curatedmcp.com/tokenshield

Depression linked to bacterium-chemical interaction in personal care products

The Sunk Cost Fallacy and How It Influences Our Decisions

Andrej Karpathy Joins Anthropic

Google Antigravity CLI

Google introduces Gemini Spark, a 24/7 agentic assistant with Gmail integration

Show HN: Logbox – let Claude monitor your dev logs

Likely AI-generated short story won a major prize

Show HN: Melogen – Generate MIDI melodies for free

Show HN: FastBack end – schema-first back end runtime with OpenAPI output

The Gemini app becomes more agentic, delivering proactive, 24/7 help

Disney Erased FiveThirtyEight

Which campaigns actually drive your leads?

Show HN: Coding agent where a second agent QAs every PR in a real browser

The missing men of the American marriage market

Scientists worried about de-extinction ethics as biotech co. touts breakthrough

Automate your computer using real code – not drag-and-drop blocks

The Trouble with Emotion AI

Lapdog: Local Coding Agent Assistant

Ruby vs. Java vs. TypeScript: Building Claude Cowork Docx Plugin

Mistral AI Python package compromised on PyPI [2026-05-12]

Finding Unpinned and Unpinnable GitHub Actions Across Your Org

From Compute Overhang to Compute Crunch

Chrome Dev Blog: Declarative Partial Updates (Interleaved HTML Streaming)

Show HN: Search 67K .AI domains by AI-extracted tags and descriptions

Gemini Omni Flash is coming soon

A case against the case against full-body MRI screening

TinyFish Vault: Your Web Agent Can Now Log in Without Touching Your Passwords

AI slop is flooding maths YouTube [video]

Google pushes update to Antigravity instead it reinstalls and locks everyone out

The TTY Demystified (2008)