I built callmux to fix this. It's an MCP proxy that sits between your agent (Claude, Codex, etc.) and any MCP server, adding parallel execution, batching, pipelining, and caching as meta-tools. Instead of 7 sequential get_issue calls, the agent makes 1 callmux_parallel call. The actual data transferred is identical. What you eliminate is the per-call overhead.
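To make the batching concrete, here's roughly what the argument to a single callmux_parallel call could look like. This is my own sketch of the shape, not callmux's documented schema, and the repo/issue values are made up:

```python
# Hypothetical argument shape for one callmux_parallel meta-tool call
# that replaces 7 sequential get_issue calls. The real callmux schema
# may differ; this is an illustration only.
batch = {
    "calls": [
        {
            "tool": "get_issue",
            "arguments": {"owner": "octocat", "repo": "hello-world", "issue_number": n},
        }
        for n in range(1, 8)
    ]
}

# One tool call goes out; the results come back together as one response.
print(len(batch["calls"]))  # 7
```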
The math surprised me. For a batch of 7 operations:
Without callmux: ~525 tokens of structural overhead + ~900 tokens of intermediate reasoning = ~1,425 tokens of pollution
With callmux: ~75 tokens total
That's roughly a 19:1 reduction in context pollution from only a 7:1 reduction in tool calls.
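The arithmetic behind those numbers, using the post's per-call estimates (the per-call and per-gap token counts are assumed averages, not measured constants):

```python
# Back-of-envelope context math for a batch of N = 7 operations.
N = 7
per_call_overhead = 75    # tokens of tool-call structure per call (assumed average)
per_gap_reasoning = 150   # tokens of intermediate reasoning between calls (assumed average)

# Sequential: every call pays structural overhead, and each gap between
# calls accumulates an intermediate reasoning turn.
without = N * per_call_overhead + (N - 1) * per_gap_reasoning  # 525 + 900 = 1425

# Batched: one call's worth of overhead, no intermediate turns.
with_callmux = per_call_overhead  # ~75

print(without, with_callmux, round(without / with_callmux))  # 1425 75 19
```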
Prompt caching helps with the cost side of re-reading previous turns, but it doesn't shrink your context window: every intermediate reasoning turn still sits there taking up space, and compaction still triggers at the same threshold.

In practice, callmux reduces tool calls to about 15% of the original count. The context savings are larger than that ratio suggests, though, because you're also eliminating the intermediate reasoning between those calls, which is the biggest source of pollution.
The result: sessions last longer before hitting context limits, and the context window has less noise competing with your actual conversation.
I wrote up the full context math with diagrams here: https://longgamedev.substack.com/p/your-ai-agent-is-re-readi...
Setup is one line:
npx -y callmux -- npx -y @modelcontextprotocol/server-github
Works with Claude Code, Codex, and Claude Desktop. Also supports multi-server mode, remote HTTP/SSE servers, and tool filtering.
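For Claude Desktop specifically, you'd wrap the same one-liner in its standard mcpServers config (claude_desktop_config.json); the "github" key name is arbitrary, and the token placeholder is yours to fill in:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "callmux", "--", "npx", "-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>" }
    }
  }
}
```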