The Problem: MCP (Model Context Protocol) is great for giving LLMs access to external tools. But if you connect multiple servers (GitHub, Linear, Postgres, Slack), you end up with 40-50k tokens of tool definitions injected into every request – before the agent even does anything.
On a 200k-context model, that's 20-25% of the window gone. On smaller models, it's worse. And most runs only use 1-2 tools.
The Solution: MCPlexor sits between your agent and your MCP servers. Instead of loading all tool definitions upfront:
- The agent asks for a capability ("create an issue")
- MCPlexor routes to the right server using semantic matching
- Only the relevant tools get exposed

Result: ~500 tokens of overhead instead of ~20k.
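To make the matching step concrete, here's a minimal sketch of embedding-based routing. This is not MCPlexor's actual code; the `Tool` struct, field names, and vectors are made up for illustration. The idea is just: compare the query embedding against precomputed embeddings of each tool's description and pick the nearest.

```go
package main

import (
	"fmt"
	"math"
)

// Tool pairs an MCP tool with a precomputed embedding of its description.
// (Hypothetical structure for illustration only.)
type Tool struct {
	Server    string
	Name      string
	Embedding []float64
}

// cosine returns the cosine similarity between two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// route picks the tool whose description embedding is closest to the query embedding.
func route(query []float64, tools []Tool) Tool {
	best, bestScore := tools[0], -1.0
	for _, t := range tools {
		if s := cosine(query, t.Embedding); s > bestScore {
			best, bestScore = t, s
		}
	}
	return best
}

func main() {
	tools := []Tool{
		{Server: "linear", Name: "create_issue", Embedding: []float64{0.9, 0.1, 0.0}},
		{Server: "github", Name: "open_pr", Embedding: []float64{0.1, 0.9, 0.0}},
	}
	// In practice the query embedding comes from an embedding model (e.g. via Ollama).
	query := []float64{0.85, 0.2, 0.05} // stand-in embedding of "create an issue"
	t := route(query, tools)
	fmt.Printf("routing to %s/%s\n", t.Server, t.Name)
}
```

Only the winning tool's definition then needs to be injected into the agent's context, which is where the token savings come from.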
Technical Details:
- Written in Go, single binary, no runtime deps
- Supports both stdio and HTTP transports
- Stores credentials in the OS keychain (macOS Keychain, Windows Credential Manager, Linux keyring via zalando/go-keyring)
- Works with Claude Desktop, Cursor, Augment Code, or any MCP-compatible client
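For reference, this is roughly what credential storage through zalando/go-keyring looks like. It's a sketch: the service/account naming here is an assumption, not MCPlexor's actual key layout.

```go
package main

import (
	"fmt"
	"log"

	"github.com/zalando/go-keyring"
)

func main() {
	// Hypothetical service/account names for illustration.
	const service = "mcplexor"
	const account = "github-mcp-token"

	// Store a credential in the OS keychain (Keychain / Credential Manager / Secret Service).
	if err := keyring.Set(service, account, "ghp_example_token"); err != nil {
		log.Fatal(err)
	}

	// Read it back when the proxied server needs to authenticate.
	token, err := keyring.Get(service, account)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("retrieved token of length", len(token))
}
```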
The routing logic can run entirely locally using Ollama. No API calls, no cost, works offline.
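If you're curious what the local path looks like, here's a rough sketch of fetching an embedding from a locally running Ollama instance over its HTTP API. The model name and default port are assumptions for illustration, not MCPlexor's configuration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// embed asks a local Ollama instance for an embedding of the given text.
func embed(text string) ([]float64, error) {
	body, _ := json.Marshal(map[string]string{
		"model":  "nomic-embed-text", // assumed embedding model
		"prompt": text,
	})
	resp, err := http.Post("http://localhost:11434/api/embeddings", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var out struct {
		Embedding []float64 `json:"embedding"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}

func main() {
	vec, err := embed("create an issue")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("got %d-dimensional embedding\n", len(vec))
}
```

Since everything stays on localhost, there are no per-request API costs and it keeps working without a network connection.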
Business Model (for transparency):
- Local/Ollama users: completely free
- Cloud tier (on waitlist): we run inference on efficient small models instead of Opus/Pro, pass on the savings, and take a margin

The bet is that routing is a narrow enough task that a fine-tuned 7B model does it as well as a frontier model. Early testing suggests this works.
Repo: https://github.com/arustagi101/mcplexor
Install: `curl -fsSL https://mcplexor.com/install.sh | bash`

Happy to answer questions about the architecture, the routing approach, or anything else.