I built Sift, a drop-in MCP gateway that stores tool outputs as local artifacts (filesystem blobs indexed in SQLite) and returns an `artifact_id` plus compact schema hints when responses are large or paginated.
Instead of reasoning over full JSON in the prompt, the model runs a small Python query:
    def run(data, schema, params):
        return max(data, key=lambda x: x["magnitude"])["place"]
Query code runs in a constrained subprocess (AST/import guards plus timeout and memory caps); only the computed result is returned to the model.

Benchmark (Claude Sonnet 4.6, 103 questions across 12 datasets):
- Baseline (raw JSON in prompt): 34/103 (33%), 10.7M input tokens
- Sift (artifact + code query): 102/103 (99%), 489K input tokens
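For intuition, the guard layer described above can be sketched roughly like this. This is an illustration under assumed rules (reject imports and a few dangerous builtins, cap wall-clock time), not Sift's actual implementation:

```python
import ast
import json
import subprocess
import sys

BLOCKED_NAMES = {"eval", "exec", "open", "__import__"}

def check_query(source: str) -> None:
    """Reject queries that use imports or a few dangerous builtins."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed")
        if isinstance(node, ast.Name) and node.id in BLOCKED_NAMES:
            raise ValueError(f"blocked name: {node.id}")

def run_query(source: str, data, timeout_s: float = 2.0):
    """Run a vetted query in a fresh interpreter; only JSON comes back."""
    check_query(source)
    program = (
        "import json, sys\n"
        + source
        + "\nprint(json.dumps(run(json.load(sys.stdin), None, None)))\n"
    )
    # A real gateway would also cap memory (e.g. resource.setrlimit on POSIX)
    # and restrict the environment; the timeout here only bounds wall time.
    proc = subprocess.run(
        [sys.executable, "-c", program],
        input=json.dumps(data),
        capture_output=True, text=True, timeout=timeout_s, check=True,
    )
    return json.loads(proc.stdout)
```

The point of the design is that arbitrarily large tool output stays on disk; only the small computed answer crosses back into the context window.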
Open benchmark + MIT code: https://github.com/lourencomaciel/sift-gateway
Install:

    pipx install sift-gateway
    sift-gateway init --from claude
Works with Claude Code, Cursor, Windsurf, Zed, and VS Code. Existing MCP servers and tools require no changes.
loumaciel•1h ago
The benchmark harness and datasets are in the repo if anyone wants to reproduce or extend the tests. Curious if others have run into the same context compaction issues with tool-heavy agents.