Benchmark: 7 tasks on FastAPI (the OSS repo, ~800 Python files), 3 runs/task/arm, 42 total runs, Claude Sonnet 4.6, both arms in --strict-mcp-config isolation.

Without graph: ~23 tool calls, ~40K input tokens, 504 output tokens, $0.78/task.
With graph: ~2.3 tool calls, ~8K input tokens, 189 output tokens, $0.33/task.

The 58% cost reduction and 22% speed improvement were expected. The 63% output token reduction was not. When Claude gets 40K tokens of context (most of it irrelevant), it generates a lot of "let me look at this file... I can see that..." narration while it orients itself. When it gets 8K tokens of pre-filtered, graph-ranked context, it skips straight to the answer. The exploration filler disappears.

This seems like a general property of these models: noisy input → verbose output, focused input → focused output. I'd be curious if others have observed this in different contexts.
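For anyone checking the headline percentages against the per-task figures above, the arithmetic is just relative reduction (the 22% speed number comes from wall-clock timings not listed here):

```python
# Per-task averages from the benchmark above (without graph vs. with graph).
def reduction(before, after):
    """Percentage reduction from `before` to `after`, one decimal place."""
    return round(100 * (1 - after / before), 1)

print(reduction(0.78, 0.33))      # cost:          57.7  -> ~58%
print(reduction(504, 189))        # output tokens: 62.5  -> ~63%
print(reduction(40_000, 8_000))   # input tokens:  80.0
```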
The approach: tree-sitter AST parsing → dependency graph in SQLite → single MCP tool (run_pipeline) that takes a task description, walks the graph, and returns ranked context. Full source for high-centrality pivot nodes, compact skeletons for supporting code. Savings varied by task type — code-understanding tasks saved the most (-64%), bug fixes the least (-30%). Makes sense: the more exploration a task normally requires, the more waste there is to cut.
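A minimal sketch of the storage-and-ranking idea, not the actual implementation: a toy dependency graph in SQLite, with degree centrality standing in for whatever scoring the real tool uses. All table, column, and module names here are invented for illustration.

```python
import sqlite3

# Toy dependency graph in SQLite. In the real pipeline the nodes would come
# from tree-sitter AST parsing; here they are hardcoded placeholders.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE nodes (id TEXT PRIMARY KEY, source TEXT, skeleton TEXT);
    CREATE TABLE edges (src TEXT, dst TEXT);  -- src depends on dst
""")
db.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    ("app.main",    "def main(): <full body>",  "def main(): ..."),
    ("app.routing", "def route(): <full body>", "def route(): ..."),
    ("app.models",  "class User: <full body>",  "class User: ..."),
    ("app.utils",   "def slug(): <full body>",  "def slug(): ..."),
])
db.executemany("INSERT INTO edges VALUES (?, ?)", [
    ("app.main", "app.routing"),
    ("app.main", "app.models"),
    ("app.routing", "app.models"),
    ("app.utils", "app.models"),
])

def ranked_context(db, top_n=1):
    """Rank nodes by degree centrality; return full source for the top
    `top_n` pivot nodes and compact skeletons for everything else."""
    rows = db.execute("""
        SELECT n.id, n.source, n.skeleton,
               (SELECT COUNT(*) FROM edges WHERE src = n.id OR dst = n.id)
                   AS degree
        FROM nodes n ORDER BY degree DESC, n.id
    """).fetchall()
    return [(nid, src if i < top_n else skel)
            for i, (nid, src, skel, _) in enumerate(rows)]

# app.models has the most edges, so it is the pivot and gets full source;
# the rest come back as skeletons.
for node_id, text in ranked_context(db):
    print(node_id, "->", text)
```

In the real tool the ranking would presumably also weight the task description, but the shape of the output is the same: a small, ordered context bundle instead of raw file dumps.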
Code: the graph resolution is handwritten Rust. The MCP transport, SQLite schema, and benchmark harness were built with Claude Code (which felt appropriate). The benchmark analysis scripts were 100% Claude-written.
Free tier at https://vexp.dev — 2K nodes, 1 repo, no time limit. Runs locally (tree-sitter + SQLite, no cloud).