The pricing tiers:

Output tokens: most expensive (3-5x input price)
Input tokens (cache miss): full price
Input tokens (cache hit): 90% discount
The agent with pre-indexed context processes more total tokens, because the structured context payload is injected every turn. But the token MIX shifts dramatically:

Output tokens: 10,588 → 3,965 (-63%)
Cache read rate: 93.8% → 95.3%
Cache creation: 6.1% → 4.6%

Output tokens dominate the cost equation. When the agent receives 40K tokens of unfiltered context, it generates verbose orientation narration ("let me look at this file... I can see that..."). When it receives 8K tokens of graph-ranked context, it skips straight to the answer: 504 output tokens per task drops to 189.

The cache effect compounds this: structured, consistent context across turns hits the cache more reliably than ad-hoc file reads that change every turn. So the additional input tokens cost almost nothing (90% discount), while the output-token reduction saves the most expensive tokens.

The general principle: with tiered token pricing, optimizing for total token count is wrong. Optimize for token mix instead: push volume from expensive tiers (output, cache miss) to cheap tiers (cache hit). More total tokens can cost less if you shift the distribution.

This seems obvious in retrospect, but I haven't seen it discussed much. Most context engineering work focuses on reducing input tokens. The bigger lever might be reducing output tokens by improving the input signal-to-noise ratio: the model writes less when it doesn't have to think out loud about what it's reading.
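To make the arithmetic concrete, here's a small sketch of the token-mix claim. The per-token prices are hypothetical placeholders (output at 4x input, cache hits at a 90% discount), not any provider's real rates, and the input-token totals are illustrative; only the cache-hit rates and output counts come from the numbers above.

```python
# Relative per-token prices (hypothetical, for illustration only).
IN_MISS = 1.0   # cache-miss input: full price
IN_HIT = 0.1    # cache-hit input: 90% discount
OUT = 4.0       # output: ~4x input (midpoint of the 3-5x range)

def cost(input_tokens, hit_rate, output_tokens):
    """Blended cost of one workload under tiered pricing."""
    hits = input_tokens * hit_rate
    misses = input_tokens - hits
    return misses * IN_MISS + hits * IN_HIT + output_tokens * OUT

# Baseline: unfiltered context, 93.8% hit rate, verbose output.
baseline = cost(40_000, 0.938, 10_588)
# Pre-indexed: MORE total input (illustrative 48K), higher hit
# rate, terse output.
indexed = cost(48_000, 0.953, 3_965)

print(f"baseline: {baseline:.0f}  indexed: {indexed:.0f}")
print("indexed cheaper despite more input:", indexed < baseline)
```

Even with 20% more input tokens, the indexed run comes out roughly half the cost, because nearly all the savings sit in the expensive output tier.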
The tool is vexp (https://vexp.dev) — local-first context engine, Rust + tree-sitter + SQLite. Free tier available.
gnabgib•1h ago
> Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity.
https://news.ycombinator.com/newsguidelines.html
nicola_alessi•1h ago