
More tokens, less cost: why optimizing for token count is wrong

1•nicola_alessi•1h ago
I ran a controlled benchmark on AI coding agents (42 runs, FastAPI, Claude Sonnet 4.6) and found something that broke my mental model of LLM costs.

The setup: I built an MCP server that pre-indexes a codebase into a dependency graph and serves pre-ranked context to the agent in a single call, instead of letting the agent explore files on its own.

The expected result: less input context → lower cost. Straightforward.

The actual result: total tokens processed went UP 20% (23.4M vs 19.6M) while total cost went DOWN 58% ($6.89 vs $16.29).

The explanation is in how Anthropic prices tokens. There are three pricing tiers:

- Output tokens: most expensive (3-5x input price)
- Input tokens (cache miss): full price
- Input tokens (cache hit): 90% discount
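As a sketch, the tier structure reduces to a small cost function. The dollar figures below are illustrative placeholders chosen only to match the ratios described above (output several times input, cache hits at a 90% discount), not actual Anthropic list prices.

```python
# Per-million-token prices for the three tiers. Illustrative values
# only, in the ratios described above; check Anthropic's pricing page
# for real numbers.
PRICE_PER_MTOK = {
    "output": 15.00,       # most expensive tier
    "input_miss": 3.00,    # full-price input (cache miss)
    "input_hit": 0.30,     # cache hit: 90% off the input price
}

def cost(tokens: dict[str, int]) -> float:
    """Dollar cost of a token mix, keyed by pricing tier."""
    return sum(n * PRICE_PER_MTOK[tier] / 1_000_000
               for tier, n in tokens.items())

# The same million tokens costs 50x more as output than as a cache hit:
print(cost({"output": 1_000_000}))     # 15.0
print(cost({"input_hit": 1_000_000}))  # 0.3
```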

The agent with pre-indexed context processes more total tokens because the structured context payload is injected every turn. But the token MIX shifts dramatically:

- Output tokens: 10,588 → 3,965 (-63%)
- Cache read rate: 93.8% → 95.3%
- Cache creation: 6.1% → 4.6%

Output tokens dominate the cost equation. When the agent receives 40K tokens of unfiltered context, it generates verbose orientation narration ("let me look at this file... I can see that..."). When it receives 8K tokens of graph-ranked context, it skips straight to the answer: 504 output tokens per task → 189.

The cache effect compounds this: structured, consistent context across turns hits the cache more reliably than ad-hoc file reads that change every turn. So the additional input tokens cost almost nothing (90% discount) while the output token reduction saves the most expensive tokens.

The general principle: with tiered token pricing, optimizing for total token count is wrong. You should optimize for token mix — push volume from expensive tiers (output, cache miss) to cheap tiers (cache hit). More total tokens can cost less if you shift the distribution.

This seems obvious in retrospect but I haven't seen it discussed much. Most context engineering work focuses on reducing input tokens. The bigger lever might be reducing output tokens by improving input signal-to-noise ratio — the model writes less when it doesn't have to think out loud about what it's reading.
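The "more tokens, less cost" effect can be reproduced with a toy mix. The token counts and prices below are invented for illustration (they are not the post's actual ledger); only the tier ratios follow the description above. Run B processes about 19% more total tokens than run A but shifts volume out of the expensive tiers, and ends up roughly 62% cheaper.

```python
# Toy demonstration: a larger total token count with a cheaper mix.
# All numbers are illustrative, not the benchmark's real figures.
PRICES = {"hit": 0.30, "miss": 3.00, "output": 15.00}   # $ per million tokens

def cost(mix: dict[str, int]) -> float:
    return sum(n * PRICES[tier] / 1_000_000 for tier, n in mix.items())

run_a = {"hit": 5_000_000, "miss": 3_000_000, "output": 600_000}
run_b = {"hit": 9_500_000, "miss": 500_000, "output": 200_000}

print(sum(run_a.values()), round(cost(run_a), 2))   # 8600000 19.5
print(sum(run_b.values()), round(cost(run_b), 2))   # 10200000 7.35
```

Run B's extra cache-hit tokens add almost nothing, while its smaller output and cache-miss volume removes the dominant cost terms.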

The tool is vexp (https://vexp.dev) — local-first context engine, Rust + tree-sitter + SQLite. Free tier available.
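vexp's internals aren't shown in the thread, but the general idea ("pre-index the codebase into a dependency graph, serve pre-ranked context in a single call") can be sketched with a simple heuristic: rank files by how many other files depend on them, then cut at a context budget. The file names and the in-degree ranking here are hypothetical, not vexp's actual algorithm.

```python
# Hypothetical sketch of dependency-graph context ranking: files that
# many others import are served first, up to a budget.
from collections import defaultdict

def rank_context(deps: dict[str, list[str]], budget: int) -> list[str]:
    """Rank files by in-degree (how many files import them)."""
    in_degree = defaultdict(int)
    for src, targets in deps.items():
        in_degree[src] += 0            # ensure every file appears
        for t in targets:
            in_degree[t] += 1
    ranked = sorted(in_degree, key=lambda f: in_degree[f], reverse=True)
    return ranked[:budget]

# Toy dependency graph: each key imports the files in its list.
deps = {
    "app/main.py": ["app/routes.py", "app/db.py"],
    "app/routes.py": ["app/db.py", "app/models.py"],
    "app/db.py": ["app/models.py"],
    "app/models.py": [],
}
print(rank_context(deps, budget=2))
```

A real tool would weight this by recency, task relevance, and symbol-level edges, but even in-degree captures the intuition that widely-imported files are high-signal context.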

Comments

gnabgib•1h ago
You're overdoing the self-promotion (this is the 7th time you've submitted vexp); share something with us you're curious about that you didn't build.

> Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity.

https://news.ycombinator.com/newsguidelines.html

nicola_alessi•1h ago
Fair point, appreciate the callout. I'll dial it back.
alexbuiko•1h ago
This is a brilliant breakdown of the 'Token Mix' paradox. It aligns perfectly with what we’ve been seeing while developing SDAG.

When you optimize for a structured context payload (like your dependency graph), you aren't just hitting the Anthropic pricing cache—you are literally reducing the routing entropy at the inference level. High-noise inputs force the model into 'exploratory' output paths, which isn't just expensive in dollars, but also in hardware stress.

We found that 'verbose orientation narration' (the thinking-out-loud part) correlates with higher entropy spikes in memory access. By tightening the input signal-to-noise ratio, you're essentially stabilizing the model's internal routing. Have you noticed any changes in latency variance (jitter) between the pre-indexed and ad-hoc runs? In our tests, lower entropy usually leads to much more predictable TTFT (Time To First Token).

nicola_alessi•1h ago
Interesting framing — hadn't thought about it from the inference routing angle, but it maps well to what the data shows.

On latency variance: yes, significantly. Cost standard deviation across runs dropped 6-24x depending on task type. The most extreme case was a refactoring task: baseline sigma $0.312 vs $0.013 with pre-indexed context. Duration variance also dropped in 6 out of 7 tasks. I didn't measure TTFT specifically, but overall duration went from 170s → 132s with much tighter clustering around the mean.

The stabilization effect is probably the most underrated finding. Everyone focuses on the average cost reduction, but the predictability improvement matters more for production workloads — you can actually forecast spend instead of hoping the agent doesn't go on an exploration tangent.

What's SDAG? Curious about your setup.
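To make the forecasting point concrete: if per-task costs are roughly independent, the standard deviation of total spend scales with sqrt(n), so the sigmas quoted above translate directly into forecast-band width. A sketch using the refactoring-task sigmas; the $0.30 mean per-task cost is an assumed placeholder for scale, not a figure from the thread.

```python
import math

# Naive ~95% forecast band (mean +/- 2*sigma) for total spend over n
# independent tasks. Sigmas are the refactoring-task values quoted
# above; the $0.30 mean per-task cost is an assumption.
def forecast_band(mean_cost: float, sigma: float, n: int) -> tuple[float, float]:
    total = mean_cost * n
    spread = 2 * sigma * math.sqrt(n)   # stdev of a sum of n i.i.d. costs
    return total, spread

for label, sigma in [("baseline", 0.312), ("pre-indexed", 0.013)]:
    total, spread = forecast_band(0.30, sigma, 1000)
    print(f"{label}: ${total:.0f} +/- ${spread:.0f}")
```

The 24x sigma reduction shrinks the budget uncertainty by the same 24x, which is what makes spend forecastable.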

Inverse Occam's Razor

https://arxiv.org/abs/2204.08284
1•jerlendds•1m ago•0 comments

Tell HN: Apple development certificate server seems down?

2•strongpigeon•1m ago•0 comments

Mother of All Grease Fires

https://milk.com/wall-o-shame/bucket.html
1•xk3•1m ago•0 comments

6-Axis Milling for Enhancing Quality of Fused Granular Fabrication Parts

https://www.mdpi.com/2073-4360/18/5/608
1•PaulHoule•2m ago•0 comments

Working to Decentralize FedCM

https://atproto.com/blog/working-to-decentralize-fedcm
1•sgoto•2m ago•0 comments

Agent-sync – sync between Claude Code and Codex configs

https://github.com/matanabudy/agent-sync
1•matanabudy•3m ago•0 comments

Helix 02 living room tidy

https://www.youtube.com/watch?v=CAdTjePDBfc
1•hheikinh•4m ago•0 comments

Don't let LLMs write for you

https://justismills.substack.com/p/dont-let-llms-write-for-you
1•c-oreills•5m ago•0 comments

Deep Learning: Our Year 1990-1991

https://people.idsia.ch/~juergen/deep-learning-miraculous-year-1990-1991.html
1•untilted•7m ago•0 comments

Ask HN: I built an AI-native codebase framework–could you evaluate it?

1•xodn348•11m ago•1 comment

The Slowest Viral Thing

https://pilgrima.ge/p/the-slowest-viral-thing
1•momentmaker•12m ago•0 comments

SoftBank eyes up to $40B loan to fund OpenAI investment

https://www.reuters.com/business/media-telecom/softbank-seeks-up-40-billion-loan-finance-openai-i...
4•devonnull•12m ago•0 comments

SEIA Solar Market Insight Report 2025 Year in Review

https://seia.org/research-resources/us-solar-market-insight/
1•toomuchtodo•13m ago•0 comments

A vertical tab companion app for aerospace window manager

https://github.com/raghavendra-talur/aeromux
1•rtalur•14m ago•1 comment

Uber rolls out women-only option in the US

https://www.bbc.com/news/articles/cx2gvrzwdr7o
2•alephnerd•14m ago•0 comments

Meta Is Buying Moltbook

https://lifehacker.com/tech/meta-is-buying-moltbook
1•umangsehgal93•14m ago•0 comments

GoT Timeline – a daily timeline game to test your Game of Thrones skills

https://www.got-timeline.com
1•onion92•14m ago•0 comments

Claude Code makes local LLMs 90% slower

https://unsloth.ai/docs/basics/claude-code
4•telotortium•18m ago•1 comment

Eventbrite Enters into Definitive Agreement to Be Acquired by Bending Spoons

https://www.businesswire.com/news/home/20251202408560/en/Eventbrite-Enters-into-Definitive-Agreem...
5•DocFeind•19m ago•0 comments

Why doesn't V8 fit on my microcontroller? (2021)

https://medium.com/the-toit-take/why-doesnt-v8-fit-on-my-microcontroller-71dc6e2d8f5c
1•tosh•20m ago•0 comments

Is there an MD5 Fixed Point where MD5(x) == x?

https://stackoverflow.com/questions/235785/is-there-an-md5-fixed-point-where-md5x-x
2•plaguna•21m ago•0 comments

GPT-4 leaks its own API internals through training data exposure

1•safteylayer•22m ago•0 comments

Andrew Tate Doesn't Get the Point of Books

https://www.theatlantic.com/ideas/2026/03/slow-reading-books-benefits/686266/
2•paulpauper•23m ago•2 comments

The Met Opera's Desperate Hunt for Money

https://www.nytimes.com/2026/03/08/arts/met-opera-peter-gelb-finances.html
3•paulpauper•24m ago•0 comments

Bubbles, Booms and Crashes in the US Stock Market 1792-2024

https://www.nber.org/papers/w34903
3•paulpauper•24m ago•0 comments

Ask HN: How are you reviewing code at work these days?

1•curiousgal•26m ago•1 comment

Professors scramble to save critical thinking in an age of AI

https://www.theguardian.com/technology/ng-interactive/2026/mar/10/ai-impact-professors-students-l...
2•fallinditch•27m ago•0 comments

Senate Moves Toward Passing Housing Bill, but Challenges Lie Ahead

https://www.nytimes.com/2026/03/10/us/politics/senate-housing-bill.html
1•JumpCrisscross•29m ago•0 comments

Show HN: Zehrava Gate – a control plane that sits between AI agents and prod

1•cgallic•32m ago•0 comments

Single-dose treatment approved for sleeping sickness

https://www.nature.com/articles/d44148-026-00051-w
4•bookofjoe•33m ago•0 comments