Ask HN: Opus 4.7 – is anyone measuring the real token cost on agentic tasks?
1•nicola_alessi•1h ago
Shipped today. The benchmark gains are real: 87.6% on SWE-bench (up from 80.8%), +13% on coding tasks, and 3x more production tasks resolved on Rakuten-SWE-Bench.
But several changes compound for token consumption: a new tokenizer (1.0–1.35x more tokens for the same input, depending on content type), an xhigh effort mode that reasons longer per turn, and /ultrareview, which spins up parallel agents for code review.
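Back-of-envelope, those three factors stack multiplicatively. A tiny sketch with mostly made-up numbers: only the 1.35x tokenizer figure comes from the post above; the 2x reasoning factor and 3 parallel agents are illustrative guesses, not measurements.

```python
# Rough combined token-cost multiplier vs. the previous model.
# All inputs are illustrative assumptions, not benchmarked values.

def cost_multiplier(tokenizer_factor: float,
                    reasoning_factor: float,
                    parallel_agents: int) -> float:
    """Multiply the three independent sources of token inflation."""
    return tokenizer_factor * reasoning_factor * parallel_agents

# Worst-case tokenizer inflation, hypothetical 2x longer reasoning
# at xhigh, and 3 hypothetical parallel review agents:
print(round(cost_multiplier(1.35, 2.0, 3), 2))  # → 8.1
```

Even if each factor individually looks modest, the product is what shows up on the bill, which is why per-task cost deltas matter more than per-token pricing.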
The model is genuinely better. The question is whether "better reasoning on the same noisy context" is the right optimization. Coding agents still spend most of their budget exploring files they won't touch before doing anything useful — a smarter model doesn't fix that, it just does it more thoroughly.
Curious what cost delta people are seeing on real codebases moving from 4.6 to 4.7, especially with the new effort levels.
(I've been building in this space: vexp.dev is my attempt at the context side of the problem. Happy to share more details if useful.)