The input side is manageable — you can count tokens before sending. But output tokens are essentially unknowable upfront, and with agents that chain multiple calls (tool use, multi-turn reasoning, retries on failure), a single user action might be 3 API calls or 40. Multiply that by prompt caching behavior (which is great when it hits, but you can't always guarantee it), and the cost variance per task can easily be 10-20x.
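To make the variance concrete, here's a back-of-envelope sketch. All rates, token counts, and the cache discount are illustrative assumptions I made up, not real pricing:

```python
# Illustrative only: why per-task cost variance easily hits 10-20x.
# Rates, token counts, and cache discount are assumed, not real pricing.

IN_RATE = 3.00 / 1_000_000    # $/input token (assumed)
OUT_RATE = 15.00 / 1_000_000  # $/output token (assumed)
CACHE_DISCOUNT = 0.1          # cached input billed at 10% of full rate (assumed)

def task_cost(calls, in_tokens, out_tokens, cache_hit_rate):
    """Cost of one user action that fans out into `calls` API calls."""
    in_cost = calls * in_tokens * IN_RATE * (
        cache_hit_rate * CACHE_DISCOUNT + (1 - cache_hit_rate))
    out_cost = calls * out_tokens * OUT_RATE
    return in_cost + out_cost

# Best case: the action resolves in 3 calls.
cheap = task_cost(calls=3, in_tokens=8_000, out_tokens=300, cache_hit_rate=0.5)
# Worst case: tool loops and retries balloon it to 40 calls.
pricey = task_cost(calls=40, in_tokens=8_000, out_tokens=400, cache_hit_rate=0.5)

print(f"cheap ~ ${cheap:.3f}, expensive ~ ${pricey:.2f}, "
      f"ratio ~ {pricey / cheap:.0f}x")
```

Same feature, same user, roughly an order of magnitude apart in cost, and nothing in the request tells you upfront which path you'll get.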
This makes it hard to do basic things like set pricing for an AI-powered feature, decide whether an approach is even economically viable before building it, or give finance any kind of credible forecast.
What I've tried/looked at so far:
- Anthropic's token counting endpoint gives you exact input token counts pre-flight, which helps, but doesn't solve the output/chaining problem
- Logging everything post-hoc and building up averages per workflow — works but you're already committed by that point
- Setting hard spend caps at the API level — blunt instrument, doesn't help with per-feature attribution
- Looked at various OSS tools (ccusage, Langfuse, Helicone) — mostly retrospective dashboards, good for "what did I already spend?" but not "what will I spend?"
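One middle ground I've been sketching: combine the exact pre-flight input count with Monte Carlo sampling from your own historical per-workflow distributions, so the retrospective logs feed a forward-looking estimate. This is a hypothetical sketch, not a feature of any of the tools above; the function name, numbers, and distributions are all mine:

```python
# Hypothetical sketch: turn post-hoc logs into a pre-flight estimate by
# sampling from per-workflow historical distributions of call counts
# and output tokens. All names and numbers here are assumptions.
import random

def estimate_task_cost(call_counts, out_tokens_per_call, in_tokens,
                       in_rate, out_rate, n_samples=10_000, seed=0):
    """Return (p50, p95) dollar cost for one task.

    call_counts / out_tokens_per_call are empirical samples pulled from
    your own logs; in_tokens is the exact pre-flight count for this request.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        calls = rng.choice(call_counts)
        cost = sum(
            in_tokens * in_rate + rng.choice(out_tokens_per_call) * out_rate
            for _ in range(calls)
        )
        samples.append(cost)
    samples.sort()
    return samples[len(samples) // 2], samples[int(len(samples) * 0.95)]

# Toy historical data: most tasks take 3-6 calls, a tail takes 30+.
p50, p95 = estimate_task_cost(
    call_counts=[3, 4, 4, 5, 6, 6, 8, 12, 30, 40],
    out_tokens_per_call=[200, 300, 400, 600, 1200],
    in_tokens=8_000,
    in_rate=3e-6, out_rate=15e-6,
)
print(f"p50 ~ ${p50:.3f}, p95 ~ ${p95:.2f}")
```

It doesn't solve the problem so much as reframe it: instead of a point estimate you get a band, and you price or gate the feature against the p95, not the mean.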
How are you handling this, especially if you're running agent-heavy workloads or building products where AI cost is a meaningful part of COGS? Are you doing any kind of pre-flight estimation? Cost-aware routing between models? Or just building first and optimizing later?