Ask HN: How are people forecasting AI API costs for agent workflows?

4•Barathkanna•5h ago

I’ve been experimenting with agent-based features and one thing that surprised me is how hard it is to estimate API costs.

A single user action can trigger anywhere from a few to dozens of LLM calls (tool use, retries, reasoning steps), and with token-based pricing the cost can vary a lot.
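To make that spread concrete, here is a rough sketch of the arithmetic. The per-token prices, call counts, and token counts are made-up placeholders for illustration, not any provider's real rates.

```python
# Hypothetical prices in USD per 1M tokens; substitute your provider's real rates.
PRICE_IN_PER_M = 3.00
PRICE_OUT_PER_M = 15.00


def action_cost(calls, in_tokens_per_call, out_tokens_per_call):
    """Cost in USD of one user action that triggers `calls` LLM calls."""
    cost_in = calls * in_tokens_per_call * PRICE_IN_PER_M / 1_000_000
    cost_out = calls * out_tokens_per_call * PRICE_OUT_PER_M / 1_000_000
    return cost_in + cost_out


# Same feature, two very different runs (assumed numbers).
cheap = action_cost(calls=3, in_tokens_per_call=2_000, out_tokens_per_call=300)
pricey = action_cost(calls=30, in_tokens_per_call=8_000, out_tokens_per_call=500)

print(f"cheap action:  ${cheap:.4f}")
print(f"pricey action: ${pricey:.4f}")
print(f"ratio: {pricey / cheap:.0f}x")
```

With these assumed inputs the two runs differ by about 30x, which is the kind of gap that makes flat per-seat pricing hard to reason about.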

How are builders here planning for this when pricing their SaaS?

Are you just padding margins, limiting usage, or building internal cost tracking? Also curious, would a service that offers predictable pricing for AI APIs (like a fixed subscription cost) actually be useful for people building agentic workflows?

Comments

clearloop•5h ago

IMO, switching to local models could be an option.

Barathkanna•4h ago

Local models solve the marginal cost problem, but they move the complexity into infrastructure and throughput planning instead.

clearloop•2h ago

Makes sense, it really depends on the use case. I'm building my version of claw, openwalrus, with a local-LLMs-first goal. I think I'll use local models for daily tasks that depend heavily on tool calling, but for coding or research I'll keep using remote models.

This topic also inspired me to introduce a built-in gas meter for tokens.
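A gas meter for tokens could look roughly like the sketch below. `TokenGasMeter` and `OutOfGas` are hypothetical names chosen for illustration; they are not part of openwalrus or any existing library, and the budget numbers are arbitrary.

```python
class OutOfGas(Exception):
    """Raised when a charge would push a run past its token budget."""


class TokenGasMeter:
    """Tracks tokens spent in one run against a fixed budget."""

    def __init__(self, budget_tokens):
        self.budget = budget_tokens
        self.used = 0

    def charge(self, tokens):
        # Refuse the charge before spending, so the budget is never exceeded.
        if self.used + tokens > self.budget:
            raise OutOfGas(
                f"budget {self.budget}: {self.used} used, {tokens} requested"
            )
        self.used += tokens

    @property
    def remaining(self):
        return self.budget - self.used


meter = TokenGasMeter(budget_tokens=10_000)
meter.charge(4_000)
meter.charge(5_000)
try:
    meter.charge(2_000)  # would bring the total to 11,000
except OutOfGas as exc:
    print("stopped:", exc)
print("remaining:", meter.remaining)
```

The rejected charge leaves the meter untouched, so the caller can decide whether to stop or fall back to a cheaper step.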

Lazy_Player82•4h ago

Honestly, if you're designing your agent workflows properly with hard limits on retries and tool calls, the variance shouldn't be that wild. Most of the unpredictability comes from not having those guardrails in place early on. A few weeks of real production data usually shows the average cost is more stable than you'd expect.
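As a minimal sketch of what such guardrails might look like, assuming an agent loop driven by a hypothetical `step()` callable (the names, the event shapes, and the default caps are all illustrative, not taken from any framework):

```python
class LimitExceeded(Exception):
    """Raised when a run hits one of its hard caps."""


def run_agent(step, max_tool_calls=8, max_retries=2):
    """Call `step()` until it reports completion, enforcing hard caps.

    `step` is a hypothetical callable that returns ("done", value) or
    ("tool", None), and raises RuntimeError on a transient failure.
    """
    tool_calls = 0
    retries = 0
    while True:
        try:
            kind, value = step()
        except RuntimeError:
            retries += 1
            if retries > max_retries:
                raise LimitExceeded("retry cap hit")
            continue
        if kind == "done":
            return value
        # Check the cap before the tool would run, not after.
        if tool_calls >= max_tool_calls:
            raise LimitExceeded("tool-call cap hit")
        tool_calls += 1  # a real loop would execute the tool here


def scripted(events):
    """Build a fake `step` that replays a fixed list of events."""
    it = iter(events)

    def step():
        event = next(it)
        if event == "fail":
            raise RuntimeError("transient error")
        return event

    return step


ok = run_agent(scripted(["fail", ("tool", None), ("tool", None), ("done", "answer")]))
print(ok)

try:
    run_agent(scripted([("tool", None)] * 100))  # a runaway loop
except LimitExceeded as exc:
    print("stopped:", exc)
```

With caps like these, the worst-case cost of one action is bounded by the caps times the per-call cost, which is what keeps the variance in check.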

Barathkanna•4h ago

True, but for early stage builders it’s harder to design those guardrails upfront. A lot of the time you only discover the retry patterns and cost spikes once real users start hitting the system.

Lazy_Player82•4h ago

Fair point. And honestly, with more non-technical builders shipping agent-based products these days, that's probably where a service like this makes the most sense – for people who don't yet have the experience to know what guardrails to put in place.

Barathkanna•4h ago

Exactly. That’s actually why we started building Oxlo.ai. Early stage builders usually just want to experiment without worrying too much about token cost spikes.

sriramgonella•4h ago

Local models are better for controlling costs, whereas commercial models are very expensive and give you no control over that cost. However, the local model training setup then needs to be architected very well to keep training continuously.

thiago_fm•1h ago

That isn't true. If you run local models, you'll also have to spend on operations.

Maybe focus first on providing value and later you can optimize this setup.

thiago_fm•1h ago

Just add high hard limits and instrumentation so you can track it and re-evaluate accordingly.

This takes a couple of hours at most.
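For the instrumentation half, something as small as the sketch below may be enough to start with. `CostTracker` is a hypothetical helper, and the recorded costs are invented sample data, not measurements.

```python
import statistics


class CostTracker:
    """Collects the USD cost of each user action and summarizes it."""

    def __init__(self):
        self.costs = {}  # action name -> list of USD costs

    def record(self, action, usd):
        self.costs.setdefault(action, []).append(usd)

    def summary(self, action):
        values = sorted(self.costs[action])
        # Nearest-rank p95, computed with integer math to avoid float rounding.
        rank = max(1, (95 * len(values) + 99) // 100)
        return {
            "n": len(values),
            "mean": statistics.mean(values),
            "p95": values[rank - 1],
            "max": values[-1],
        }


tracker = CostTracker()
# Invented sample: mostly cheap runs, plus two expensive outliers.
for usd in [0.02] * 18 + [0.05, 0.90]:
    tracker.record("summarize", usd)

s = tracker.summary("summarize")
print(f"n={s['n']} mean=${s['mean']:.4f} p95=${s['p95']:.2f} max=${s['max']:.2f}")
```

Looking at the tail (p95 and max) next to the mean shows whether the hard limits are set sensibly or whether a few runaway runs dominate the bill.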
