What’s been the biggest cost multiplier in your prod LLM systems, and what policies worked (caps, degraded mode, fallback, hard fail)?
What’s been the biggest cost multiplier in your prod LLM systems, and what policies worked (caps, degraded mode, fallback, hard fail)?
Fanout × retries is the classic “bill exploder”, and P95 context growth is the stealth one. The point of “budget as contract” is deciding in advance what happens at limit (degraded mode / fallback / partial answer / hard fail), not discovering it from the invoice.
teilom•1h ago