I'm running several projects that collectively hit $2k+/mo in API costs across OpenAI, Anthropic, and AWS Bedrock. I started doing monthly audits and found I was overspending by about 60%.
Biggest wins so far:
- Model routing cut costs 55% with no quality loss on final output
- Prompt compression saved 70% on my most-called endpoint
- Request deduplication on retries eliminated 15% of wasted calls
- Caching semantically similar queries knocked out another 20-30%
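
For anyone wondering what two of these look like in practice, here is a rough Python sketch of retry deduplication plus a very crude routing rule. It is illustrative only, not a production setup: `call_model`, the model names `small-model` and `large-model`, and the 500-character cutoff are all made-up placeholders, and real routing would likely use something smarter than prompt length.

```python
import hashlib
import json


def pick_model(prompt, cheap="small-model", strong="large-model", max_cheap_chars=500):
    """Crude routing heuristic: short prompts go to the cheaper model.

    The model names and the length cutoff are hypothetical placeholders.
    """
    return cheap if len(prompt) <= max_cheap_chars else strong


class DedupClient:
    """Wraps a model call so an identical request is only sent upstream once."""

    def __init__(self, call_model):
        # call_model is a hypothetical stand-in for the real provider call,
        # assumed to have the shape (model, prompt) -> response text.
        self._call_model = call_model
        self._responses = {}      # request fingerprint -> stored response
        self.upstream_calls = 0   # how many requests actually went out

    @staticmethod
    def _fingerprint(model, prompt):
        # Hash the full request so a retry of the same payload maps to the same key.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model, prompt):
        key = self._fingerprint(model, prompt)
        if key not in self._responses:
            # Only successful calls are stored; if call_model raises,
            # the next retry still goes upstream.
            self.upstream_calls += 1
            self._responses[key] = self._call_model(model, prompt)
        return self._responses[key]
```

The idea is that a retry fired after a client-side timeout reuses the stored response instead of paying for the same completion twice. An in-memory dict only works within one process, so anything multi-worker would need a shared store with an expiry.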
But I feel like I'm still missing things, especially on the infrastructure side (GPU instance sizing, spot vs. on-demand, etc.).
So what tools or approaches are others using? Is anyone doing this systematically, or is everyone just eyeballing their dashboards? Let me know!