https://github.com/coollabsio/llmhorrors.com/blob/main/CLAUD...
The whole website seems to be focused on promoting the author and their projects more than sharing the information. Just link to the original.
https://www.reddit.com/r/googlecloud/comments/1reqtvi/82000_...
Posted to HN twice recently.
There's also a practice that seems to occur primarily in China where stolen keys are resold via proxy services. A single key can provide access to thousands of users, racking up costs very fast (again, assuming the rate limits are high enough).
It's Google's blunder that they allowed public tokens to be used for paid functionality.
But for those stupid API keys the corporations have zero excuse not to have configurable limits with a sensible default.
> Conclusion: Always set billing caps and alerts on cloud API keys.
Sadly, way easier said than done in the case of GCP. Been a proper reason for me to avoid GCP deployments with LLM use-cases for smaller projects.
I remember looking into this a while back assuming it would be a sane feature to expect. But for some reason it's surprisingly non-trivial with GCP to set budgets. Especially if the only thing you want is a Gemini API key with finite spending.
IIRC you could either set (rate) limits on quotas, but quotas are extremely granular (like, per region per model), meaning you need to both set tons of values and understand which quotas to relax. Or, alternatively, build some bubblegum-and-duct-tape solution: an event-driven pipeline that reacts to cost increases in your own project.
I understand that exact budgets are hard to enforce in real-time, especially for their more complex infra offerings.
However, (1) even if it's not exactly real-time but instead enforced every hour, that's already going to go a long way, and (2) PAYG LLM usage is billed rather linearly by the number of tokens you use, so if there were an easy way to set a dollar amount and have it expressed as a budget, that would already get you part of the way there.
Anyway, the current state of GCP budgeting makes me avoid it for production usage until I'm ready to commit significant effort to hardening it. For small projects, the free-tier tokens are a safe bet, but their extremely low rate limits rarely make them a good fit.
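To illustrate point (2) above: because pay-as-you-go token billing is roughly linear, a dollar cap can be tracked client-side as a token cap. This is only a minimal sketch, not a GCP feature. The `TokenBudget` class and the flat per-million-token price are hypothetical, and real pricing usually differs between input and output tokens.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Client-side spending guard (hypothetical helper, not a GCP API)."""

    budget_usd: float
    usd_per_million_tokens: float  # assumed flat price, for illustration only
    tokens_used: int = 0

    @property
    def spent_usd(self) -> float:
        # Linear billing: cost grows in proportion to tokens consumed.
        return self.tokens_used / 1_000_000 * self.usd_per_million_tokens

    def charge(self, tokens: int) -> bool:
        """Record usage if it fits under the cap.

        Returns False and records nothing if the request would exceed
        the budget, so the caller can skip the API call.
        """
        projected = (self.tokens_used + tokens) / 1_000_000 * self.usd_per_million_tokens
        if projected > self.budget_usd:
            return False
        self.tokens_used += tokens
        return True


budget = TokenBudget(budget_usd=10.0, usd_per_million_tokens=2.0)
print(budget.charge(4_000_000))  # fits under the cap
print(budget.charge(2_000_000))  # would exceed the cap, so it is refused
```

A guard like this only protects against your own code overspending; it does nothing if the key itself leaks, which is why a provider-side cap would still be needed.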
user34283•1h ago
As far as I saw you can only set up billing alerts, no hard limit.
rustyhancock•1h ago
lima•1h ago
sofixa•1h ago
Iolaum•1h ago
delfinom•1h ago
johndough•1h ago
Forgeties79•1h ago
jimnotgym•59m ago
'By the way old chap, you have gone over your storage limit. Do you want to buy more or delete some stuff?'
kleene_op•47m ago
Why does my AWS counselor sound British? Am I in eu-west-2?
jimnotgym•8m ago
Someone1234•1h ago
Therefore, they've implemented hard-limits. So not offering hard-limits is a business decision, NOT a technical one. They're essentially hiding functionality they have.
Make of that what you will. Anyone justifying it should be met with skepticism.
akdev1l•52m ago
There is a free tier but that varies per service and anyway will not limit anything. It works as if it just gives you some credit to offset the costs.
Someone1234•38m ago
[0] https://www.geeksforgeeks.org/cloud-computing/aws-educate-st...
They also offered (may still offer) the same thing with AWS Academy.
PunchyHamster•51m ago
shawabawa3•1h ago
You can set up a cloud function to monitor billing limits and automatically disable billing for a project if it exceeds the limits, though.
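A rough sketch of what such a function's core logic might look like, assuming the budget alert arrives as a Pub/Sub-style event whose base64-encoded JSON payload carries `costAmount` and `budgetAmount` fields (that is my recollection of the budget notification format, so check it against the current docs). The `disable_billing` callback is hypothetical: in a real deployment it would call the Cloud Billing API to detach the billing account from the project, which as far as I know also shuts down the project's paid services.

```python
import base64
import json
from typing import Callable


def parse_budget_notification(event: dict) -> dict:
    """Decode the base64-encoded JSON payload of a Pub/Sub-style event."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))


def handle_budget_event(event: dict, disable_billing: Callable[[], None]) -> bool:
    """Disable billing once reported cost exceeds the budget.

    `disable_billing` is a hypothetical callback supplied by the caller.
    Returns True if billing was disabled, False otherwise.
    """
    payload = parse_budget_notification(event)
    cost = payload["costAmount"]      # assumed field name
    budget = payload["budgetAmount"]  # assumed field name
    if cost <= budget:
        return False
    disable_billing()
    return True


# Local dry run with a fake event and a stub callback.
fake_payload = {"costAmount": 120.0, "budgetAmount": 100.0}
fake_event = {"data": base64.b64encode(json.dumps(fake_payload).encode("utf-8"))}
handle_budget_event(fake_event, lambda: print("billing would be disabled here"))
```

Note that this only reacts after the cost data has been reported, so it is a delayed kill switch rather than a true hard cap.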
kevin42•1h ago
https://docs.cloud.google.com/billing/docs/how-to/control-us...
Google Cloud makes it easy to set up soft budget alerts via email though, something I had to use a third-party service for with AWS.
jsheard•55m ago
horsawlarway•56m ago
There are several, rather tedious and incomplete, hacks that you can apply to attempt to prevent billable actions after limits are hit.
But to be frank - they're cop-outs for a real spending cap.
You'd hope these companies would address this themselves - but it's not profitable for them to resolve (it's somewhat involved and requires them to allow people to pay them less)... So my strong vote is to make the contracts that allow this sort of "un-cappable" spending for automated actions void in court.
enginous•43m ago
This has been a major reason why I reach for OpenAI models before Gemini, but also why I'd rather use services like RunPod for training jobs. For a small bootstrapped company like mine, it feels terrifyingly easy to rack up a company-ending AI bill.
The cloud companies try to limit these accidents by cranking your quotas down to nothing, but this also means that my small company can't just whip up 8xH100 easily without major ceremony, and I have routinely been denied the GPU quotas I needed for projects.
Accidentally leaving that kind of node on for the 24 hours that it might take to get an alert would rack up a $2,000+ bill, compared to $500 on RunPod, which will also stop the instance when you run out of money.
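For a sense of scale, the arithmetic behind those figures can be sketched as below. The totals are the ones quoted in the comment; the function names and the prepaid balance are made up for illustration.

```python
from typing import Optional


def implied_hourly_rate(total_usd: float, hours: float) -> float:
    """Hourly rate implied by a total bill over a number of hours."""
    return total_usd / hours


def runaway_cost(
    hourly_rate_usd: float,
    hours_until_noticed: float,
    prepaid_balance_usd: Optional[float] = None,
) -> float:
    """Cost of a forgotten instance.

    If a prepaid balance is given, the provider is assumed to stop the
    instance once the balance runs out, so the balance caps the loss.
    """
    cost = hourly_rate_usd * hours_until_noticed
    if prepaid_balance_usd is not None:
        cost = min(cost, prepaid_balance_usd)
    return cost


# Rates implied by the comment's $2,000 vs $500 over 24 hours.
print(round(implied_hourly_rate(2000, 24), 2))
print(round(implied_hourly_rate(500, 24), 2))
# With a hypothetical $100 prepaid balance, the loss stops at the balance.
print(runaway_cost(implied_hourly_rate(500, 24), 24, prepaid_balance_usd=100))
```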
I've loved working with major cloud providers at growing VC-funded startups that have credits, TAMs and bigger budgets for errors. But hyperscalers are fairly difficult for a pre-scale bootstrapped business, and arguably not designed or optimized for it.
[0] https://docs.cloud.google.com/billing/docs/how-to/disable-bi... [1] https://support.terra.bio/hc/en-us/articles/360057589931-How...