Ask HN: How are you evaluating your LLMs in production?
2•ReDeiPirati•6h ago
Hello HN! Which tools do you use to evaluate your LLMs and agents in production?
Comments
znpy•6h ago
Sysadmin here ("cloud engineer" is what's in my contract).
> Which tools do you use to evaluate your LLMs and agents in production?
None for my work. I still use LLMs from time to time to generate boring Terraform code or boring SQL queries, but I'm essentially not going to let some AI bs near the infrastructure I curate.
It's all fun and games until prod is down, or the cloud bill is 10x the previous month's bill (or both).
So unless I can blame it on the AI and take no responsibility, I'm not going to let anything AI-powered near production.