How do you catch AI agent regressions after prompt or model changes?
2•1taimoorkhan0•1h ago
Seeing a pattern where teams fix a failure in an agent, change the prompt or model a week later, and the same failure quietly comes back. Nobody catches it until a user does.
Curious how people are handling this today. Manual test cases? Evals? Logs? Nothing?
Not trying to pitch anything. Just trying to understand how widespread this is and what current approaches look like.