Think of backend pipelines like: step 1 → LLM → step 2 → LLM → step 3, where users depend on the output and nothing technically “crashes.”
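For concreteness, roughly the shape of pipeline meant here (a minimal sketch; the step names, the `call_llm` wrapper, and the refund constraint are placeholders, not the actual system):

```python
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whatever model API is in use."""
    raise NotImplementedError("wire this to your provider's client")


def step_1_extract(ticket: str) -> str:
    # Step 1: plain code, no model involved.
    return ticket.strip()


def step_2_classify(extracted: str) -> str:
    # First LLM call: classify the request.
    return call_llm("Classify this request into exactly one category:\n" + extracted)


def step_3_draft(extracted: str, category: str) -> str:
    # Second LLM call: must stay consistent with step 2 and honour a constraint.
    return call_llm(
        "Draft a reply for a '" + category + "' request. Do not promise refunds.\n"
        + extracted
    )


def run_pipeline(ticket: str) -> str:
    extracted = step_1_extract(ticket)
    category = step_2_classify(extracted)
    return step_3_draft(extracted, category)
```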
We’ve seen a recurring pattern:
- Same input, same prompt, same model
- Works reliably for weeks
- Then a constraint is ignored, or a later step contradicts an earlier one
- Retries don’t reliably fix it
- Logs don’t explain what changed (sketched below)
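To make “logs don’t explain what changed” concrete, this is roughly the per-step record you’d want in order to diff a good run against a bad one weeks later. A sketch only; names like `log_step` and `check_constraints` are made up, not an existing framework:

```python
import hashlib
import json
import time


def fingerprint(text: str) -> str:
    # Short content hash so identical prompts/outputs are easy to spot across runs.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def check_constraints(output: str, banned_phrases: list[str]) -> list[str]:
    # Cheap, deterministic per-step checks; anything fancier drifts into
    # eval-framework territory, which isn't the question here.
    return [p for p in banned_phrases if p.lower() in output.lower()]


def log_step(step: str, model: str, params: dict, prompt: str, output: str) -> dict:
    record = {
        "ts": time.time(),
        "step": step,
        "model": model,            # exact model/version string the API reports back
        "params": params,          # temperature, max_tokens, etc.
        "prompt_sha": fingerprint(prompt),
        "output_sha": fingerprint(output),
        "violations": check_constraints(output, ["refund"]),
    }
    print(json.dumps(record))      # stand-in for a real structured logger
    return record
```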
The hardest part isn’t the bad output itself; it’s not being able to explain failures to PMs or stakeholders when nothing obviously broke.
Curious how others operating LLM-backed workflows in production are diagnosing or containing this kind of behavior over time.
(Not looking for prompt advice or eval frameworks. Interested in operational experiences.)
chrisjj•52m ago
Try: “The known unreliability of stochastic LLM tech caused an obviously predictable failure of output the user depends on.”
Perhaps present the analogy of a random number generator feeding the calculation of a company's statutory financial accounts.