Things like:
- loops that don’t terminate cleanly
- retries cascading across tool calls
- cost creeping up inside a single workflow
- agents making technically “allowed” but undesirable calls
Monitoring here is fine. We can see what’s happening. The harder part is deciding where the enforcement boundary actually lives.
Right now, most of our shutdown paths still feel manual: flipping feature flags, revoking keys, rate limiting upstream, etc.
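To make "automated" concrete, here is a rough sketch of the kind of per-workflow guard I have in mind. It is not what we run today. Every name in it (`WorkflowGuard`, `KillConditionTripped`, the limit values) is hypothetical, and it assumes the agent runtime lets you hook code in before and after each tool call and can report a cost per call.

```python
from dataclasses import dataclass, field


class KillConditionTripped(RuntimeError):
    """Raised when a workflow crosses one of its hard limits."""


@dataclass
class WorkflowGuard:
    # Illustrative limits only; real values depend on the workload.
    max_tool_calls: int = 50          # bounds loops that never terminate
    max_cost_usd: float = 2.00        # bounds cost creep in one workflow
    max_retries_per_tool: int = 3     # bounds retry cascades
    blocked_tools: frozenset = frozenset()  # "allowed" but undesirable calls

    tool_calls: int = 0
    cost_usd: float = 0.0
    retries: dict = field(default_factory=dict)

    def before_call(self, tool: str, is_retry: bool = False) -> None:
        """Call before every tool invocation; raises to stop the workflow."""
        if tool in self.blocked_tools:
            raise KillConditionTripped(f"tool {tool!r} is blocked by policy")
        if is_retry:
            self.retries[tool] = self.retries.get(tool, 0) + 1
            if self.retries[tool] > self.max_retries_per_tool:
                raise KillConditionTripped(f"too many retries for {tool!r}")
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise KillConditionTripped("tool call limit exceeded")

    def after_call(self, cost_usd: float) -> None:
        """Call after every tool invocation with that call's cost."""
        self.cost_usd += cost_usd
        if self.cost_usd > self.max_cost_usd:
            raise KillConditionTripped("cost budget exceeded")
```

The enforcement unit here is the workflow: one guard instance per run, and the orchestrator catches `KillConditionTripped` and tears the run down. Whether that is the right unit is exactly what I am unsure about.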
Curious how others are handling these problems in practice:
- What’s your enforcement unit? Tool call, workflow, container, something else?
- Do you have automated kill conditions?
- Did you build this layer internally?
- Did you have to revisit it multiple times as complexity increased?
- Does it get worse as workflows span more tools or services?
Would appreciate any concrete experiences from teams running agents in production. Really just trying to figure out how to make this kind of enforcement scale.