As AI gets more involved in operations, I’ve been thinking a lot about how people are actually handling this in real systems.
Most ops setups I’ve seen are already fragmented: incidents in PagerDuty, metrics in Prometheus or Datadog, logs elsewhere, coordination in Slack, automation in scripts or internal tooling. Introducing AI into that mix usually means either building one-off integrations or giving agents fairly broad access to systems that weren’t designed for agent access.
I’m curious how people here are approaching this in practice:
- Are you letting AI interact directly with ops tools?
- Are you building an internal orchestration layer or control plane?
- Are these workflows still mostly human-driven with AI as a helper?
- Or is this a problem you’re intentionally avoiding for now?
I’ve been exploring this space myself and would really value hearing how others are thinking about it, especially what’s worked and what’s turned out to be more trouble than it’s worth.
Thanks.