1. Copilots (Claude, ChatGPT, etc.)
Engineers use these as flexible debugging partners: they paste in stack traces, logs, or config snippets and get back explanations, root-cause analysis, or candidate fixes. Copilots are reactive and on-demand, acting only when prompted, but they are effective for solving problems and learning in the moment.
2. Always-on AI-SRE tools (NudgeBee, PagerDuty AIOps, BigPanda, Datadog, Dynatrace, etc.)
These integrate directly with monitoring and alerting systems. Instead of waiting for prompts, they run continuously to reduce alert noise, correlate related events, and automate remediation. Some also extend into in-cluster troubleshooting, cloud cost optimization, and CloudOps workflows.
Teams now use both models side by side: copilots for ad-hoc debugging and analysis, and always-on platforms for continuous coverage and operational resilience. Together, they mark a shift in how AI is embedded in reliability engineering, from a tool engineers consult on demand to a layer that runs continuously alongside their systems.