What I'd want is something that:
- monitors logs and metrics 24/7
- detects anomalies or issues and starts an autonomous investigation
- has access to your repo (code and recent commits) and any other tools needed to track down a root cause
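
To make the shape of that list concrete, here's a rough Python sketch of the detect-then-investigate loop I have in mind. Everything in it is my own assumption for illustration: the rolling z-score check, the `detect_anomalies` and `start_investigation` names, the window and threshold values, and the fake commit hashes. I have no idea how the actual products do this, and I'd expect them to be far more sophisticated.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=20, threshold=3.0):
    """Yield (index, value) for points whose z-score against the previous
    `window` points exceeds `threshold`. Purely illustrative."""
    recent = deque(maxlen=window)
    for i, v in enumerate(values):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Skip flat windows to avoid dividing by zero.
            if sigma > 0 and abs(v - mu) / sigma > threshold:
                yield i, v
        # The point joins the window whether or not it was flagged.
        recent.append(v)


def start_investigation(index, value, recent_commits):
    """Hypothetical stub: a real agent would pull logs, diff commits,
    and query other tools here. This just bundles the context."""
    return {
        "anomaly_index": index,
        "value": value,
        "suspect_commits": recent_commits[:3],
    }


if __name__ == "__main__":
    # Made-up latency samples with one obvious spike at index 30.
    latencies = [100, 101, 99, 100, 102, 98] * 5 + [500, 100, 101]
    commits = ["abc123", "def456", "789aaa", "bbb000"]  # placeholder hashes
    for idx, val in detect_anomalies(latencies):
        print(start_investigation(idx, val, commits))
```

The detection half is the easy part; the interesting question is how much of `start_investigation` you'd actually let run unattended.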
Products like Resolve AI and Datadog’s Bits AI SRE seem to be heading in this direction. I’m curious whether anyone here has tried these or similar tools, and what your experience was: good, bad, or overhyped.
For people running production-grade systems (especially at small to medium companies): what tasks would you actually want an AI SRE to handle, and what would you trust it with?
I'm not trying to pitch anything; I'm just curious about pain points from people who’ve been on-call outside of big tech.