I’ve been experimenting with an idea, and I’m honestly not sure whether it’s brilliant or completely reckless.
Tools like Claude, Cursor, and Copilot can already:
read files
run terminal commands
edit code
And they’re incredibly useful for development work.
It made me wonder: what would the equivalent look like for infrastructure engineers?
I’m prototyping an “AI coworker” that can:
read logs
run shell commands
inspect system state
check Kubernetes
read/edit config files
query internal APIs
The goal isn’t a chatbot. The goal is this:
You say: “The API is failing. Find out why and fix it.”
And the agent goes through the same loop an SRE would:
observe → hypothesize → run commands → verify → fix.
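To make that concrete, here is roughly the shape of the loop I’m prototyping. This is a simplified sketch, not the real thing: call_model and approve are placeholders I made up, and the read-only allowlist is only illustrative.

    import subprocess

    # Commands the agent may run freely; anything else goes through a human gate.
    READ_ONLY_PREFIXES = ("kubectl get", "kubectl describe", "kubectl logs",
                          "docker compose ps", "cat ", "tail ", "grep ")

    def call_model(goal, observations):
        """Placeholder for the LLM call. Returns either
        {"type": "run", "command": "..."} or {"type": "done", "summary": "..."}."""
        raise NotImplementedError

    def approve(command: str) -> bool:
        """Human-in-the-loop gate for anything that might mutate state."""
        return input(f"Agent wants to run: {command!r} [y/N] ").strip().lower() == "y"

    def run_tool(command: str) -> str:
        """Run a shell command and return its output as the next observation."""
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=30)
        return result.stdout + result.stderr

    def agent_loop(goal: str, max_steps: int = 10) -> str:
        observations = []
        for _ in range(max_steps):
            action = call_model(goal, observations)            # hypothesize next step
            if action["type"] == "done":
                return action["summary"]                       # claims verified / fixed
            command = action["command"]
            if not command.startswith(READ_ONLY_PREFIXES) and not approve(command):
                observations.append((command, "SKIPPED: operator rejected"))
                continue
            observations.append((command, run_tool(command)))  # observe result, loop again
        return "stopped: step limit reached"

Even in this toy version, the interesting part is the approval gate and where you draw the read-only line, which is exactly what I’m unsure about.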
But this raises a lot of uncomfortable questions.
Cursor/Claude can technically already run commands if you let them — so why is this a bad idea? Or is it?
I’m trying to understand the boundary between:
“This would be insanely useful for debugging and ops”
and
“This is how you take down production at 3am”
Before I go too far building this, I’d really like to hear from people who run real systems:
Would you ever try something like this?
Where would this be useful vs unacceptable?
What safeguards would you absolutely require?
What tasks would you want this for?
What makes this fundamentally different from just giving Cursor terminal access?
I’m early, testing this only on a local docker-compose setup with a few services. Just trying to sanity-check the idea with people who’ve been on call.
Bender•1h ago
I would not, most legal departments would not, and all CSOs and compliance officers would not if someone explained it to them honestly. I have no doubt some will be tricked into approving such a thing and will try to back-pedal when it backfires on them.
Would you ever try something like this?
No, I would not, but I have only worked for companies with highly sensitive data: financial data, credit card data, proprietary code and data.
What safeguards would you absolutely require?
The entire AI stack would need to be written and maintained by the same company it is running in, and all of the data must be stored in that company's data centers. The interface must be behind multi-factor authentication and a corporate VPN running in the data center. It would need to be audited by internal auditors, red-team pen testers, and external third-party code and infrastructure pen testers, and it would have to go through the strictest change control. Every action by the AI must be audited in real time, and every action must be predictable and reproducible. No third-party connections whatsoever. Any attempt to connect outbound must trigger an immediate, mandatory, all-hands-on-deck response. The entire stack, client, agent, and servers, must run entirely within the data center and not on someone's laptop, regardless of how locked down that workstation or laptop is.
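To be concrete about the minimum that "audited in real time" and "reproducible" would mean: every single action the AI takes gets an append-only record of the exact command and a digest of its output, something along these lines (a sketch only; the path and field names are illustrative):

    import hashlib
    import json
    import subprocess
    import time

    # Append-only audit log; in reality this would be shipped off-host in real time.
    AUDIT_LOG = "/var/log/ai-agent/audit.jsonl"

    def audited_run(actor: str, command: str) -> str:
        """Run one agent action and record enough detail to review and replay it later."""
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=30)
        entry = {
            "ts": time.time(),
            "actor": actor,                       # which agent/session issued the command
            "command": command,
            "exit_code": result.returncode,
            "stdout_sha256": hashlib.sha256(result.stdout.encode()).hexdigest(),
        }
        with open(AUDIT_LOG, "a") as f:
            f.write(json.dumps(entry) + "\n")     # one line per action, written synchronously
        return result.stdout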
And that is even before factoring in risks such as hallucinations or the AI confidently accepting its own incorrect decisions. Blaming the AI for downtime, leaked customer data, or leaked intellectual property would not be acceptable.
Having said all that, I am certain there will be some interested parties that could get it approved. Some companies give Okta root access via an agent to all their server fleets with no local guardrails. Should they ever get hacked, that is insta-root on a lot of servers. My opinions on that matter are not suitable for public forums.