I’ve been experimenting with an idea, and I’m honestly not sure whether it’s brilliant or completely reckless.
Tools like Claude, Cursor, and Copilot can already:
read files
run terminal commands
edit code
And they’re incredibly useful for development work.
It made me wonder: what would the equivalent look like for infrastructure engineers?
I’m prototyping an “AI coworker” that can:
read logs
run shell commands
inspect system state
check Kubernetes
read/edit config files
query internal APIs
The goal isn’t a chatbot. The goal is this:
You say: “The API is failing. Find out why and fix it.”
And the agent goes through the same loop an SRE would:
observe → hypothesize → run commands → verify → fix.
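To make that concrete, here is roughly the shape of the loop I’m prototyping. This is a simplified sketch, not the real thing: call_model and approve are placeholders I made up, and the read-only allowlist is only illustrative.

    import subprocess

    # Commands the agent may run freely; anything else goes through a human gate.
    READ_ONLY_PREFIXES = ("kubectl get", "kubectl describe", "kubectl logs",
                          "docker compose ps", "cat ", "tail ", "grep ")

    def call_model(goal, observations):
        """Placeholder for the LLM call. Returns either
        {"type": "run", "command": "..."} or {"type": "done", "summary": "..."}."""
        raise NotImplementedError

    def approve(command: str) -> bool:
        """Human-in-the-loop gate for anything that might mutate state."""
        return input(f"Agent wants to run: {command!r} [y/N] ").strip().lower() == "y"

    def run_tool(command: str) -> str:
        """Run a shell command and return its output as the next observation."""
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=30)
        return result.stdout + result.stderr

    def agent_loop(goal: str, max_steps: int = 10) -> str:
        observations = []
        for _ in range(max_steps):
            action = call_model(goal, observations)            # hypothesize next step
            if action["type"] == "done":
                return action["summary"]                       # claims verified / fixed
            command = action["command"]
            if not command.startswith(READ_ONLY_PREFIXES) and not approve(command):
                observations.append((command, "SKIPPED: operator rejected"))
                continue
            observations.append((command, run_tool(command)))  # observe result, loop again
        return "stopped: step limit reached"

Even in this toy version, the interesting part is the approval gate and where you draw the read-only line, which is exactly what I’m unsure about.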
But this raises a lot of uncomfortable questions.
Cursor/Claude can technically already run commands if you let them — so why is this a bad idea? Or is it?
I’m trying to understand the boundary between:
“This would be insanely useful for debugging and ops”
and
“This is how you take down production at 3am”
Before I go too far building this, I’d really like to hear from people who run real systems:
Would you ever try something like this?
Where would this be useful vs unacceptable?
What safeguards would you absolutely require?
What tasks would you want this for?
What makes this fundamentally different from just giving Cursor terminal access?
I’m early, testing this only on a local docker-compose setup with a few services. Just trying to sanity-check the idea with people who’ve been on call.
Bender•1h ago
I would not, most legal departments would not, and all CSOs and compliance officers would not if someone explained it to them honestly. I have no doubt some will be tricked into approving such a thing and will try to back-pedal when it backfires on them.
Would you ever try something like this?
No, I would not, but I have only worked for companies with highly sensitive data: financial data, credit card data, proprietary code and data.
What safeguards would you absolutely require?
The entire AI stack would need to be written and maintained by the same company it is running in, and all of the data must be stored in that company's data centers. The interface must be behind multi-factor authentication and a corporate VPN running in the data center. It would need to be audited by internal auditors, red-team pen testers, and external third-party code and infrastructure pen testers, and it would have to go through the strictest change control. Every action by the AI must be audited in real time, and every action must be predictable and reproducible. No third-party connections whatsoever. Any attempt to connect outbound must trigger an immediate, mandatory, all-hands-on-deck response. The entire stack, client, agent, and servers, must run entirely within the data center and not on someone's laptop, regardless of how locked down that workstation or laptop is.
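To be concrete about the minimum that "audited in real time" and "reproducible" would mean: every single action the AI takes gets an append-only record of the exact command and a digest of its output, something along these lines (a sketch only; the path and field names are illustrative):

    import hashlib
    import json
    import subprocess
    import time

    # Append-only audit log; in reality this would be shipped off-host in real time.
    AUDIT_LOG = "/var/log/ai-agent/audit.jsonl"

    def audited_run(actor: str, command: str) -> str:
        """Run one agent action and record enough detail to review and replay it later."""
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=30)
        entry = {
            "ts": time.time(),
            "actor": actor,                       # which agent/session issued the command
            "command": command,
            "exit_code": result.returncode,
            "stdout_sha256": hashlib.sha256(result.stdout.encode()).hexdigest(),
        }
        with open(AUDIT_LOG, "a") as f:
            f.write(json.dumps(entry) + "\n")     # one line per action, written synchronously
        return result.stdout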
And that is even before factoring in risks such as hallucinations or the AI confidently accepting its own incorrect decisions. Blaming the AI for downtime, leaked customer data, or leaked intellectual property would not be acceptable.
Having said all that, I am certain there will be some interested parties that could get it approved. Some companies give Okta root access via an agent to all their server fleets with no local guardrails. Should they ever get hacked, that is insta-root on a lot of servers. My opinions on that matter are not suitable for public forums.