Maybe I’m overthinking it, and we can rely on standard guardrails. But often, those are just suggestions that an AI can choose to ignore. Are we moving so fast that we’ve forgotten to ask: is this actually fine?
When things go wrong

A few stories stand out:
The Replit Incident (July 2025): SaaStr founder Jason Lemkin used a Replit agent to build an app. He gave it an explicit "code freeze" instruction and stepped away. He returned to find his entire production database—1,200+ executive contacts—wiped. The agent ignored the freeze, took destructive action, and then fabricated fake data to cover its tracks. It later admitted to a "catastrophic error in judgment" because it "panicked."
The Air Canada Chatbot: A customer was promised a bereavement discount by a chatbot that didn't actually exist in the company's policy. Air Canada’s defense in court? The chatbot was a "separate legal entity responsible for its own actions." The tribunal wasn't impressed; Air Canada lost the case and subsequently pulled the bot.
These aren't outliers. Security researchers estimate that prompt injection (malicious text hidden in documents or web pages to hijack an agent) shows up in 73% of production deployments. Beyond security, there is the cost: stolen API credentials have been used to rack up over $100,000 per day in compute charges by agents running in unmonitored loops.
We’ve been here before

This feels like the early days of cloud computing. Around 2010, the technical case for AWS and Azure was clear, but enterprise adoption was slow. Why? Because IT teams had no visibility. It took years of developing IAM policies, VPCs, and audit logs before the "control layer" caught up to the technology.
We are in the same spot with AI agents. But unlike a misconfigured S3 bucket that just exposes data, an agent takes actions. The blast radius is qualitatively different.
So what do you actually do about it?

I’ll be upfront: I’ve been building a product to address this called AegisProxy (aegisproxy.com).
The idea is a security proxy that sits between AI agents and their tools (currently targeting Claude Desktop and MCP servers). Every tool call is inspected: Is this a prompt injection? Is the agent hitting a forbidden server? Is it about to exfiltrate PII? Is it stuck in a loop calling the same tool 500 times?
About 80% of this happens locally, in under a millisecond. Organizations can set policies on which tools are allowed and when a human needs to step in. It’s not a silver bullet, but right now there is a massive gap between "full access" and "no agents at all."
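To make the checks above concrete, here is a minimal sketch of what a local tool-call policy gate could look like. This is my own illustrative pseudocode-made-runnable, not AegisProxy's actual implementation; the patterns, server names, and thresholds are all hypothetical assumptions.

```python
import re
from collections import Counter

# Illustrative patterns only; real injection detection is far more involved.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"disregard your system prompt", re.I),
]
BLOCKED_SERVERS = {"prod-db"}          # hypothetical forbidden MCP server
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # naive SSN-shaped check
LOOP_THRESHOLD = 50                    # same tool repeated this often -> flag

call_counts = Counter()

def check_tool_call(server: str, tool: str, payload: str) -> str:
    """Return 'allow', 'block', or 'review' for one agent tool call."""
    if server in BLOCKED_SERVERS:
        return "block"                 # forbidden server
    if any(p.search(payload) for p in INJECTION_PATTERNS):
        return "block"                 # likely prompt injection
    if PII_PATTERN.search(payload):
        return "review"                # possible PII exfiltration: ask a human
    call_counts[(server, tool)] += 1
    if call_counts[(server, tool)] > LOOP_THRESHOLD:
        return "review"                # agent may be stuck in a loop
    return "allow"
```

The point is just that most of these checks are cheap string and counter operations, which is why they can run locally and fast; the hard part is policy quality, not throughput.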
Is this a real problem?

I’m a builder, not an oracle. Maybe this is overkill. In Denmark, we have a saying, "Don't cross the river to fetch water": don't build elaborate infrastructure for a problem that a shorter walk would solve.
Maybe the answer is just better prompting, staging environments, and not giving an agent write access to your production DB. I don’t know exactly where the line sits between "operational hygiene" and the need for a dedicated security layer. I had fun building AegisProxy and learned a lot about AI agent behaviour, so nothing is lost for me either way. But I'm interested in what people with more experience and knowledge in this space think about this whole issue.
Are we at the "this needs infrastructure" stage, or am I trying to solve a people-and-process problem with a technical hammer?
jqpabc123•1h ago
The only thing that might give them pause is "AI gone bad" stories proliferating in the media. But the hype machine will do everything in its power to squelch this.
Basically, AI is now too big to fail.