Not "verify then trust." Not "trust but monitor." Just - don't trust them. Assume every user is compromised, negligent, or adversarial. Build your systems accordingly. This principle gave us least privilege, network segmentation, rate limiting, audit logs, DLP. It works.
So why are we treating AI agents like trusted colleagues?
The current alignment discourse assumes we need to make agents want to behave. Instill values. Train away deception. This is the equivalent of solving security by making users trustworthy. We tried that. It doesn't work. You can't patch human nature, and you can't RLHF your way to guaranteed safety.
Here's the thing: every principle from zero-trust security maps directly to agent orchestration.
Least privilege. An agent that writes unit tests doesn't need prod database access. Scope its capabilities with RBAC, the same way you'd scope a service account.
Isolation. Each agent runs in its own pod. It can't read another agent's memory, touch its files, or move laterally to another agent. Same reason you don't run microservices as root in a shared namespace.
Budget enforcement. Token caps and cost limits per agent, per task. An agent that tries to burn $10k on a $5 task gets killed. Like API rate limits, but for cognition.
Audit trails. Full OpenTelemetry tracing on every action, every delegation, every result. You don't need to trust an agent if you can observe everything it does.
PII redaction. Presidio scans agent output before it leaves the pod. Same principle as DLP in enterprise - don't let sensitive data leak, regardless of intent.
Policy enforcement. Declarative policies (CRDs) constrain what agents can and can't do. Like network policies, but for agent behavior.
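Three of the controls above (budget enforcement, audit trails, policy enforcement) collapse into a single choke point that every agent action must pass through. The Go below is an added illustration of that choke point, not Hortator's code; `Policy`, `Budget`, `Guard` and the tool names are hypothetical stand-ins for whatever the operator's CRDs carry, and the audit trail is kept in memory rather than emitted as OTel spans so it runs offline.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Added example, not Hortator code. A per-task guard between an agent and
// the tools it may call: default-deny capability policy, token/cost budget,
// an audit record for every decision, and cancellation of the task's context
// the moment a cap is exceeded. Types are hypothetical stand-ins.

// Policy lists the tools a role may call. Anything not listed is denied.
type Policy struct {
	Role         string
	AllowedTools map[string]bool
}

// Budget caps one task. Cost is in micro-dollars (1e-6 USD) to keep it integral.
type Budget struct {
	MaxTokens    int
	MaxCostMicro int64
}

// AuditEvent is one record of the trail (an OTel span in a real deployment).
type AuditEvent struct {
	Role, Tool, Decision string
	Tokens               int
	CostMicro            int64
}

var (
	ErrToolDenied     = errors.New("tool not permitted by policy")
	ErrBudgetExceeded = errors.New("task budget exceeded")
)

// Guard enforces Policy and Budget for a single task.
type Guard struct {
	policy Policy
	budget Budget
	cancel context.CancelFunc

	mu         sync.Mutex
	usedTokens int
	usedMicro  int64
	trail      []AuditEvent
}

// NewGuard derives the task context from parent; the guard cancels it when the
// budget is blown, so every goroutine working on the task sees ctx.Done().
func NewGuard(parent context.Context, p Policy, b Budget) (*Guard, context.Context) {
	ctx, cancel := context.WithCancel(parent)
	return &Guard{policy: p, budget: b, cancel: cancel}, ctx
}

// Authorize must succeed before an action runs. Order matters: the cheap,
// deterministic policy check comes first, and only a permitted call is charged
// against the budget. A rejected call never consumes tokens or dollars.
func (g *Guard) Authorize(ctx context.Context, tool string, tokens int, costMicro int64) error {
	g.mu.Lock()
	defer g.mu.Unlock()

	ev := AuditEvent{Role: g.policy.Role, Tool: tool, Tokens: tokens, CostMicro: costMicro}
	record := func(decision string) {
		ev.Decision = decision
		g.trail = append(g.trail, ev)
	}

	if err := ctx.Err(); err != nil { // task already killed
		record("rejected: task cancelled")
		return err
	}
	if !g.policy.AllowedTools[tool] {
		record("denied by policy")
		return ErrToolDenied
	}
	if g.usedTokens+tokens > g.budget.MaxTokens || g.usedMicro+costMicro > g.budget.MaxCostMicro {
		record("budget exceeded, task killed")
		g.cancel()
		return ErrBudgetExceeded
	}
	g.usedTokens += tokens
	g.usedMicro += costMicro
	record("allowed")
	return nil
}

// Usage reports what the task has consumed so far.
func (g *Guard) Usage() (tokens int, costMicro int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.usedTokens, g.usedMicro
}

// Trail returns a copy of the audit log.
func (g *Guard) Trail() []AuditEvent {
	g.mu.Lock()
	defer g.mu.Unlock()
	return append([]AuditEvent(nil), g.trail...)
}

func main() {
	// Constructed scenario: a test-writing agent allowed two tools, capped at 1,000 tokens / $5.
	policy := Policy{Role: "legionary-test-writer", AllowedTools: map[string]bool{"read_file": true, "run_tests": true}}
	guard, ctx := NewGuard(context.Background(), policy, Budget{MaxTokens: 1000, MaxCostMicro: 5_000_000})
	calls := []struct {
		tool   string
		tokens int
		cost   int64
	}{
		{"prod_db_query", 50, 10_000}, // not in policy: denied, nothing charged
		{"read_file", 400, 800_000},   // allowed
		{"run_tests", 400, 800_000},   // allowed
		{"run_tests", 400, 800_000},   // would reach 1,200 tokens: task killed
		{"read_file", 10, 20_000},     // task already cancelled
	}
	for _, c := range calls {
		fmt.Printf("%-14s err=%v\n", c.tool, guard.Authorize(ctx, c.tool, c.tokens, c.cost))
	}
	for _, ev := range guard.Trail() {
		fmt.Printf("audit role=%s tool=%s tokens=%d cost_micro=%d decision=%q\n",
			ev.Role, ev.Tool, ev.Tokens, ev.CostMicro, ev.Decision)
	}
}
```

The design point is that policy and budget are checked in one place, on the estimated usage of a call before it happens rather than on the bill afterwards, and that a blown budget cancels the task's context instead of returning an error the agent could ignore: every goroutine doing work for that task sees ctx.Done(), which is what "gets killed" has to mean in practice.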
We built this. It's called Hortator - a Kubernetes operator for orchestrating autonomous AI agent hierarchies. Agents (tribune → centurion → legionary) run in isolated pods with RBAC, budget caps, PII redaction, and full OTel tracing. Everything is a CRD: AgentTask, AgentRole, AgentPolicy. Written in Go, MIT licensed.
We didn't solve alignment. We made it irrelevant by treating agents as untrusted workloads - exactly how we've treated every other piece of software for the last 20 years.
GitHub: https://github.com/hortator-ai/Hortator/
Genuinely curious what this community thinks. Are we wrong to frame alignment as an infrastructure problem? What's the zero-trust model missing when applied to agents? Poke holes - that's what we need.