I’ve been using OpenClaw daily since it dropped in November. I love the agency it provides, but as I started giving it more production API keys and access to my local filesystem, I realized the threat model was essentially "hope-based."
We ran an experiment to see how resilient a standard OpenClaw setup was to prompt injection. Within two minutes, we were able to exfiltrate active session tokens and API credentials through the chat interface.
The problem is fundamental: in most agent architectures, the LLM logic and the sensitive credentials live in the same process space. If the agent is tricked, the attacker has everything.
We built and open-sourced a project called ClawShell to move the security boundary from the "prompt" to the "system runtime."
How it works: ClawShell acts as a privileged protection layer. It isolates sensitive operations into a separate process enforced by the OS. The secrets never enter the agent’s memory or process space. When the agent needs to perform an action, it sends a request to the ClawShell wrapper, which validates the intent and executes the call using the protected keys.
If the agent is hijacked via prompt injection, the attacker gets a scoped identifier that contains zero credentials and no lateral access to the sensitive environment.
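To make that flow concrete, here is a minimal Python sketch of the broker pattern described above. It is not ClawShell's actual code or API: the names `BROKER_SOURCE`, `start_broker`, `request`, the `weather_api` handle, and the `WEATHER_API_KEY` variable are all made up for illustration. The broker runs as a separate OS process, holds the key, checks each request against an allowlist, and replies with a result that never contains the credential. For brevity the demo launches the broker itself; in a real deployment a supervisor would start it under a different user so the agent process never sees the key at all.

```python
import json
import os
import subprocess
import sys

# Hypothetical broker program. It runs in its own OS process and is the only
# place the credential is read. The agent only ever sees JSON replies.
BROKER_SOURCE = r'''
import json, os, sys

SECRETS = {"weather_api": os.environ["WEATHER_API_KEY"]}  # never sent back
ALLOWED = {"weather_api": {"get_forecast"}}               # per-handle allowlist

for line in sys.stdin:
    req = json.loads(line)
    handle, action = req.get("handle"), req.get("action")
    valid = isinstance(handle, str) and isinstance(action, str)
    if not valid or action not in ALLOWED.get(handle, set()):
        reply = {"ok": False, "error": "denied"}
    else:
        key = SECRETS[handle]  # a real broker would call the upstream API with it
        reply = {"ok": True, "result": action + " executed"}
    print(json.dumps(reply), flush=True)
'''


def start_broker(api_key: str) -> subprocess.Popen:
    """Launch the broker process with the key in its environment only."""
    return subprocess.Popen(
        [sys.executable, "-c", BROKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        env={**os.environ, "WEATHER_API_KEY": api_key},
    )


def request(broker: subprocess.Popen, handle: str, action: str) -> dict:
    """Agent side: send a scoped handle plus an action, read one JSON reply."""
    broker.stdin.write(json.dumps({"handle": handle, "action": action}) + "\n")
    broker.stdin.flush()
    return json.loads(broker.stdout.readline())


if __name__ == "__main__":
    broker = start_broker("sk-demo-not-a-real-key")
    try:
        print(request(broker, "weather_api", "get_forecast"))
        print(request(broker, "weather_api", "dump_secrets"))
    finally:
        broker.stdin.close()
        broker.wait()
```

The point of the design is that a hijacked agent can only send `handle` and `action` strings over the pipe. Anything outside the allowlist is refused, and no reply path carries the key back.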
Key Technical Details:
* Structural Boundaries: We assume the LLM is untrusted. Isolation is handled at the OS level, not via "system prompts."
* Zero-Trust Tooling: The agent triggers the tool, but the tool execution is handled by a separate, restricted process.
* Compatibility: It’s designed to be a drop-in wrapper for existing OpenClaw instances.
We’re launching v0.1 today. I’m curious to hear how others are thinking about the "Lethal Trifecta" (Data + Action + Communication) in the agent space. Is anyone else looking at sandboxing for this, or is OS-level isolation the right path?
theopsguy•1h ago