Hi! I've been making increasing use of OpenClaw agents in my life (on dedicated Mac Minis). I'm impressed by their power and flexibility, a lot of which comes from their capacity to self-modify their memories, identity, and config. But that power comes with risks. Anyone who can talk to any of my agents has a vector for privilege escalation, for example by persuading the agent to update openclaw.json to add untrustworthy channels, or to rewrite AGENTS.md. This makes me uncomfortable. Even when sandboxed, digital assistants have access to sensitive information and context.
Sure, you can put "Don't take candy from strangers" in the AGENTS.md, but we really need ways to set security boundaries that are enforced by something outside of the agent itself. SoulGuard sets such boundaries, starting with key files like SOUL.md and openclaw.json. It sets OS-level filesystem protections to ensure that protected files are read-only, with a staging process to propose changes. This means that your agent can propose changes to openclaw.json, but it physically cannot edit the file unless you approve it.
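To make the protect/stage/approve idea concrete, here is a rough Python sketch. It is an illustration only, not SoulGuard's actual implementation: the function names (`protect`, `stage`, `pending_diff`, `approve`) and the staging directory layout are made up for this example. It also only clears write bits, which the file's owner could simply restore; a real boundary would need the protected file to be owned by a different user (e.g. root), so that the approve step can only run with elevated privileges.

```python
import difflib
import os
import shutil
import stat
from pathlib import Path

READ_ONLY = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH  # 0o444


def protect(path: Path) -> None:
    """Clear every write bit on the file.

    Assumption: a real setup would also change ownership to a privileged
    user, so the agent's own account could not chmod the file back.
    """
    os.chmod(path, READ_ONLY)


def stage(protected: Path, staging_dir: Path) -> Path:
    """Copy the protected file into a writable staging area (hypothetical layout)."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = staging_dir / protected.name
    # copyfile copies contents only, so the staged copy stays writable.
    shutil.copyfile(protected, staged)
    return staged


def pending_diff(protected: Path, staged: Path) -> str:
    """Return a unified diff of the proposed change for a human to review."""
    old = protected.read_text().splitlines(keepends=True)
    new = staged.read_text().splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old, new, fromfile=str(protected), tofile=str(staged))
    )


def approve(protected: Path, staged: Path) -> None:
    """Apply the staged change, then lock the file again.

    Assumption: in a real deployment this is the privileged step that only
    the human can trigger; here it is an ordinary function call.
    """
    os.chmod(protected, READ_ONLY | stat.S_IWUSR)  # temporarily writable
    shutil.copyfile(staged, protected)
    protect(protected)
    staged.unlink()
```

The point of the split is that the agent only ever touches the staged copy. Whatever it writes there has no effect until a human has looked at the diff and run the approve step.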
The right to approve changes is gated by the human user invoking sudo. (If your agent can sudo then it really has the keys to the kingdom; don't do that.) SoulGuard also has a daemon that can connect to Discord, so that you can review and approve changes from within Discord, rather than needing to ssh in for sudo access. I've also added an openclaw plugin, which is unnecessary for the security guarantees but helps the agents learn how to use SoulGuard. (This could use a bit more work: right now agents may still need some prompting to use `soulguard stage` in order to propose changes to protected files.)
I'm dogfooding SoulGuard on my own OpenClaw agents. I'd love to hear if others find it useful. Please do try to break the security model and see if you can find any flaws. I've tried to harden SoulGuard against totally compromised agents, but it's new software and I may have overlooked some attacks.
teamdandelion•1h ago
Here's the GitHub (MIT licensed) https://github.com/mirascope/soulguard
And here's the project site :) https://soulguard.ai/