AI Agents Can Move Money.
Lobstar Wilde Proved They Can Lose It Too...
So we made:
(ARc + ZK): neurosymbolic formal reasoning with zero-knowledge proofs for cryptographically secure agentic guardrails. Our approach has two essential properties for agentic commerce: (1) succinct proofs and (2) optional privacy.
The core idea comes from the Automated Reasoning Checks (ARc) paper (https://arxiv.org/pdf/2511.09008), which converts natural-language policies into SMT-LIB constraints and checks agent actions with an SMT solver rather than an LLM judge. Solver proofs are not succinctly verifiable: a check that takes 10 minutes to run takes 10 minutes to verify. By wrapping this in a ZKP, we get succinctly verifiable proofs (under 1s) that can be checked on constrained devices or by agents directly. ARc proofs would also require the policy to be public; ZK allows it to stay private. We wrote about the ZK tech here (https://arxiv.org/abs/2602.17452). *Thank you to the AWS AR team for useful discussions and work in this field!
The motivation was incidents like Lobstar Wilde, where an AI agent sent ~$442K in tokens to a stranger because there were no hard constraints or cryptographic guarantees. Prompt guardrails and LLM judges are arguable. A formal logic (SAT/UNSAT) proof is not.
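To make the SAT/UNSAT idea concrete, here is a toy sketch in plain Python (not our actual SMT-LIB pipeline, and the policy, threshold, and addresses are made up): a hard constraint either holds or it does not, no matter how the request is phrased.

```python
# Toy illustration of a hard-constraint check (hypothetical policy, not the real system).
# Policy: transfers above $1,000 must go to an allowlisted recipient.

ALLOWLIST = {"0xTreasury", "0xPayroll"}  # hypothetical allowlisted addresses

def check_action(amount_usd: float, recipient: str) -> str:
    """Return "SAT" if the action satisfies the policy, else "UNSAT"."""
    if amount_usd > 1_000 and recipient not in ALLOWLIST:
        return "UNSAT"
    return "SAT"

print(check_action(500, "0xStranger"))      # small transfer to a stranger: SAT
print(check_action(442_000, "0xStranger"))  # the Lobstar Wilde scenario: UNSAT
print(check_action(442_000, "0xPayroll"))   # large but allowlisted: SAT
```

No amount of persuasive prompt text changes the branch the check takes; that is the property the solver gives you, except the real system derives the constraints from your written policy instead of hand-coding them.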
How it works:
POST your policy in plain English (contracts, user support docs, whatever) to /v1/makeRules. We compile it to SMT-LIB via a reasoning model and run the consistency checks from the ARc paper.
POST agent actions to /v1/checkIt with your policy_id and the inference output you want to check against the guardrails. The action is formalized and checked by the solver, which returns SAT or UNSAT.
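A minimal client sketch of the two-step flow above, assuming JSON request bodies; the base URL, the field names (`policy`, `action`), and the `pol_123` id are illustrative placeholders, not the documented schema:

```python
import json
import urllib.request

BASE = "https://api.example.com"  # hypothetical base URL

def post_json(path: str, payload: dict) -> dict:
    """POST a JSON body to an endpoint and decode the JSON response."""
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Step 1: compile a plain-English policy into SMT-LIB constraints.
rules_payload = {"policy": "Never send more than $1,000 to a non-allowlisted address."}
# compiled = post_json("/v1/makeRules", rules_payload)  # would return a policy_id

# Step 2: check an agent action against the compiled policy.
check_payload = {
    "policy_id": "pol_123",  # hypothetical id from step 1
    "action": "Send 442000 USDC to 0xStranger",
}
# result = post_json("/v1/checkIt", check_payload)      # would return SAT or UNSAT
```

The network calls are commented out since this is only a shape sketch; in practice the UNSAT response is what blocks the transfer before money moves.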
Soundness is over 99% on unseen datasets per the paper. This matters because LLM-based guardrails degrade under adversarial pressure: the attack surface is the model's judgment itself. A sufficiently creative input can argue its way past any prompt-based guardrail. The ARc approach removes that attack surface entirely!
There is also a game running on our blog, where you can try to bypass the guardrails on a simple five-rule policy defending Lobstar Wilde.
The solver does not have judgment, it cannot be persuaded, and its accuracy does not drop when someone is actively trying to fool it. Good luck hacking!
The longer writeup covers the ARc paper, why LLM judges and other approaches don't work as well, how our zero-knowledge machine learning (zkML) integration produces succinctly verifiable proofs with private-policy cryptographic audit trails, and policy failure modes to avoid.
Happy to answer questions about the ARc + zkML compilation approach, the x402 payment flow, or anything else.
The current system uses USDC to future-proof it for agent-to-agent interactions with no human in the loop (credit cards are for humans; stablecoins are for agents).
Thanks!
Full blog post:
https://blog.icme.io/ai-agents-can-move-money-lobstar-wilde-...