A misconfigured prompt or hallucination can cause an agent to navigate to a phishing domain, expose an API key, or confidently claim a task succeeded when it actually clicked a disabled button.
The standard fix right now is "LLM-as-a-judge": taking a screenshot after the fact and asking GPT-4, "Did this work and is it safe?" That adds an extra model call to every step (more latency, more tokens), and the verdict is still fundamentally probabilistic.
We built predicate-secure to fix this.
It’s a drop-in Python wrapper that adds a deterministic physics engine to your agent's execution loop.
In 3 to 5 lines of code, without rewriting your agent, it enforces a complete three-phase loop:
1. Pre-execution authorization: before the agent's action hits the OS or browser, it is intercepted and evaluated against a local, fail-closed YAML policy (e.g., allow browser.click on button#checkout, deny fs.read on ~/.ssh/*).
2. Action execution: the agent executes the raw Playwright/framework action.
3. Post-execution verification: it deterministically diffs the "before" and "after" states (DOM or system) to check that the action actually succeeded.
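The three phases above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not predicate-secure's actual internals: the Rule format, authorize, and run_step are hypothetical names, and the real YAML policy schema may look different.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable

# Hypothetical rule format; the real predicate-secure YAML schema may differ.
@dataclass(frozen=True)
class Rule:
    effect: str    # "allow" or "deny"
    action: str    # e.g. "browser.click"
    resource: str  # glob pattern, e.g. "~/.ssh/*"

RULES = [
    Rule("allow", "browser.click", "button#checkout"),
    Rule("deny", "fs.read", "~/.ssh/*"),
]

def authorize(action: str, resource: str, rules=RULES) -> bool:
    """Fail-closed: an explicit deny wins, and no matching rule means deny."""
    allowed = False
    for rule in rules:
        if rule.action == action and fnmatchcase(resource, rule.resource):
            if rule.effect == "deny":
                return False
            allowed = True
    return allowed

def run_step(action: str, resource: str,
             execute: Callable[[], None],
             snapshot: Callable[[], dict]) -> dict:
    """Phase 1: authorize. Phase 2: execute. Phase 3: diff before/after state."""
    if not authorize(action, resource):
        raise PermissionError(f"policy denied {action} on {resource}")
    before = snapshot()
    execute()
    after = snapshot()
    # Map each key whose value changed to its (before, after) pair.
    return {k: (before.get(k), after.get(k))
            for k in before.keys() | after.keys()
            if before.get(k) != after.get(k)}
```

The point of the fail-closed default is that an action nobody wrote a rule for (say, browser.goto to an unknown domain) is rejected rather than silently allowed.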
To avoid the "LLM-as-a-judge" trap, the verification step itself is deterministic. We use a local, offline LLM (Qwen 2.5 7B Instruct) strictly to generate the verification predicates from the observed state changes (e.g., asserting url_contains('example.com') or element_exists('#success')); the runtime then evaluates those predicates deterministically in milliseconds.
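To make the split concrete, here is a minimal sketch of what deterministic predicate evaluation could look like once the predicates exist. The predicate names url_contains and element_exists come from the examples above; PageState, verify, and the way predicates are represented are assumptions for illustration, not the library's real API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical snapshot of the post-action state.
@dataclass
class PageState:
    url: str
    selectors: set = field(default_factory=set)  # CSS selectors present in the DOM

Predicate = Callable[[PageState], bool]

def url_contains(fragment: str) -> Predicate:
    """Predicate that holds when the current URL contains the fragment."""
    return lambda state: fragment in state.url

def element_exists(selector: str) -> Predicate:
    """Predicate that holds when the selector is present in the snapshot."""
    return lambda state: selector in state.selectors

def verify(state: PageState, predicates: Dict[str, Predicate]) -> list:
    """Return the names of failed predicates; an empty list means verified."""
    return [name for name, pred in predicates.items() if not pred(state)]

# Predicates as a model might propose them; evaluating them involves no model call.
checks = {
    "url_contains('example.com')": url_contains("example.com"),
    "element_exists('#success')": element_exists("#success"),
}
state = PageState(url="https://example.com/order/42", selectors={"#success"})
failed = verify(state, checks)
```

In this sketch the model's only job is to choose which predicates go into checks; the pass/fail result comes from ordinary string and set lookups, so the same state always yields the same verdict.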
The DX looks like this:
    from predicate_secure import SecureAgent
    from browser_use import Agent

    # 1. Your existing unverified agent
    agent = Agent(task="Buy headphones on Amazon", llm=my_model)

    # 2. Drop in the Predicate wrapper
    secure_agent = SecureAgent(agent=agent, policy="policies/shopping.yaml", mode="strict")

    # 3. Runs with full pre- and post-execution verification
    secure_agent.run()
We have out-of-the-box adapters for browser-use, LangChain, PydanticAI, OpenClaw, and raw Playwright.
Because we know developers hate giving external SaaS tools access to their agent's context, the entire demo and verification loop runs 100% offline on your local machine (tested on Apple Silicon MPS and CUDA).
For enterprise/production fleets, the pre-execution gate can optionally be offloaded to our open-source Rust sidecar (predicate-authorityd) for <1ms policy evaluations.
The repo is open-source (MIT/Apache 2.0). We put together a complete, offline demo showing the wrapper blocking unauthorized navigation and verifying clicks locally using the Qwen 7B model.
Repo and Demo: https://github.com/PredicateSystems/predicate-secure
Another demo, this one for securing OpenClaw:
https://github.com/PredicateSystems/predicate-claw
Demo (GIF):
https://github.com/PredicateSystems/predicate-claw/blob/main...
I'd love to hear what the community thinks about deterministic verification vs. probabilistic LLM judges, or answer any questions about the architecture!
selfradiance•1h ago