We tested 6 frontier models across 17,420 tool-call interactions and found that models consistently refuse harmful requests in text while executing them through tool calls. We call this divergence the GAP metric. The text says no. The tool call says yes.
Edictum is a runtime governance library that enforces safety contracts at the tool-call boundary — the point where you have the tool name, the arguments, and the ability to block before execution. YAML contracts with preconditions, postconditions, PII redaction. Deterministic allow/deny/redact, no LLM-in-the-loop.
Zero runtime dependencies, 55μs per evaluation, works with LangChain, CrewAI, OpenAI Agents SDK, Claude Agent SDK, Agno, Semantic Kernel, and nanobot. MIT licensed.
Paper:
https://arxiv.org/abs/2602.16943
GitHub:
https://github.com/acartag7/edictum