I've written up our approach to securing AI tool calling through cryptographic enforcement rather than prompt engineering.
The core problem: we're mixing control and data planes without security boundaries. When an LLM with tool access processes untrusted input, you get intent hijacking, tool chaining, and context poisoning. The LLM fundamentally cannot be the boundary - it's trained to follow instructions, not evaluate them.
The insight: treat every AI entity like an untrusted network service. We give LLMs, tools, and agents cryptographic identities and enforce policies at tool boundaries (where the actual damage happens). This creates an "Authenticated Workflows" pattern - like mTLS but for AI interactions.
Intent is signed before the LLM sees it. Tools verify signatures independently. Policies are cryptographically bound to invocations. Even if the LLM is completely confused by prompt injection, it can't forge these signatures.
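Rough sketch of that flow in Python using Ed25519 keys from the cryptography package - the entity names, intent format, and policy fields here are illustrative stand-ins, not our actual schema or API:

    # Sketch only: sign the user's intent before the LLM runs, verify at the tool boundary.
    # Intent format and policy fields are made up for illustration.
    import json
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Each entity holds its own keypair (its identity), analogous to an mTLS cert.
    orchestrator_key = Ed25519PrivateKey.generate()
    ORCHESTRATOR_PUB = orchestrator_key.public_key()   # pinned by every tool

    def sign_intent(user_request: str, allowed_tools: list[str], policy: dict) -> dict:
        """Runs *before* the LLM sees any untrusted input."""
        intent = {"request": user_request, "allowed_tools": allowed_tools, "policy": policy}
        payload = json.dumps(intent, sort_keys=True).encode()
        return {"intent": intent, "sig": orchestrator_key.sign(payload)}

    def verify_at_tool_boundary(tool_name: str, signed: dict) -> None:
        """Runs inside the tool, independent of whatever the LLM decided to do."""
        payload = json.dumps(signed["intent"], sort_keys=True).encode()
        ORCHESTRATOR_PUB.verify(signed["sig"], payload)  # raises InvalidSignature if forged
        if tool_name not in signed["intent"]["allowed_tools"]:
            raise PermissionError(f"{tool_name} is outside the signed intent")

    # An injected LLM can only *ask*; it cannot mint a valid signature, so calls
    # outside the signed intent fail closed at the tool boundary.
    signed = sign_intent("summarize my inbox", ["read_email"], {"max_calls": 3})
    verify_at_tool_boundary("read_email", signed)      # ok
    try:
        verify_at_tool_boundary("send_email", signed)  # blocked
    except PermissionError as e:
        print(e)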
We've validated this with SecureOpenAI and SecureMCP implementations that block injections that would otherwise succeed. The challenge was making it transparent to developers - they just call tool(params) while security happens underneath.
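A hypothetical sketch of those ergonomics (the decorator, context variable, and stubbed verifier below are stand-ins, not the real SecureOpenAI/SecureMCP surface):

    # Sketch of the "just call tool(params)" ergonomics -- hypothetical names throughout.
    import contextvars
    import functools

    # The framework stashes the signed intent here before dispatching tool calls.
    _current_signed_intent = contextvars.ContextVar("signed_intent")

    def verify_at_tool_boundary(tool_name: str, intent: dict) -> None:
        # Stand-in for the signature + policy check from the previous sketch.
        if tool_name not in intent.get("allowed_tools", []):
            raise PermissionError(f"{tool_name} is outside the signed intent")

    def secure_tool(fn):
        """Wrap a plain tool function; verification is injected underneath, per call."""
        @functools.wraps(fn)
        def wrapper(params):
            verify_at_tool_boundary(fn.__name__, _current_signed_intent.get())
            return fn(params)          # the tool body never touches the crypto
        return wrapper

    @secure_tool
    def read_email(params):
        return f"fetching {params['count']} messages"

    # Framework sets the context once per request; developers just call tool(params).
    _current_signed_intent.set({"allowed_tools": ["read_email"]})
    print(read_email({"count": 5}))    # allowed; a call to an unlisted tool would fail closed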
Blog post with technical details: https://www.macawsecurity.com/blog/zero-trust-tool-calling-f...
Would love feedback from anyone building production AI systems. Are others seeing these attack vectors in the wild? How are you approaching defense?