Ask HN: How are you handling non-probabilistic security for LLM agents?

2•amjadfatmi1•2w ago

I've been experimenting with autonomous agents that have shell and database access. The standard approach seems to be "put safety guardrails in the system prompt", but that feels like a house of cards honestly. If a model is stochastic, its adherence to security instructions is also stochastic.

I'm looking into building a hard "Action Authorization Boundary" (AAB) that sits outside the agent's context window entirely. The idea is to intecept the tool-call, normalize it into intent against a deterministic YAML policy before execution.

A few questions for those building in this space:

Canonicalization: How do you handle the messiness of LLM tool outputs? If the representation isn't perfectly canonical, the policy bypasses seem trivial.

Stateful Intent: How do you handle sequences that are individually safe but collectively risky? For example, an agent reading a sensitive DB (safe) and then making a POST request to an external API (dangerous exfiltration).

Latency: Does moving the "gate" outside the model-loop add too much overhead for real-time agentic workflows?

I’ve been working on a CAR (Canonical Action Representation) spec to solve this, but I’m curious if I'm overthinking it or if there’s an existing firewall for agents standard I'm missing.

Comments

yaront111•2w ago

i just built Cordum.io .. should give u 100% deterministic security open sourced and free :)

amjadfatmi1•2w ago

Hey @yaront111, Cordum looks like a solid piece of infrastructure especially the Safety Kernel and the NATS based dispatch.

My focus with Faramesh.dev is slightly upstream from the scheduler. I’m obsessed with the Canonicalization problem. Most schedulers take a JSON payload and check a policy, but LLMs often produce semantic tool calls that are messy or obfuscated.

I’m building CAR (Canonical Action Representation) to ensure that no matter how the LLM phrases the intent, the hash is identical. Are you guys handling the normalization of LLM outputs inside the Safety Kernel, or do you expect the agent to send perfectly formatted JSON every time?

yaront111•2w ago

That’s a sharp observation. You’re partially right CAP (our protocol) handles the structural canonicalization. We use strict Protobuf/Schematic definitions, so if an agent sends a messy JSON that doesn't fit the schema, it’s rejected at the gateway. We don't deal with 'raw text' tool calls in the backend. But you are touching on the semantic aliasing problem (e.g. rm -rf vs rm -r -f), which is a layer deeper. Right now, we rely on the specific Worker to normalize those arguments before they hit the policy check, but having a universal 'Canonical Action Representation' upstream would be cleaner. If you can turn 'messy intent' into a 'deterministic hash' before it hits the Cordum Scheduler, that would be a killer combo. Do you have a repo/docs for CAR yet?

amjadfatmi1•2w ago

Spot on, Yaron. Schematic validation (Protobuf) catches structural errors, but semantic aliasing (the 'rm -rf' vs 'rm -r -f' problem) is exactly why I developed the CAR (Canonical Action Representation) spec.

I actually published a 40-page paper (DOI: 10.5281/zenodo.18296731) that defines this exact 'Action Authorization Boundary.' It treats the LLM as an untrusted actor and enforces determinism at the execution gate.

Faramesh Core is the reference implementation of that paper. I’d love for you to check out the 'Execution Gate Flow' section. it would be a massive win to see a Faramesh-Cordum bridge that brings this level of semantic security to your orchestrator.

Code: https://github.com/faramesh/faramesh-core

kxbnb•1w ago

Your framing of the problem resonates - treating the LLM as untrusted is the right starting point. The CAR spec sounds similar to what we're building at keypost.ai.

On canonicalization: we found that intercepting at the tool/API boundary (rather than parsing free-form output) sidesteps most aliasing issues. The MCP protocol helps here - structured tool calls are easier to normalize than arbitrary text.

On stateful intent: this is harder. We're experimenting with session-scoped budgets (max N reads before requiring elevated approval) rather than trying to detect "bad sequences" semantically. Explicit resource limits beat heuristics.

On latency: sub-10ms is achievable for policy checks if you keep rules declarative and avoid LLM-in-the-loop validation. YAML policies with pattern matching scale well.

Curious about your CAR spec - are you treating it as a normalization layer before policy evaluation, or as the policy language itself?

niyikiza•1w ago

Working on this problem: https://github.com/tenuo-ai/tenuo

Different angle than policy-as-YAML. We use cryptographic capability tokens (warrants) that travel with the request. The human signs a scoped, time-bound authorization. The tool validates the warrant at execution, not a central policy engine.

On your questions:

Canonicalization: The warrant specifies allowed capabilities and constraints (e.g., path: /data/reports/*). The tool checks if the action fits the constraint. No need to normalize LLM output into a canonical representation.

Stateful intent: Warrants attenuate. Authority only shrinks through delegation. You can't escalate from "read DB" to "POST external" unless the original warrant allowed both. A sub-agent can only receive a subset of what its parent had, cryptographically enforced.

Latency: Stateless verification, ~27μs. No control plane calls. The warrant is self-contained: scope, constraints, expiry, holder binding, signature chain. Verification is local.

The deeper issue with policy engines: they check rules against actions, but they can't verify derivation. When Agent B acts, did its authority actually come from Agent A? Was it attenuated correctly?

Wrote about why capabilities are the only model that survives dynamic delegation: https://niyikiza.com/posts/capability-delegation/

LineageOS 23.2

Crypto Deposit Frauds

Substack makes money from hosting Nazi newsletters

Framing an LLM as a safety researcher changes its language, not its judgement

Are there anyone interested about a creator economy startup

Show HN: Skill Lab – CLI tool for testing and quality scoring agent skills

2003: What is Google's Ultimate Goal? [video]

Roger Ebert Reviews "The Shawshank Redemption"

Busy Months in KDE Linux

Zram as Swap

Green’s Dictionary of Slang - Five hundred years of the vulgar tongue

Nvidia CEO Says AI Capital Spending Is Appropriate, Sustainable

Show HN: StyloShare – privacy-first anonymous file sharing with zero sign-up

Part 1 the Persistent Vault Issue: Your Encryption Strategy Has a Shelf Life

Show HN: Teleop_xr – Modular WebXR solution for bimanual robot teleoperation

The Highest Exam: How the Gaokao Shapes China

Open-source framework for tracking prediction accuracy

India's Sarvan AI LLM launches Indic-language focused models

Show HN: CryptoClaw – open-source AI agent with built-in wallet and DeFi skills

ShowHN: Make OpenClaw respond in Scarlett Johansson’s AI Voice from the Film Her

CReact Version 0.3.0 Released

Show HN: CReact – AI Powered AWS Website Generator

The rocky 1960s origins of online dating (2025)

Show HN: Agent-fetch – Sandboxed HTTP client with SSRF protection for AI agents

Why there is no official statement from Substack about the data leak

Effects of Zepbound on Stool Quality

Show HN: Seedance 2.0 – The Most Powerful AI Video Generator

Ask HN: Do we need "metadata in source code" syntax that LLMs will never delete?

Pentagon cutting ties w/ "woke" Harvard, ending military training & fellowships

Can Quantum-Mechanical Description of Physical Reality Be Considered Complete? [pdf]