
Show HN: Castor – a secure execution layer for LLM agents

1•claytonia•1h ago

Hi HN, I'm one of the authors of Castor.

Today's agent frameworks have done serious work on the cognitive layer: tool selection, planning, multi-agent coordination. What they don't provide is an execution layer, the machinery that controls how tool calls run, not just which ones get made.

Two gaps kept biting us:

There's no way to bound what an agent can do. It can call any tool, execute any number of operations, with nothing structurally preventing it. Give it access to delete_file and it can wipe your filesystem before you notice.

There's no process model. When an agent goes off the rails, you can't stop it: no pause, no resume. If something fails on step 39 of 40, you restart from step 1.

Castor routes every tool call through a kernel as a syscall. The agent has no other execution path, so capability limits and approval gates are structural, not advisory.

Within budget, everything auto-executes, even deletes. No popups. When the budget runs out, the kernel stops the agent and a human decides. Budget replaces per-call approval.
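
To make the budget idea concrete, here is a minimal sketch in Python. It is not Castor's actual API: the names `Kernel`, `syscall`, `register`, `BudgetExhausted`, and the integer cost units are all made up for illustration. The only point is that the agent reaches tools through one entry point, and that entry point refuses to run a call the remaining budget can't cover.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


class BudgetExhausted(Exception):
    """Hypothetical signal: the call was not executed and a human should decide."""


@dataclass
class Kernel:
    # Remaining budget in arbitrary cost units (an assumption for this sketch).
    budget: int
    tools: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    costs: Dict[str, int] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., Any], cost: int) -> None:
        """Expose a tool to the agent at a fixed cost per call."""
        self.tools[name] = fn
        self.costs[name] = cost

    def syscall(self, name: str, *args: Any, **kwargs: Any) -> Any:
        """The only path from agent code to a tool."""
        cost = self.costs[name]
        if cost > self.budget:
            # Stop before executing; nothing runs once the budget is spent.
            raise BudgetExhausted(f"{name} costs {cost}, {self.budget} left")
        self.budget -= cost
        return self.tools[name](*args, **kwargs)


# Usage: two deletes fit in the budget and run without prompts; the third is stopped.
kernel = Kernel(budget=10)
deleted = []
kernel.register("delete_file", deleted.append, cost=5)  # stand-in for a real delete

kernel.syscall("delete_file", "a.tmp")
kernel.syscall("delete_file", "b.tmp")
try:
    kernel.syscall("delete_file", "c.tmp")
    stopped = False
except BudgetExhausted:
    stopped = True
```

In this sketch the check happens before the tool runs, so an over-budget call has no side effects. How costs are assigned per tool is left open here.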

Every syscall result is logged in an immutable journal. Suspend = unwind the stack. Resume = replay from the top with cached responses, live execution only from the suspension point. So you don't burn another $2.00 on tokens just to see if your fix worked. Capability limits, HITL, crash recovery, and deterministic debugging all fall out of the same mechanism.
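
The replay mechanism can be sketched the same way. Again, this is an illustration under assumptions, not Castor's implementation: `ReplayKernel`, `ReplayDiverged`, and the in-memory list standing in for the journal are hypothetical. On resume, the agent function runs again from the top; syscalls already in the journal return their recorded results, and only calls past the end of the journal execute live.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Tuple[str, tuple, Any]  # (tool name, args, recorded result)


class ReplayDiverged(Exception):
    """Hypothetical error: the agent asked for something the journal did not record."""


class ReplayKernel:
    def __init__(self, tools: Dict[str, Callable[..., Any]],
                 journal: Optional[List[Record]] = None) -> None:
        self.tools = tools
        self.journal: List[Record] = list(journal or [])  # append-only in this sketch
        self.cursor = 0       # position in the journal during replay
        self.live_calls = 0   # how many tools actually executed this run

    def syscall(self, name: str, *args: Any) -> Any:
        if self.cursor < len(self.journal):
            rec_name, rec_args, result = self.journal[self.cursor]
            if (rec_name, rec_args) != (name, args):
                raise ReplayDiverged(f"expected {rec_name}{rec_args}, got {name}{args}")
            self.cursor += 1
            return result  # cached response, no live execution
        result = self.tools[name](*args)  # past the journal: execute for real
        self.journal.append((name, args, result))
        self.cursor += 1
        self.live_calls += 1
        return result


def agent(kernel: ReplayKernel) -> str:
    a = kernel.syscall("fetch", "a")
    b = kernel.syscall("fetch", "b")
    return kernel.syscall("combine", a, b)


fetched = []

def fetch(key: str) -> str:
    fetched.append(key)
    return key.upper()

def broken_combine(x: str, y: str) -> str:
    raise ValueError("step failed")

def fixed_combine(x: str, y: str) -> str:
    return x + y


# First run fails on the last step; the journal keeps the two completed syscalls.
first = ReplayKernel({"fetch": fetch, "combine": broken_combine})
try:
    agent(first)
except ValueError:
    pass

# Resume with the fix: the fetches come from the journal, only combine runs live.
second = ReplayKernel({"fetch": fetch, "combine": fixed_combine}, journal=first.journal)
result = agent(second)
```

The divergence check is where the hard constraint shows up in this sketch: if the agent's arguments depend on anything that did not go through `syscall`, a replayed call can stop matching its journal entry.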

The tradeoff is real: all non-determinism has to go through the kernel. If the agent sneaks in a raw API call outside the boundary, the replay diverges. It's a hard constraint.

When we stepped back, we realized we'd reinvented a 50-year-old idea. This is exactly the separation an OS draws between user space and kernel space. Castor is, in that sense, a microkernel for agents: a minimal privileged core that enforces resource limits and mediates every interaction between agent code and the outside world.

One thing we're still not sure about: is routing ALL non-determinism through a kernel boundary too heavy-handed? We considered a lighter model where only destructive tools go through the check, but then you lose deterministic replay. Has anyone found a middle ground, or have other ideas?

Code: https://github.com/substratum-labs/castor

Docs: http://substratumlabs.ai/castor-docs/
