
Show HN: Prompt-injection firewall for OpenClaw agents

https://github.com/ContextFort-AI/clawdbot-runtime-controls

3•ashwinr2002•1h ago

People seem to be blindly hooking up their OpenClaw agents to their personal data. So I built runtime controls to prevent, at the very least, simple prompt injection attacks.

Once installed, it hooks into the Node.js child_process module in the gateway process and listens to tool calls and their response streams. A separate fetch hook monitors user prompts (both could have gone through fetch; happy to discuss why this whole layer couldn’t just be a proxy).
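To make the hooking idea concrete, here is a rough sketch of wrapping a spawn-like function so that every call and every stdout chunk is reported to an observer. This is not the code from the repo: `wrapSpawn`, `Observer`, and the minimal `SpawnLike` type are hypothetical names made up for illustration, and the types only model the small part of a child process that the sketch touches.

```typescript
// Minimal model of the part of a child process this sketch needs (assumption).
interface StreamLike {
  on(event: string, listener: (chunk: unknown) => void): unknown;
}
type SpawnLike = (command: string, args: string[]) => { stdout: StreamLike };

// Hypothetical observer interface: receives tool calls and response chunks.
interface Observer {
  onCall(command: string, args: string[]): void;
  onChunk(command: string, chunk: string): void;
}

// Returns a function with the same shape as `original` that reports each
// call, then taps the child's stdout without consuming or altering it.
function wrapSpawn(original: SpawnLike, observer: Observer): SpawnLike {
  return (command, args) => {
    observer.onCall(command, args);
    const child = original(command, args);
    child.stdout.on("data", (chunk) => observer.onChunk(command, String(chunk)));
    return child;
  };
}
```

In a CommonJS process, a wrapper like this could presumably be installed by reassigning the `spawn` property on the object returned by `require("child_process")`; how the actual project installs its hook is not stated in the post.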

There are two layers of protection:

First: Whenever there is a read-only tool call whose response an attacker can modify, we extract that part of the JSON response and send it to a small Haiku model to check whether it contains instructions asking the LLM to do something different.

Second: For cases where prompt injection detection fails, we maintain a list of function calls that can write to places an external actor can access. Before any of these runs, we prompt the user through the UI for explicit permission to proceed.
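The second layer can be pictured as a simple gate in front of tool execution. The sketch below is illustrative only: the tool names in `EXTERNALLY_VISIBLE_WRITES`, and the `gateToolCall` and `runToolCall` helpers, are hypothetical, and the `askUser` callback stands in for whatever UI prompt the gateway actually shows.

```typescript
// Hypothetical names of tool calls that write somewhere an outsider can read.
const EXTERNALLY_VISIBLE_WRITES = new Set<string>([
  "github.create_issue_comment",
  "notion.update_page",
]);

type Decision = "allow" | "ask_user";

// Decide whether a tool call may run directly or needs explicit approval.
function gateToolCall(toolName: string): Decision {
  return EXTERNALLY_VISIBLE_WRITES.has(toolName) ? "ask_user" : "allow";
}

interface RunResult<T> {
  ran: boolean;
  result?: T;
}

// Runs `execute` only if the call is not gated or the user approves it.
// `askUser` is a stand-in for the UI permission prompt (assumption).
function runToolCall<T>(
  toolName: string,
  execute: () => T,
  askUser: (toolName: string) => boolean,
): RunResult<T> {
  if (gateToolCall(toolName) === "ask_user" && !askUser(toolName)) {
    return { ran: false }; // user declined, so the tool call is skipped
  }
  return { ran: true, result: execute() };
}
```

A static list like this is what makes the prompts frequent: every listed call asks, regardless of whether the user actually requested it.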

I would love a discussion on how this second layer could be improved and made to trigger less often by relying on some decision process. My current idea: based on a collected set of “trusted” context (user prompts, responses from tool calls that attackers cannot manipulate), can we detect whether a given tool call was necessary? Some scenarios would need detection at the parameter level.

Two notes:

1) This cannot just be a proxy because you need application-level integration to keep humans in the loop when needed and to push UI controls.

2) I improved the accuracy of prompt injection detection by selecting, from the entire response JSON, only the content that an external actor can manipulate. This had to be done for each tool separately. The current implementation covers 2 skills I chose at random (Notion and GitHub).
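As an illustration of the per-tool selection in note 2, the sketch below keeps a registry of extractors, one per tool, each returning only the fields an outsider could have written. The tool names, field names, and response shapes are hypothetical and heavily simplified; they are not the real Notion or GitHub API shapes, and this is not the project's actual implementation.

```typescript
type Extractor = (response: unknown) => string[];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Collect the non-empty string values found under the given keys.
function pickStrings(obj: unknown, keys: string[]): string[] {
  if (!isRecord(obj)) return [];
  const out: string[] = [];
  for (const key of keys) {
    const value = obj[key];
    if (typeof value === "string" && value.length > 0) out.push(value);
  }
  return out;
}

// Hypothetical registry: which fields of each tool's response are
// attacker-controllable. Ids, states, and timestamps are left out.
const EXTRACTORS = new Map<string, Extractor>([
  ["github.get_issue", (r) => pickStrings(r, ["title", "body"])],
  ["notion.get_page", (r) => pickStrings(r, ["title", "text"])],
]);

// Returns only the untrusted text to hand to the injection classifier.
// Tools without a registered extractor yield nothing in this sketch.
function untrustedText(toolName: string, response: unknown): string[] {
  const extractor = EXTRACTORS.get(toolName);
  return extractor ? extractor(response) : [];
}
```

Sending the classifier only these strings, rather than the whole JSON payload, keeps structural noise out of the check. A stricter design might treat tools with no registered extractor as fully untrusted instead of returning nothing.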

P.S.: I maintain one for Claude Code that I use myself while working: https://github.com/ContextFort-AI/Runtime-Controls. I created this OpenClaw version over the weekend.
