Show HN: Protect Against Prompt Injection in OpenClaw

https://www.npmjs.com/package/@mightyai/citadel-guard-openclaw

4•Munam•1h ago

Hi HN,

OpenClaw agents are incredibly useful. They're also incredibly vulnerable.

Your agent fetches a webpage. Buried in an HTML comment:

.

Your agent reads it, processes it, acts on it. No alert. No log.

This is indirect prompt injection. It's the #1 attack vector against AI agents right now.

We built Citadel Guard, an OpenClaw plugin that scans every message, tool call, and response before anything happens. It uses a BERT model running locally on your machine. Not an API. Not our servers. Sub-50ms decisions.

Repo: https://github.com/TryMightyAI/citadel-guard-openclaw

NPM: https://www.npmjs.com/package/@mightyai/citadel-guard-opencl...

npm install @mightyai/citadel-guard-openclaw

What it does:

Uses all five OpenClaw lifecycle hooks:

Incoming messages – scanned

Tool arguments – scanned

Tool results – scanned for payloads

Outbound responses – scanned for credential leaks

Initial context – scanned

Real example:

You ask: "What environment variables do I have set?"

Without Citadel Guard, your agent responds with your AWS keys and GitHub tokens in plaintext. Now they're in chat history, logs, maybe visible to teammates.

With Citadel Guard, that response gets blocked before it leaves. Your secrets stay secret.

Testing:

345 adversarial test cases. Zero false positives in our benchmark. Catches prompt injections (including DAN), credential leaks, tool argument poisoning. Normal messages pass clean.

The catch:

Citadel OSS scans text only. If your agent processes images, PDFs, or documents, attackers can embed injections there. Text scanners can't see them.

That's what our paid API handles ($25/mo): same detection extended to images, documents, and text in one call. Same speed. Plugin auto-routes multimodal content when you add an API key.

Why this matters:

OpenClaw's own docs say "there is no 'perfectly secure' setup." We think security should be invisible, like TLS. You shouldn't have to think about it.

Both the text guard and the plugin are open source (MIT). Would love feedback from folks running agents in production, especially false positive reports or new attack patterns we missed.

Comments

jodoking•1h ago

super excited to share this with the community. and looking forward to your feedback. i am part of the team behind this tool.

Munam•1h ago

Was great to work on this and meet all the builders using the tool at large. Just want to keep people safe!

How I built Fluxer, a Discord-like chat app

Are ads the only way to scale AI to mainstream users?

The LLM Context Tax: Best Tips for Tax Avoidance

Linux 7.0 Brings an EFI Framebuffer Quirk for Valve's Steam Deck

Supercomputer simulations test turbulence theories at 35T grid points

Add voice support for terminal coding assistants on Apple Silicon

Geoff's Projects – ASCII Video Terminal

Ask HN: Freelance Dev Available – Discord Bots, Web Scraping, GitHub Automation

Majutsu, Magit for Jujutsu

Evidence for the earliest hominin use of wooden handheld tools found in Greece

Writing a Lisp JIT Interpreter with GraalVM Truffle

macOS Tahoe 26.3

iOS 26.3

Chrome 146 Now in Beta with WebNN Origin Trial for Neural Networks in Browser

Preparing Your Website for LLMs

The $6 Bug

Show HN: Open-source monitoring for AI agents (MCP-compatible)

ChatGPT: The "Are You Sure?" Problem

How Did the FBI Get Nancy Guthrie's Nest Doorbell Footage?

Reverse cicd with GitHub and self hosted Forgejo

Hackable Software

Ask HN: If agentic AI is the future, why is every startup shipping a dashboard?

Winter Olympic athletes are rightfully taking Covid-19 precautions

React Native 0.84

Harness Engineering

Amazon Ring's lost dog ad sparks backlash amid fears of mass surveillance

Claw Compactor – Cut AI agent token spend in half with 5 compression layers

Choroid plexus alterations in long Covid and their associations with Alzheimer's

Sieve is simpler than LRU

AI agent sandboxing: how to choose between primitives, runtimes, and platforms