Show HN: ClawGuard – Detect 42 prompt injection patterns in <10ms

https://github.com/joergmichno/clawguard

2•joergmichno•1d ago

Comments

joergmichno•1d ago

A bit more context on why pattern matching instead of ML:

1. Speed: <10ms vs 200-500ms for LLM-based checks means you can scan every user message without adding latency.

2. Cost: No API calls to OpenAI/Anthropic for detection = predictable costs at scale.

3. Transparency: When a pattern matches, you know exactly which of the 42 patterns triggered and why. No "the model thinks this looks suspicious."

The tradeoff is obvious — patterns can't catch truly novel attacks. But neither can LLMs reliably (they get tricked by the same prompt injections they're supposed to detect).

My goal: catch the 80% of attacks that are copy-pasted from public prompt injection databases, so you can focus your resources on the remaining 20%.

For CI/CD users: the GitHub Action runs ClawGuard on every PR, so you catch injections before they reach production. The Python SDK lets you integrate scanning into your agent pipeline with two lines of code.

Would love to hear from folks running AI agents in production — what's your current detection strategy?

Someone•1d ago

I think this is an example where obscurity is required to get (some) security. Making this and its test cases public makes training a model to circumvent it too easy.

joergmichno•1d ago

Fair point — and one I thought about carefully before open-sourcing.

A few reasons why I think open patterns are actually the right call:

1. The patterns are already public. Most prompt injection techniques are documented on GitHub, in research papers, and on sites like jailbreakchat. Attackers don't need my regex list — they already have the playbook.

2. Security through obscurity doesn't work for defense. History (from antivirus to WAFs to OWASP) shows that open detection rules get more eyes, more contributions, and faster updates than closed ones. Snort, ModSecurity, YARA — all open, all industry standard.

3. The real threat isn't regex bypass. If an attacker is sophisticated enough to craft novel prompts that evade pattern matching, they'll also evade most LLM-based detectors. The answer for that 20% is layered defense (output filtering, sandboxing, least-privilege), not secret patterns.

4. Open source = trust. Enterprise customers want to audit what's running in their pipeline. "Trust us, it's secret" is a harder sell than "here are the exact 42 patterns, verify them yourself."

That said — the paid Shield API does include additional detection layers beyond the open-source patterns, specifically for this reason.

IronDiff – Network Config Backup and Analysis

Ruby Users Forum February–March Update

Amazon Wins Court Order Blocking Perplexity AI Shopping Bots

Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild

Precision Learning Has the Potential to Do What Personalized Learning Could Not

Towards "Let's Encrypt" for Document Signing

Amazon asks senior engineers to address issues created by 'AI assisted changes'

Tesla FSD drives through railroad crossing barriers in viral video

Why on-device agentic AI can't keep up

Rust Coreutils 0.7 Released with Many Performance Optimizations

$3 ChromeOS Flex stick will revive old and outdated computers

Ultra-compact photonic AI chip operates at the speed of light

Tiny transmitter could help scientists understand surprisingly social wasps

Hiroo Onoda: The Japanese Soldier Who Continued Fighting World War II Until 1974

Experiments.md to stay sane down the rabbit hole

Trump Admin Cyber Strategy Centers Private Sector in Offensive Cyber Operations

How did the Apollo flight computers get men to the moon and back? (2018) [video]

YouTube ads are about to get even longer and they'll be unskippable

Gemini Embedding 2: Our first natively multimodal embedding model

Ask HN: What will be the future of RPE in IT services

Show HN: What was the world listening to? Music charts, 20 countries (1940–2025)

An Update on SVG in GTK

Ad-tech is fascist tech

I built and used this boilerplate to generate $2.5M in revenue over 5 years

Show HN: React Tourlight

Cold Outreach

Sweat of Tourists Has Covered Michelangelo's Sistine Chapel Fresco in White Film

Live Nation and US Justice Department Edge Towards Settling Antitrust Lawsuit

Ask HN: What are you using OpenClaw for?

How Well Does Agent Development Reflect Real-World Work?