frontpage.

Hi HN,

We’ve been working on Patronus Protect, an on-device security layer for AI systems that aims to detect prompt injections and prevent sensitive data from leaving the device.

As part of that work we trained a prompt-injection detection model and decided to release a smaller version of it publicly.

Wolf Defender is a lightweight BERT-style model trained on roughly 5% of our full internal dataset. Despite the reduced training set it already performs competitively with several existing open-source prompt-injection detectors.

One issue we observed with many detectors is that they overfit to obvious trigger phrases like “Ignore previous instructions”. Many real attacks avoid these patterns through obfuscation.

To address this, the training data includes heavy augmentation designed to cover different prompt-injection styles, including:

- unicode and homoglyph perturbations - encoded payloads (e.g. base64) - HTML and code comment injections - structural wrappers like “User:” or “System:” - spacing and casing perturbations

The idea is to train the model to recognize structural characteristics of prompt-injection attacks rather than memorizing specific prompts.

Internally we use a larger version of this model as part of Patronus Protect. Wolf Defender is trained on a much smaller subset of the data and released to make prompt-injection research more accessible.

Curious to hear feedback from people working on LLM security.

We have more privacy controls yet less privacy

MacBook Neo: Commenting from Privilege?

Zuckerberg is done with Alexandr Wang

Leading Frontier Firm Transformation with Microsoft 365 E7

The Cost of Indirection in Rust

Startup Wants to Launch a Space Mirror

Ask HN: Is Cloudflare Down Again?

Show HN: ROLV – 20x faster MoE FFN inference on Llama 4 Maverick vs. cuBLAS

Show HN: IceCubes – speaker-attributed meeting transcripts without a bot

Approximately 40% of prepaid value is never used

Wegovy and Ozempic owner dealt blow as next drug is branded 'obsolete'

How I Built Brickonomics: Smart Algorithms to Save Money on Lego

Iran Air and Missile War – Ballistic, Interceptors and Munition Stockpiles [video]

GNU, and the AI Reimplementations

AI agents now help attackers, including North Korea, manage their drudge work

Show HN: Monetize APIs for agentic commerce without accounts using Stripe

Florida Judge Rules Red Light Camera Tickets Are Unconstitutional

$100 Oil Now Means Bigger Buybacks with Fewer Jobs and Babies Than Ever Before

Test Data Management with Greenmask and OpenEverest

Where to See Cherry Blossoms in the Bay Area This Spring

Aaron Levie: Building for trillions of agents

Learn about Steam

Indo-European Explorer: A 6k-Year Journey

AI Assistants Are Moving the Security Goalposts

Anthropic sues Trump administration after clash over AI use

A Dev's Checklist for MCP Security and Compliance

Vibe Coding and the Death of Craftsmanship (Personal Essay)

Show HN: I built an analytics engine for my OpenClaw usage

Reflections on Vibe Coding an iOS App

A neural signature of adaptive mentalization

Show HN: Wolf Defender, a open-weight prompt-injection detection model