Most injection evasion works by making text look different to a scanner than to the LLM. Homoglyphs, leet speak, zero-width characters, base64 smuggling, ROT13, Unicode confusables — the LLM reads through all of it, but pattern matchers don't.
The project is two curated layers, not code:
Layer 1 — what attackers say. ~35 canonical intent phrases across 8 categories (override, extraction, jailbreak, delimiter, semantic worm, agent proxy, rendering...), multilingual, normalized.
Layer 2 — how they hide it. Curated tables of Unicode confusables, leet speak mappings, LLM-specific delimiters (<|system|>, [INST], <<SYS>>...), dangerous markup patterns. Each table is a maintained dataset that feeds a normalisation stage.
The engine itself is deliberately simple — a 10-stage normalisation pipeline that reduces evasion to canonical form, then strings.Contains + Levenshtein. Think ClamAV: the scan loop is trivial, the definitions are the product.
Long term I'd like both layers to become community-maintained — one curated corpus of injection intents and one of evasion techniques, consumable by any scanner regardless of language or engine.
Everything ships as go:embed JSON, hot-reloadable without rebuild. No regex (no ReDoS), no API calls, no ML in the loop. Single dependency (golang.org/x/text). Scans both inputs and LLM outputs.
result := injection.Scan(text, injection.DefaultIntents()) if result.Risk == "high" { ... }
austin-cheney•1h ago
Horos•1h ago