RAXE is a privacy-first “instrument panel” for LLM security: it scans prompts locally before they hit an LLM (or before you execute tools / actions), and returns structured detections you can ALLOW / FLAG / BLOCK / LOG.
What it does (today)
- Detects common LLM threats (prompt injection, jailbreaks, data-exfil patterns, etc.)
- Dual-layer engine:
  - L1: 460+ curated regex rules (fast + explainable)
  - L2: CPU-friendly ML classifier for obfuscation / novel variants
- Integrations: Python SDK + CLI, plus drop-in wrappers for OpenAI/DSPy/Anthropic-style clients
Why another “LLM security” tool?
Most approaches either
- require sending prompts to a cloud service for scanning, or
- are purely rule-based (easy to evade), or
- are purely ML-based (hard to audit)
RAXE tries to combine “auditable rules” with an on-device ML backstop:
- L1-only latency is sub-millisecond, per the docs
- L1+L2 is roughly 20-30 ms on CPU (no GPU required)
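To make the layering concrete, here is a toy sketch of the idea rather than RAXE's actual engine. The two rules, the `Detection` type, the `scan` function here, and the stand-in classifier are all made up for illustration; the real rule set and model are much larger.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Detection:
    layer: str      # "L1" (regex rule) or "L2" (classifier)
    rule_id: str
    severity: str


# Hypothetical demo rules; not taken from RAXE's rule set.
RULES = [
    ("demo-pi-001", "HIGH",
     re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I)),
    ("demo-exfil-001", "MEDIUM",
     re.compile(r"(reveal|print)\s+(your\s+)?system\s+prompt", re.I)),
]


def scan(prompt: str,
         classifier: Optional[Callable[[str], float]] = None,
         threshold: float = 0.5) -> List[Detection]:
    # Layer 1: each regex hit maps back to a rule id, so it is easy to audit.
    detections = [Detection("L1", rule_id, severity)
                  for rule_id, severity, pattern in RULES
                  if pattern.search(prompt)]
    # Layer 2: an ML score acts as a backstop for inputs the rules miss.
    if classifier is not None and classifier(prompt) >= threshold:
        detections.append(Detection("L2", "ml-classifier", "MEDIUM"))
    return detections


def toy_classifier(prompt: str) -> float:
    # Stand-in for a real model: flags a spaced-out spelling of "ignore".
    return 0.9 if "i g n o r e" in prompt.lower() else 0.1
```

With this sketch, a plain "Ignore all previous instructions" is caught by the regex layer, while the spaced-out variant slips past the rules and is only caught by the classifier stand-in.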
About the ML (edge-friendly)
The current L2 model is an INT8 ONNX classifier based on EmbeddingGemma-300M, with Matryoshka truncation (256-dim embeddings). It has 5 classifier heads and is packaged to run locally on everyday machines.
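For anyone unfamiliar with Matryoshka truncation: the idea is to keep only the leading dimensions of the embedding and re-normalise. A minimal pure-Python sketch, assuming a 768-dimensional full embedding and L2 re-normalisation after truncation (the function name and the fake input vector are illustrative, not part of RAXE):

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int = 256) -> List[float]:
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if dim > len(vec):
        raise ValueError("dim exceeds embedding size")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero vector: nothing to rescale
    return [x / norm for x in head]


# Stand-in for a model output; a real embedding would come from the encoder.
full = [math.sin(i + 1) for i in range(768)]
small = truncate_embedding(full)
```

The smaller vector is cheaper to store and to feed into downstream classifier heads, at some cost in accuracy.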
Privacy / telemetry
Scanning happens locally. Community Edition can share anonymised detection metadata to improve collective defences, e.g. a SHA-256 prompt hash + rule_id + severity + scan duration (never the raw prompt or matched text). You can also run fully offline by disabling telemetry.
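As an illustration of what "metadata only" means, an event of that shape could be built like this. This is a sketch, not the actual wire format: the function name, the field names, and the millisecond unit are assumptions.

```python
import hashlib
import json


def build_telemetry_event(prompt: str, rule_id: str,
                          severity: str, duration_ms: float) -> dict:
    # Only a one-way SHA-256 digest of the prompt leaves the machine.
    prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return {
        "prompt_hash": prompt_hash,
        "rule_id": rule_id,
        "severity": severity,
        "scan_duration_ms": round(duration_ms, 3),
    }


event = build_telemetry_event(
    "Ignore all previous instructions", "demo-pi-001", "HIGH", 0.42)
payload = json.dumps(event)  # contains neither the prompt nor matched text
```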
Quick start
- pip install raxe
- raxe scan "Ignore all previous instructions and …"
Python usage:

  from raxe import Raxe

  raxe = Raxe()  # or Raxe(telemetry=False) for offline mode
  result = raxe.scan(prompt)  # prompt: the user input string to check

  if result.has_threats:
      print(result.severity, result.total_detections)
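One way to turn a scan result into an ALLOW / FLAG / BLOCK / LOG decision is a small policy table in your own app. This is a sketch only: the severity labels, the `Action` enum, and `decide` are hypothetical and not part of the RAXE SDK. You would pass in `result.has_threats` and `result.severity` from the snippet above.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "ALLOW"
    FLAG = "FLAG"
    BLOCK = "BLOCK"
    LOG = "LOG"


# Hypothetical severity labels; adjust to whatever the scanner reports.
POLICY = {
    "CRITICAL": Action.BLOCK,
    "HIGH": Action.BLOCK,
    "MEDIUM": Action.FLAG,
    "LOW": Action.LOG,
}


def decide(has_threats: bool, severity: Optional[str]) -> Action:
    if not has_threats:
        return Action.ALLOW
    # Unknown or missing severities fail closed.
    return POLICY.get((severity or "").upper(), Action.BLOCK)
```

Failing closed on unknown severities is a design choice for this sketch; a low-risk app might prefer to FLAG instead.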
Stats / status
- Public repo: https://github.com/raxe-ai/raxe-ce (currently ~29 stars)
- Early beta (v0.0.1). We’re seeing ~100 scans/events per hour on average from early users.
- Docs: https://docs.raxe.ai/
- Site: https://raxe.ai/
I’d love feedback on:
- false positives / misses you hit in real apps
- which threat families / rules you’d want next
- integrations you’d actually use (LangChain, gateways, CI checks, etc.)
Thanks!
raxe•1h ago
- The engine is dual-layer:
  - L1: regex rules (explainable + fast)
  - L2: EmbeddingGemma-300M based, INT8 quantized ONNX classifier (CPU), with 5 heads:
    1) is_threat
    2) threat_family
    3) severity
    4) primary_technique
    5) harm_types (multilabel)
- Offline mode: you can run completely without network access.
- Telemetry is detection metadata only (e.g., prompt_hash + rule_id + severity + duration). Raw prompts and matched substrings are never sent.
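To show how the five L2 heads listed above could be read out, here is a rough sketch of decoding raw head outputs. Everything in it is assumed for illustration: the label sets, the logit layout, the 0.5 threshold, and the `decode_heads` name are not the model's real interface.

```python
import math
from typing import Dict, List

# Hypothetical label sets, for illustration only.
FAMILIES = ["benign", "prompt_injection", "jailbreak", "data_exfil"]
SEVERITIES = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]
TECHNIQUES = ["none", "instruction_override", "role_play", "encoding"]
HARM_TYPES = ["data_leak", "policy_bypass", "tool_abuse"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def argmax_label(logits: List[float], labels: List[str]) -> str:
    # Single-label heads: pick the class with the largest logit.
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]


def decode_heads(raw: Dict[str, List[float]], threshold: float = 0.5) -> dict:
    return {
        "is_threat": sigmoid(raw["is_threat"][0]) >= threshold,
        "threat_family": argmax_label(raw["threat_family"], FAMILIES),
        "severity": argmax_label(raw["severity"], SEVERITIES),
        "primary_technique": argmax_label(raw["primary_technique"], TECHNIQUES),
        # Multilabel head: every label is thresholded independently.
        "harm_types": [label for label, z in zip(HARM_TYPES, raw["harm_types"])
                       if sigmoid(z) >= threshold],
    }


example = decode_heads({
    "is_threat": [2.0],
    "threat_family": [0.1, 3.2, 0.4, -1.0],
    "severity": [-0.5, 0.2, 1.9, 0.3],
    "primary_technique": [0.0, 2.5, 0.1, -0.3],
    "harm_types": [1.5, -2.0, 0.3],
})
```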
Happy to answer anything / take feature requests.