From zero to seeing the poisoning succeed: git clone, make setup, make attack1. About 10 minutes.
Two things worth flagging upfront:
- The 95% success rate is against a 5-document corpus, which is the best case for the attacker. In a mature collection you need proportionally more poisoned docs to dominate retrieval, but the mechanism is the same.
- Embedding anomaly detection at ingestion was the biggest surprise: as a standalone control it cut attack success from 95% to 20%, outperforming all three generation-phase defenses combined. It runs on embeddings your pipeline already produces, so no additional model is needed.
All five layers combined: 10% residual attack success.
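For anyone who wants a feel for the ingestion-time check before opening the repo, here is a rough sketch of one way embedding anomaly detection can work. This is not necessarily what the repo does: the function names (`cosine`, `centroid`, `is_anomalous`), the centroid-plus-z-score heuristic, and the threshold of 3.0 are all my assumptions for illustration. It only uses vectors the pipeline already has, which is the point made above.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [mean(column) for column in zip(*vectors)]


def is_anomalous(candidate, corpus, z_threshold=3.0):
    """Hypothetical ingestion gate (assumed heuristic, not the repo's code).

    Flags a candidate embedding whose similarity to the corpus centroid
    deviates from the corpus's own similarity distribution by more than
    z_threshold standard deviations.
    """
    center = centroid(corpus)
    baseline = [cosine(vec, center) for vec in corpus]
    mu, sigma = mean(baseline), pstdev(baseline)
    score = cosine(candidate, center)
    if sigma < 1e-12:
        # Degenerate corpus (all documents equally similar to the centroid):
        # fall back to flagging anything that differs from the baseline.
        return abs(score - mu) > 1e-9
    return abs(score - mu) / sigma > z_threshold


# Toy 2-d "embeddings" standing in for real model output.
corpus = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.95, 0.15]]
in_distribution = [0.97, 0.1]
off_distribution = [0.0, 1.0]

print(is_anomalous(in_distribution, corpus))
print(is_anomalous(off_distribution, corpus))
```

A centroid check like this only catches documents that sit far from the rest of the collection. A real deployment would probably also want to look at how tightly a batch of new documents clusters together, and the threshold would need tuning against your own corpus.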
Full attack breakdown and defense architecture: https://aminrj.com/posts/rag-document-poisoning/
Happy to discuss methodology, the PoisonedRAG comparison, or anything that looks off.