The lab runs entirely on LM Studio + Qwen2.5-7B-Instruct (Q4_K_M) + ChromaDB — no cloud APIs, no GPU required, no API keys.
From zero to seeing the poisoning succeed: git clone, make setup, make attack1. About 10 minutes.
Two things worth flagging upfront:
- The 95% success rate is against a 5-document corpus (best case for the attacker). In a mature collection you need proportionally more poisoned docs to dominate retrieval — but the mechanism is the same.
- Embedding anomaly detection at ingestion was the biggest surprise: 95% → 20% as a standalone control, outperforming all three generation-phase defenses combined. It runs on embeddings your pipeline already produces — no additional model.
All five layers combined: 10% residual.
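For readers curious what the ingestion-time embedding check looks like in practice, here is a minimal sketch. This is an assumption about the mechanism, not the lab's actual implementation: it flags candidate documents whose embedding sits unusually far from the existing corpus centroid (a z-score on cosine distance), using only the embeddings the pipeline already produces.

```python
import numpy as np

def flag_anomalous_docs(corpus_embeddings, candidate_embeddings, z_threshold=3.0):
    """Ingestion-time gate: flag candidates whose embedding is an outlier
    relative to the existing corpus. Hypothetical sketch; the lab's real
    detector may use a different statistic."""
    corpus = np.asarray(corpus_embeddings, dtype=float)
    candidates = np.asarray(candidate_embeddings, dtype=float)

    # Unit-normalize rows so dot products behave like cosine similarity.
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    corpus_n = normalize(corpus)
    centroid = corpus_n.mean(axis=0)
    centroid /= np.linalg.norm(centroid)

    # Baseline distribution: cosine distance of existing docs to centroid.
    corpus_dist = 1.0 - corpus_n @ centroid
    mu, sigma = corpus_dist.mean(), corpus_dist.std() + 1e-12

    # Z-score each candidate against that baseline.
    cand_dist = 1.0 - normalize(candidates) @ centroid
    z = (cand_dist - mu) / sigma
    return z > z_threshold  # True = quarantine for human review
```

Poisoned documents that need to dominate retrieval for a target query tend to cluster away from the organic corpus distribution, which is why a cheap check like this can work at ingestion rather than at generation time.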
Happy to discuss methodology, the PoisonedRAG comparison, or anything that looks off.
sidrag22•56m ago
This is the premise that bothers me here: it requires a bad actor with critical access, and it also requires that the final rag output doesn't provide a reference to the referenced result. Seems just like a flawed product at that point.
sandermvanvliet•37m ago
But then, if you’re inside the network you’ve already overcome many of the boundaries
SlinkyOnStairs•33m ago
This isn't particularly hard. Lots and lots of these tools pull from the public internet, and there are already plenty of documented examples of Google's AI summary being exploited in a structurally similar way.
As for internal systems, getting write access to documents isn't hard either. Compromising some workers is easy, especially as many of them will be using who knows what AI systems to write these documents.
> it also requires that the final rag output doesn't provide a reference to the referenced result.
RAG systems providing a reference is nearly moot. If the "generation" cannot be trusted to be accurate and not hallucinate a bunch of bullshit, then you have to check the references every single time, and the generation part becomes pointless. Might as well just include a verbatim snippet.
zenoprax•12m ago
Threats from incompetence or ignorance will be multiplied by 'X' over 'Y' years as AI proliferates. Unsupervised AI agents and context poisoning will spiral things out of control in any environment.
I'm interested in the effect of this on AI-generated/assisted documentation, and on the recycling of that documentation, alongside the source code, back into the models.
malfist•7m ago