There's too much fluff here to be useful. I imagine having something that is concise and concrete would make it more appealing to others. But as-is, it's missing a good technical summary and demonstration.
It's less about the RAG exposing new data to a regular user, and more about using the vector pipeline as a covert channel. The idea is to sneak out data the attacker already can access, but in a way that might bypass traditional DLP looking at emails, USBs, etc.
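To make the covert-channel idea concrete, here is a minimal sketch of hiding bytes in something shaped like an embedding vector. This is not the project's actual method. `DIM`, `bytes_to_vector`, and `vector_to_bytes` are hypothetical names made up for illustration, and a vector this naive would likely stand out next to real embeddings.

```python
from typing import List

DIM = 16  # hypothetical embedding dimension, chosen only for this sketch


def bytes_to_vector(chunk: bytes, dim: int = DIM) -> List[float]:
    """Pack up to dim-1 bytes into a float vector with values in [-1, 1]."""
    if len(chunk) >= dim:
        raise ValueError("chunk too long: one slot is reserved for the length")
    # Slot 0 carries the payload length, the rest is payload plus zero padding.
    values = [len(chunk)] + list(chunk) + [0] * (dim - 1 - len(chunk))
    # Map each integer 0..255 onto [-1.0, 1.0].
    return [v / 127.5 - 1.0 for v in values]


def vector_to_bytes(vec: List[float]) -> bytes:
    """Invert bytes_to_vector: rescale, round, then read the length slot."""
    values = [round((x + 1.0) * 127.5) for x in vec]
    length = values[0]
    return bytes(values[1:1 + length])


vec = bytes_to_vector(b"secret")
recovered = vector_to_bytes(vec)
```

The point is only that a vector store will accept any list of floats, so traffic that looks like "upload embeddings" can carry arbitrary data the sender already had.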
The "fluff" is largely educational material, as the project is for research and learning. For a concrete technical demonstration, the scripts/embed.py and scripts/query.py scripts are the core, and the docs/guides/quick_start.md tries to offer a direct path to seeing it in action.
Hope that helps! Will add a video demo soon.
In theory you don't even need anything in the payload: you could encode information in the timing of the DNS requests, à la Morse code.
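A rough sketch of the timing idea, simulated with timestamps only (no DNS requests are actually sent). The gap lengths `SHORT` and `LONG` and all function names are assumptions picked for illustration. A real channel would also have to cope with network jitter.

```python
from typing import List

# Hypothetical gap lengths in seconds: a short gap means 0, a long gap means 1.
SHORT, LONG = 0.2, 0.6
THRESHOLD = (SHORT + LONG) / 2


def encode_delays(message: bytes) -> List[float]:
    """Turn each bit (most significant first) into an inter-request delay."""
    delays = []
    for byte in message:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            delays.append(LONG if bit else SHORT)
    return delays


def send_times(delays: List[float], start: float = 0.0) -> List[float]:
    """Simulate request timestamps: n delays produce n + 1 requests."""
    times = [start]
    for delay in delays:
        times.append(times[-1] + delay)
    return times


def decode_times(times: List[float]) -> bytes:
    """Recover bytes from the gaps between observed request timestamps."""
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    bits = [1 if gap > THRESHOLD else 0 for gap in gaps]
    out = bytearray()
    # Only decode complete groups of 8 bits.
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


delays = encode_delays(b"hi")
decoded = decode_times(send_times(delays))
```

Every request here could carry a perfectly innocent query. Only the spacing carries the data, which is why payload inspection alone does not see it.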
HTTP is the obvious other one, with many more options for exfiltrating data. You can think of ways that don't even need an evil domain.
For example, you could exfiltrate data via Hacker News comments!
As far as I can see, the only thing you can do in the end is make exfiltration harder, monitor for unusual activity, and hope that is enough to stop large-scale exfiltration, since small-scale exfiltration is impossible to stop.
smugglereal•1d ago