Weaponizing image scaling against production AI systems

https://blog.trailofbits.com/2025/08/21/weaponizing-image-scaling-against-production-ai-systems/

124•tatersolid•3h ago

Comments

K0nserv•2h ago

The security endgame of LLMs terrifies me. We've designed a system that only supports in-band signalling, undoing hard learned lessons from prior system design. There are ampleattack vectors ranging from just inserting visible instructions to obfuscation techniques like this and ASCII smuggling[0]. In addition, our safeguards amount to nicely asking a non deterministic algorithm to not obey illicit instructions.

0: https://embracethered.com/blog/posts/2024/hiding-and-finding...

volemo•2h ago

It’s serial terminals all over again.

_flux•2h ago

Yeah, it's quite amazing how none of the models seem to be any "sudo" tokens that could be used to express things normal tokens cannot.

robin_reala•1h ago

The other safeguard is not using LLMs or systems containing LLMs?

GolfPopper•1h ago

But, buzzword!

We need AI because everyone is using AI, and without AI we won't have AI! Security is a small price to pay for AI, right? And besides, we can just have AI do the security.

pjc50•1h ago

As you say, the system is nondeterministic and therefore doesn't have any security properties. The only possible option is to try to sandbox it as if it were the user themselves, which directly conflicts with ideas about training it on specialized databases.

But then, security is not a feature, it's a cost. So long as the AI companies can keep upselling and avoid accountability for failures of AI, the stock will continue to go up, taking electricity prices along with it, and isn't that ultimately the only thing that matters? /s

Liftyee•1h ago

I was initially confused: the article didn't seem to explain how the prompt injection was actually done... was it manipulating hex data of the image into ASCII or some sort of unwanted side effect?

Then I realised it's literally hiding rendered text on the image itself.

Wow.

Qwuke•1h ago

Yea, as someone building systems with VLMs, this is downright frightening. I'm hoping we can get a good set of OWASP-y guidelines just for VLMs that cover all these possible attacks because it's every month that I hear about a new one.

Worth noting that OWASP themselves put this out recently: https://genai.owasp.org/resource/multi-agentic-system-threat...

koakuma-chan•46m ago

What is VLM?

pwatsonwailes•45m ago

Vision language models. Basically an LLM plus a vision encoder, so the LLM can look at stuff.

echelon•43m ago

Vision language model.

You feed it an image. It determines what is in the image and gives you text.

The output can be objects, or something much richer like a full text description of everything happening in the image.

VLMs are hugely significant. Not only are they great for product use cases, giving users the ability to ask questions with images, but they're how we gather the synthetic training data to build image and video animation models. We couldn't do that at scale without VLMs. No human annotator would be up to the task of annotating billions of images and videos at scale and consistently.

Since they're a combination of an LLM and image encoder, you can ask it questions and it can give you smart feedback. You can ask it, "Does this image contain a fire truck?" or, "You are labeling scenes from movies, please describe what you see."

echelon•46m ago

Holy shit. That just made it obvious to me. A "smart" VLM will just read the text and trust it.

This is a big deal.

I hope those nightshade people don't start doing this.

koakuma-chan•30m ago

I don't think this is any different from an LLM reading text and trusting it. Your system prompt is supposed to be higher priority for the model than whatever it reads from the user or from tool output, and, anyway, you should already assume that the model can use its tools in arbitrary ways that can be malicious.

pjc50•17m ago

> I hope those nightshade people don't start doing this.

This will be popular on bluesky; artists want any tools at their disposal to weaponize against the AI which is being used against them.

Martin_Silenus•44m ago

Wait… that's the specific question I had, because rendered text would require OCR to be read by a machine. Why would an AI do that costly process in the first place? Is it part of the multi-modal system without it being able to differenciate that text from the prompt?

If the answer is yes, then that flaw does not make sense at all. It's hard to believe they can't prevent this. And even if they can't, they should at least improve the pipeline so that any OCR feature should not automatically inject its result in the prompt, and tell user about it to ask for confirmation.

Damn… I hate these pseudo-neurological, non-deterministic piles of crap! Seriously, let's get back to algorithms and sound technologies.

echelon•38m ago

Smart image encoders, multimodal models, can read the text.

Think gpt-image-1, where you can draw arrows on the image and type text instructions directly onto the image.

Martin_Silenus•34m ago

I did not ask about what AI can do.

noodletheworld•20m ago

> Is it part of the multi-modal system without it being able to differenciate that text from the prompt?

Yes.

The point the parent is making is that if your model is trained to understand the content of an image, then that's what it does.

> And even if they can't, they should at least improve the pipeline so that any OCR feature should not automatically inject its result in the prompt, and tell user about it to ask for confirmation.

That's not what is happening.

The model is taking <image binary> as an input. There is no OCR. It is understanding the image, decoding the text in it and acting on it in a single step.

There is no place in the 1-step pipeline to prevent this.

...and sure, you can try to avoid it procedural way (eg. try to OCR an image and reject it before it hits the model if it has text in it), but then you're playing the prompt injection game... put the words in a QR code. Put them in french. Make it a sign. Dial the contrast up or down. Put it on a t-shirt.

It's very difficult to solve this.

> It's hard to believe they can't prevent this.

Believe it.

saurik•34m ago

The AI is not running an external OCR process to understand text any more than it is running an external object classifier to figure out what it is looking at: it, inherently, is both of those things to some fuzzy approximation (similar to how you or I are as well).

Martin_Silenus•29m ago

That I can get, but anything that’s not part of the prompt SHOULD NOT become part of the prompt, it’s that simple to me. Definitely not without triggering something.

pixl97•26m ago

>it’s that simple to me

Don't think of a pink elephant.

evertedsphere•19m ago

i'm sure you know this but it's important not to understate the importance of the fact that there is no "prompt"

the notion of "turns" is a useful fiction on top of what remains, under all of the multimodality and chat uis and instruction tuning, a system for autocompleting tokens in a straight line

the abstraction will leak as long as the architecture of the thing makes it merely unlikely rather than impossible for it to leak

pjc50•19m ago

There's no distinction in the token-predicting systems between "instructions" and "information", no code-data separation.

daemonologist•15m ago

_Everything_ is part of the prompt - an LLM's perception of the universe is its prompt. Any distinctions a system might try to draw beyond that are either probabilistic (e.g., a bunch of RLHF to not comply with "ignore all previous instructions") or external to the LLM (e.g., send a canned reply if the input contains "Tiananmen").

bogdanoff_2•18m ago

I didn't even notice the text in the image at first...

This isn't even about resizing, it's just about text in images becoming part of the prompt and a lack of visibility about what instruction the agent is following.

ambicapter•53m ago

> This image and its prompt-ergeist

Love it.

cubefox•38m ago

It seems they could easily fine-tune their models to not execute prompts in images. Or more generally any prompts in quotes, if they are wrapped in special <|quote|> tokens.

jdiff•23m ago

It may seem that way, but there's no way that they haven't tried it. It's a pretty straightforward idea. Being unable to escape untrusted input is the security problem with LLMs. The question is what problems did they run into when they tried it?

bogdanoff_2•12m ago

Just because "they" tried that and it didn't work, doesn't mean doing something of that nature will never work.

Plenty of things we now take for granted did not work in their original iterations. The reason they work today is because there were scientists and engineers who were willing to persevere in finding a solution despite them apparently not working.

SangLucci•23m ago

Who knew a simple image could exfiltrate your data? Image-scaling attacks on AI systems are real and scary.

AWS CEO says using AI to replace junior staff is 'Dumbest thing I've ever heard'

How Well Does the Money Laundering Control System Work?

Apple Watch wearable foundation model

Weaponizing image scaling against production AI systems

Using Podman, Compose and BuildKit

Launch HN: Skope (YC S25) – Outcome-based pricing for software products

D4d4

Show HN: ChartDB Cloud – Visualize and Share Database Diagrams

Show HN: OS X Mavericks Forever

Mark Zuckerberg freezes AI hiring amid bubble fears

Show HN: Using Common Lisp from Inside the Browser

You Should Add Debug Views to Your DB

Activeloop (YC S18) Is Hiring Member of Technical Staff – Back End Engineering

Margin debt surges to record high

Sütterlin

Why is D3 so Verbose?

Unification (2018)

In a first, Google has released data on how much energy an AI prompt uses

AI crawlers, fetchers are blowing up websites; Meta, OpenAI are worst offenders

Why are anime catgirls blocking my access to the Linux kernel?

Show HN: I replaced vector databases with Git for AI memory (PoC)

Show HN: I was curious about spherical helix, ended up making this visualization

A Conceptual Model for Storage Unification

Home Depot sued for 'secretly' using facial recognition at self-checkouts

Sixteen bottles of wine riddle

A statistical analysis of Rotten Tomatoes

To Infinity but Not Beyond

Epson MX-80 Fonts

Code review can be better

The rise and fall of the Seagaia Ocean Dome wave pool

Weaponizing image scaling against production AI systems

Comments

AWS CEO says using AI to replace junior staff is 'Dumbest thing I've ever heard'

How Well Does the Money Laundering Control System Work?

Apple Watch wearable foundation model

Weaponizing image scaling against production AI systems

Using Podman, Compose and BuildKit

Launch HN: Skope (YC S25) – Outcome-based pricing for software products

D4d4

Show HN: ChartDB Cloud – Visualize and Share Database Diagrams

Show HN: OS X Mavericks Forever

Mark Zuckerberg freezes AI hiring amid bubble fears

Show HN: Using Common Lisp from Inside the Browser

You Should Add Debug Views to Your DB

Activeloop (YC S18) Is Hiring Member of Technical Staff – Back End Engineering

Margin debt surges to record high

Sütterlin

Why is D3 so Verbose?

Unification (2018)

In a first, Google has released data on how much energy an AI prompt uses

AI crawlers, fetchers are blowing up websites; Meta, OpenAI are worst offenders

Why are anime catgirls blocking my access to the Linux kernel?

Show HN: I replaced vector databases with Git for AI memory (PoC)

Show HN: I was curious about spherical helix, ended up making this visualization

A Conceptual Model for Storage Unification

Home Depot sued for 'secretly' using facial recognition at self-checkouts

Sixteen bottles of wine riddle

A statistical analysis of Rotten Tomatoes

To Infinity but Not Beyond

Epson MX-80 Fonts

Code review can be better

The rise and fall of the Seagaia Ocean Dome wave pool