The Webpage Has Instructions. The Agent Has Your Credentials

https://openguard.sh/blog/prompt-injections/

25•everlier•4h ago

Comments

stavros•1h ago

Why does the agent have your credentials? There's no need for that! I made one that doesn't:

https://github.com/skorokithakis/stavrobot

indigodaddy•51m ago

So this is like a claw type thing? I’ve never used these “agents”. Not sure what I would do with them. Probably not for coding right?

stavros•45m ago

Yeah, it's more of a personal assistant. It can do coding, but it's most useful as a PA.

indigodaddy•31m ago

So. Yesterday I had a need to, from my android phone, have ChatGPT et Al mobile app do something I THOUGHT was very simple. Read a publicly available Google spreadsheet (I gave it the /htmlview which in incognito I could see ALL the rows (maybe close to 1000 rows). None could do it. Not ChatGPT, not MS Copilot, not Claude app, not Gemini, not even GitHub copilot in a web tab. Some said I can’t even see that. Some could see it but couldn’t do anything with it. Some could see it but only the first 100 lines. All I wanted to do was have it ingest the entire thing and then spit me back out in a csv or txt any rows that mentioned 4K. Seemed simple but these things couldn’t even get past that first hurdle. Weirdly, I remembered I had the Grok app too and gave it a shot, and, it could do it. I guess it is more intelligent in it’s abilities to scrape/parse/whatever all kinds of different types of sites.

I’d guess this is the type of thing that might actually excel in your agent or these claw clones, because they literally can just do whatever bash/tool type actions on whatever VM or sandboxed environment they live on?

stavros•28m ago

Yeah, I think this was an issue of Google blocking bot user agents more than the LLMs not being smart enough. A bot that can run curl (like mine) should read it no problem.

indigodaddy•24m ago

Ah ok that actually makes sense as the reason. And I think I’ve seen that with even coding agents when they are trying to look up stuff on the web or URLs you give them, now that I think about it..

amelius•23m ago

You can do basically anything with a claw agent. For example, I asked one to build me a Dyson sphere. It is still working on it, but so far so good.

redgridtactical•1h ago

This is the natural consequence of building everything around "the agent needs access to everything to be useful." The more capabilities you hand an agent, the larger the attack surface when it encounters a malicious page.

The simplest mitigation is also the least popular one: don't give the agent credentials in the first place. Scope it to read-only where possible, and treat every page it visits as untrusted input. But that limits what agents can do, which is why nobody wants to hear it.

rocho•53m ago

I absolutely agree, although even that doesn't solve the root problem. The underlying LLM architecture is fundamentally insecure as it doesn't separate between instructions and pure content to read/operate on.

I wonder if it'd be possible to train an LLM with such architecture: one input for the instructions/conversation and one "data-only" input. Training would ensure that the latter isn't interpreted as instructions, although I'm not knowledgeable enough to understand if that's even theoretically possible: even if the inputs are initially separate, they eventually mix in the neural network. However, I imagine that training could be done with massive amounts of prompt injections in the "data-only" input to penalize execution of those instructions.

petesergeant•20m ago

I am building https://agentblocks.ai for just this; you set fine-grained rules on what your agents are allowed to access and when they have to ask you out-of-channel (eg via WhatsApp or Slack) for permissions, with no direct agent access. It works today, well, supports more tools than are on the website, and if you have any need for this at all, I’d love to give you an account: pete@agentblocks.ai

Works great with OpenClaw, Claude Cowork, or anything, really

What makes Intel Optane stand out (2023)

Separating the Wayland compositor and window manager

C++26: The Oxford Variadic Comma

Glassworm Is Back: A New Wave of Invisible Unicode Attacks Hits Repositories

In Memoriam: John W. Addison, my PhD advisor

A Visual Introduction to Machine Learning (2015)

Show HN: GDSL – 800 line kernel: Lisp subset in 500, C subset in 1300

Learning athletic humanoid tennis skills from imperfect human motion data

Hollywood Enters Oscars Weekend in Existential Crisis

Office.eu launches as Europe's sovereign office platform

Rack-mount hydroponics

Show HN: Signet – Autonomous wildfire tracking from satellite and weather data

Show HN: What if your synthesizer was powered by APL (or a dumb K clone)?

Grandparents are glued to their phones, families are worried [video]

LLM Architecture Gallery

Autoresearch Hub

Kniterate Notes

IBM, sonic delay lines, and the history of the 80×24 display (2019)

Bus travel from Lima to Rio de Janeiro

$96 3D-printed rocket that recalculates its mid-air trajectory using a $5 sensor

Generating All 32-Bit Primes (Part I)

The Webpage Has Instructions. The Agent Has Your Credentials

How kernel anti-cheats work

The 100 hour gap between a vibecoded prototype and a working product

Zipp 2001 Restoration

A most elegant TCP hole punching algorithm

Why Mathematica does not simplify sinh(arccosh(x))

UMD Scientists Create 'Smart Underwear' to Measure Human Flatulence

Treasure hunter freed from jail after refusing to turn over shipwreck gold

Allow me to get to know you, mistakes and all

The Webpage Has Instructions. The Agent Has Your Credentials

Comments

What makes Intel Optane stand out (2023)

Separating the Wayland compositor and window manager

C++26: The Oxford Variadic Comma

Glassworm Is Back: A New Wave of Invisible Unicode Attacks Hits Repositories

In Memoriam: John W. Addison, my PhD advisor

A Visual Introduction to Machine Learning (2015)

Show HN: GDSL – 800 line kernel: Lisp subset in 500, C subset in 1300

Learning athletic humanoid tennis skills from imperfect human motion data

Hollywood Enters Oscars Weekend in Existential Crisis

Office.eu launches as Europe's sovereign office platform

Rack-mount hydroponics

Show HN: Signet – Autonomous wildfire tracking from satellite and weather data

Show HN: What if your synthesizer was powered by APL (or a dumb K clone)?

Grandparents are glued to their phones, families are worried [video]

LLM Architecture Gallery

Autoresearch Hub

Kniterate Notes

IBM, sonic delay lines, and the history of the 80×24 display (2019)

Bus travel from Lima to Rio de Janeiro

$96 3D-printed rocket that recalculates its mid-air trajectory using a $5 sensor

Generating All 32-Bit Primes (Part I)

The Webpage Has Instructions. The Agent Has Your Credentials

How kernel anti-cheats work

The 100 hour gap between a vibecoded prototype and a working product

Zipp 2001 Restoration

A most elegant TCP hole punching algorithm

Why Mathematica does not simplify sinh(arccosh(x))

UMD Scientists Create 'Smart Underwear' to Measure Human Flatulence

Treasure hunter freed from jail after refusing to turn over shipwreck gold

Allow me to get to know you, mistakes and all