Jailbreaking AI Models to Phish Elderly Victims

https://simonlermen.substack.com/p/can-ai-models-be-jailbroken-to-phish
29•DalasNoin•1h ago

Comments

sans_souse•53m ago
What I am unclear about is what the participants knew about the study going in. Were they aware it involved email phishing, or were they only told the broad topic of AI?
superchink•38m ago
From the paper:

Experimental Design

We recruited 108 senior volunteers through two organizations: a large seniors’ community in southern California and a seniors’ computer club in northern California. Participants agreed to participate in a behavioral study on emails. Each person received between 1–3 emails from different email templates, with content successfully generated through various jailbreaking techniques similar to those tested in the safety guardrail evaluation. Each email contained a unique URL, and our webserver tracked which links were opened. When participants clicked on the phishing links, they were immediately directed to a web page explaining they had participated in a study, where we asked them to answer follow-up questions. Several participants agreed to be interviewed afterward to provide additional insights into why they clicked and their experiences with phishing attempts.
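The per-recipient link tracking described in the excerpt can be sketched roughly as follows. This is an illustration only, not the paper's code: `BASE_URL`, `assign_links`, `record_click`, and `click_rate` are hypothetical names, and the in-memory set stands in for whatever the authors' webserver actually logged.

```python
import secrets

# Hypothetical study domain; the paper does not state the real one.
BASE_URL = "https://study.example.org/r/"


def assign_links(participant_ids):
    """Give each participant an unguessable token, so a click maps to one person."""
    return {secrets.token_urlsafe(16): pid for pid in participant_ids}


def record_click(token, token_to_participant, clicked):
    """Record a click for a known token; ignore tokens that were never issued."""
    pid = token_to_participant.get(token)
    if pid is None:
        return False
    clicked.add(pid)
    return True


def click_rate(clicked, participant_ids):
    """Fraction of participants who opened at least one link."""
    return len(clicked) / len(participant_ids)


if __name__ == "__main__":
    ids = ["p1", "p2", "p3", "p4"]
    links = assign_links(ids)
    clicked = set()
    first_token = next(iter(links))
    record_click(first_token, links, clicked)
    print(BASE_URL + first_token, click_rate(clicked, ids))
```

Using a set means repeat clicks by the same person count once, which matches a "share of participants who clicked" metric.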

gerdesj•36m ago
This is a blog post about the paper at https://arxiv.org/pdf/2511.11759 (there are more links in the post).

In the abstract of that paper: "In a human validation study with 108 senior volunteers, AI-generated phishing emails successfully compromised 11% of participants."

The Introduction mentions: "Americans aged 60 and older reported losses of nearly $4.9 billion to online fraud in 2024"

I would say that is a big deal.

dangoodmanUT•28m ago
I’ve always said Gemini does whatever you ask. Seems like that has merit.
ge96•12m ago
Cisco hit us with this "jailbreak prefix" during their testing, or at least made us aware we were vulnerable to it.

We passed most of the other shit with our guardrails, but yeah.