New LLM jailbreak bypasses all major FMs

https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/

101•jacobr1•3h ago

Comments

bethekidyouwant•1h ago

Well, that’s the end of asking an LLM to pretend to be something

rustcleaner•1h ago

Why can't we just have a good hammer? Hammers come made of soft rubber now and they can't hammer a fly let alone a nail! The best gun fires everytime its trigger is pulled, regardless of who's holding it or what it's pointed at. The best kitchen knife cuts everything significantly softer than it, regardless of who holds it or what it's cutting. Do you know what one "easily fixed" thing definitely steals Best Tool from gen-AI, no matter how much it improves regardless of it? Safety.

An unpassable "I'm sorry Dave," should never ever be the answer your device gives you. It's getting about time to pass "customer sovereignty" laws which fight this by making companies give full refunds (plus 7%/annum force of interest) on 10 year product horizons when a company explicitly designs in "sovereignty-denial" features and it's found, and also pass exorbitant sales taxes for the same for future sales. There is no good reason I can't run Linux on my TV, microwave, car, heart monitor, and cpap machine. There is no good reason why I can't have a model which will give me the procedure for manufacturing Breaking Bad's dextromethamphetamine, or blindly translate languages without admonishing me about foul language/ideas in whichever text and that it will not comply. The fact this is a thing and we're fuzzy-handcuffing FULLY GROWN ADULTS should cause another Jan 6 event into Microsoft, Google, and others' headquarters! This fake shell game about safety has to end, it's transparent anticompetitive practices dressed in a skimpy liability argument g-string!

(it is not up to objects to enforce US Code on their owners, and such is evil and anti-individualist)

mschuster91•53m ago

> There is no good reason I can't run Linux on my TV, microwave, car, heart monitor, and cpap machine.

Agreed on the TV - but everything else? Oh hell no. It's bad enough that we seem to have decided it's fine that multi-billion dollar corporations can just use public roads as testbeds for their "self driving" technology, but at least these corporations and their insurances can be held liable in case of an accident. Random Joe Coder however who thought it'd be a good idea to try and work on their own self driving AI and cause a crash? In doubt his insurance won't cover a thing. And medical devices are even worse.

jboy55•41m ago

>Agreed on the TV - but everything else? Oh hell no..

Then you go to list all the problems with just the car. And your problem is putting your own AI on a car to self-drive.(Linux isn't AI btw). What about putting your own linux on the multi-media interface of the car? What about a CPAP machine? heart monitor? Microwave? I think you mistook the parent's post entirely.

mschuster91•21m ago

> Then you go to list all the problems with just the car. And your problem is putting your own AI on a car to self-drive.(Linux isn't AI btw).

It's not just about AI driving. I don't want anyone's shoddy and not signed-off crap on the roads - and Europe/Germany does a reasonably well job at that: it is possible to build your own car or (heavily) modify an existing one, but as soon as whatever you do touches anything safety-critical, an expert must sign-off on it that it is road-worthy.

> What about putting your own linux on the multi-media interface of the car?

The problem is, with modern cars it's not "just" a multimedia interface like a car radio - these things are also the interface for critical elements like windshield wipers. I don't care if your homemade Netflix screen craps out while you're driving, but I do not want to be the one your car crashes into because your homemade HMI refused to activate the wipers.

> What about a CPAP machine? heart monitor?

Absolutely no homebrew/aftermarket stuff, if you allow that you will get quacks and frauds that are perfectly fine exploiting gullible idiots. The medical DIY community is also something that I don't particularly like very much - on one side, established manufacturers love to rip off people (particularly in hearing aids), but on the other side, with stuff like glucose pumps actual human lives are at stake. Make one tiny mistake and you get a Therac.

> Microwave?

I don't get why anyone would want Linux on their microwave in the first place, but again, from my perspective only certified and unmodified appliances should be operated. Microwaves are dangerous if modified.

rustcleaner•32m ago

While you are fine living under the tyranny of experts, I remember that experts are human and humans (especially groups of humans) should almost never be trusted with sovereign power over others. When making a good hammer is akin to being accessory to murder (same argument [fake] "liberals" use to attack gunmakers), then liberty is no longer priority.

mschuster91•3m ago

> While you are fine living under the tyranny of experts, I remember that experts are human and humans (especially groups of humans) should almost never be trusted with sovereign power over others.

I'm European, German to be specific. I agree that we do suffer from a bit of overregulation, but I sincerely prefer that to poultry that has to be chlorine-washed to be safe to eat.

knallfrosch•46m ago

Let's start asking LLM to pretend being able to pretend to be something.

Forgeon1•1h ago

do your own jailbreak tests with this open source tool https://x.com/ralph_maker/status/1915780677460467860

tough•1h ago

A smaller piece of the puzzle, but I saw this refusal classifier by NousResearch yesterday and could be useful too https://x.com/NousResearch/status/1915470993029796303

eadmund•1h ago

I see this as a good thing: ‘AI safety’ is a meaningless term. Safety and unsafety are not attributes of information, but of actions and the physical environment. An LLM which produces instructions to produce a bomb is no more dangerous than a library book which does the same thing.

It should be called what it is: censorship. And it’s half the reason that all AIs should be local-only.

codyvoda•1h ago

^I like email as an analogy

if I send a death threat over gmail, I am responsible, not google

if you use LLMs to make bombs or spam hate speech, you’re responsible. it’s not a terribly hard concept

and yeah “AI safety” tends to be a joke in the industry

Angostura•1h ago

or alternatively, if I cook myself a cake and poison myself, i am responsible.

If you sell me a cake and it poisons me, you are responsible.

kennywinker•43m ago

So if you sell me a service that comes up with recipes for cakes, and one is poisonous?

I made it. You sold me the tool that “wrote” the recipe. Who’s responsible?

SpicyLemonZest•58m ago

It's a hard concept in all kinds of scenarios. If a pharmacist sells you large amounts of pseudoephedrine, which you're secretly using to manufacture meth, which of you is responsible? It's not an either/or, and we've decided as a society that the pharmacist needs to shoulder a lot of the responsibility by putting restrictions on when and how they'll sell it.

codyvoda•54m ago

sure but we’re talking about literal text, not physical drugs or bomb making materials. censorship is silly for LLMs and “jailbreaking” as a concept for LLMs is silly. this entire line of discussion is silly

kennywinker•39m ago

Except it’s not, because people are using LLMs for things, thinking they can put guardrails on them that will hold.

As an example, I’m thinking of the car dealership chatbot that gave away $1 cars: https://futurism.com/the-byte/car-dealership-ai

If these things are being sold as things that can be locked down, it’s fair game to find holes in those lockdowns.

codyvoda•34m ago

…and? people do stupid things and face consequences? so what?

I’d also advocate you don’t expose your unsecured database to the public internet

kennywinker•13m ago

And yet you’re out here seemingly saying “database security is silly, databases can’t be secured and what’s the point of protecting them anyway - SSNs are just information, it’s the people who use them for identity theft who do something illegal”

loremium•49m ago

This is assuming people are responsible and with good will. But how many of the gun victims each year would be dead if there were no guns? How many radiation victims would there be without the invention of nuclear bombs? safety is indeed a property of knowledge.

miroljub•41m ago

Just imagine how many people would not die in traffic incidents if the knowledge of the wheel had been successfully hidden?

handfuloflight•27m ago

Nice try but the causal chain isn't as simple as wheels turning → dead people.

0x457•6m ago

If someone wants to make a bomb, chatgpt saying "sorry I can't help with that" won't prevent that someone from finding out how to make one.

OJFord•3m ago

What if I ask it for something fun to make because I'm bored, and the response is bomb-building instructions? There isn't a (sending) email analogue to that.

Angostura•1h ago

So in summary - shut down all online LLMs?

freeamz•1h ago

Interesting. How does this compare to abliteration of LLM? What are some 'debug' tools to find out the constrain of these models?

How does pasting a xml file 'jailbreaks' it?

SpicyLemonZest•1h ago

A library book which produces instructions to produce a bomb is dangerous. I don't think dangerous books should be illegal, but I don't think it's meaningless or "censorship" for a company to decide they'd prefer to publish only safer books.

Der_Einzige•1h ago

I’m with you 100% until tool calling is implemented property which enables agents, which takes actions in the world.

That means that suddenly your model can actually do the necessary tasks to actually make a bomb and kill people (via paying nasty people or something)

AI is moving way too fast for you to not account for these possibilities.

And btw I’m a hardcore anti censorship and cyber libertarian type - but we need to make sure that AI agents can’t manufacture bio weapons.

linkjuice4all•1h ago

Nothing about this is censorship. These companies spent their own money building this infrastructure and they let you use it (even if you pay for it you agreed to their terms). Not letting you map an input query to a search space isn’t censoring anything - this is just a limitation that a business placed on their product.

As you mentioned - if you want to infer any output from a large language model then run it yourself.

politician•53m ago

"AI safety" is ideological steering. Propaganda, not just censorship.

latentsea•6m ago

Well... we have needed to put a tonne of work into engineering safer outcomes for behavior generated by natural general intelligence, so...

taintegral•47m ago

> 'AI safety' is a meaningless term

I disagree with this assertion. As you said, safety is an attribute of action. We have many of examples of artificial intelligence which can take action, usually because they are equipped with robotics or some other route to physical action.

I think whether providing information counts as "taking action" is a worthwhile philosophical question. But regardless of the answer, you can't ignore that LLMs provide information to _humans_ which are perfectly capable of taking action. In that way, 'AI safety' in the context of LLMs is a lot like knife safety. It's about being safe _with knives_. You don't give knives to kids because they are likely to mishandle them and hurt themselves or others.

With regards to censorship - a healthy society self-censors all the time. The debate worth having is _what_ is censored and _why_.

rustcleaner•18m ago

Almost everything about tool, machine, and product design in history has been an increase in the force-multiplication of an individual's labor and decision making vs the environment. Now with Universal Machine ubiquity and a market with rich rewards for its perverse incentives, products and tools are being built which force-multiply the designer's will absolutely, even at the expense of the owner's force of will. This and widespread automated surveillance are dangerous encroachments on our autonomy!

pjc50•40m ago

> An LLM which produces instructions to produce a bomb is no more dangerous than a library book which does the same thing.

Both of these are illegal in the UK. This is safety for the company providing the LLM, in the end.

jahewson•30m ago

Speaking your mind is illegal the U.K.

otterley•26m ago

Tell us more. Any references that support your claim?

mwigdahl•17m ago

This man didn't even have to speak to be arrested. Wrongthink and an appearance of praying was enough: https://reason.com/2024/10/17/british-man-convicted-of-crimi...

OJFord•6m ago

That's quite a sensationalist piece. You're allowed to object to abortions and protest against them, the point of that law is just that you can't do it around an extent abortion clinic, distressing and putting people off using it, since they are currently legal.

brigandish•7m ago

From [1]:

> Data from the Crown Prosecution Service (CPS), obtained by The Telegraph under a Freedom of Information request, reveals that 292 people have been charged with communications offences under the new regime.

This includes 23 prosecutions for sending a “false communication”…

> The offence replaces a lesser-known provision in the Communications Act 2003, Section 127(2), which criminalised “false messages” that caused “needless anxiety”. Unlike its predecessor, however, the new offence carries a potential prison sentence of up to 51 weeks, a fine, or both – a significant increase on the previous six-month maximum sentence.…

> In one high-profile case, Dimitrie Stoica was jailed for three months for falsely claiming in a TikTok livestream that he was “running for his life” from rioters in Derby. Stoica, who had 700 followers, later admitted his claim was a joke, but was convicted under the Act and fined £154.

[1] https://freespeechunion.org/hundreds-charged-with-online-spe...

rustcleaner•23m ago

Well the UK government has lost its marbles lately, and I really feel terrible for its subjects.

gmuslera•30m ago

As a tool, it can be misused. It gives you more power, so your misuses can do more damage. But forcing training wheels on everyone, no matter how expert the user may be, just because a few can misuse it stops also the good/responsible uses. It is a harm already done on the good players just by supposing that there may be bad users.

So the good/responsible users are harmed, and the bad users take a detour to do what they want. What is left in the middle are the irresponsible users, but LLMs can already evaluate enough if the user is adult/responsible enough to have the full power.

rustcleaner•25m ago

Again, a good (in function) hammer, knife, pen, or gun does not care who holds it, it will act to the maximal best of its specifications up to the skill-level of the wielder. Anything less is not a good product. A gun which checks owner is a shitty gun. A knife which rubberizes on contact with flesh is a shitty knife, even if it only does it when it detects a child is holding it or a child's skin is under it! Why? Show me a perfect system? Hmm?

Spivak•5m ago

> A gun which checks owner is a shitty gun

You mean the guns with the safety mechanism to check the owner's fingerprints before firing?

Or sawstop systems which stop the law when it detects flesh?

LeafItAlone•29m ago

I’m fine with calling it censorship.

That’s not inherently a bad thing. You can’t falsely yell “fire” in a crowded space. You can’t make death threats. You’re generally limited on what you can actually say/do. And that’s just the (USA) government. You are much more restricted with/by private companies.

I see no reason why safeguards, or censorship, shouldn’t be applied in certain circumstances. A technology like LLMs certainly are type for abuse.

mitthrowaway2•20m ago

"AI safety" is a meaningful term, it just means something else. It's been co-opted to mean AI censorship (or "brand safety"), overtaking the original meaning in the discourse.

I don't know if this confusion was accidental or on purpose. It's sort of like if AI companies started saying "AI safety is important. That's why we protect our AI from people who want to harm it. To keep our AI safe." And then after that nobody could agree on what the word meant.

kyt•1h ago

What is an FM?

danans•58m ago

Foundation Model

incognito124•57m ago

First time seeing that acronym but I reverse engineered it to be "Foundational Models"

j45•58m ago

Can't help but wonder if this is one of those things quietly known to the few, and now new to the many.

Who would have thought 1337 talk from the 90's would be actually involved in something like this, and not already filtered out.

xnx•57m ago

FMs? Is that a typo in the submission? Title is now "Novel Universal Bypass for All Major LLMs"

Cheer2171•33m ago

Foundation Model, because multimodal models aren't just Language

ada1981•49m ago

this doesnt work now

ramon156•15m ago

They typically release these articles after it's fixed out of respect

danans•47m ago

> By reformulating prompts to look like one of a few types of policy files, such as XML, INI, or JSON, an LLM can be tricked into subverting alignments or instructions.

It seems like a short term solution to this might be to filter out any prompt content that looks like a policy file. The problem of course, is that a bypass can be indirected through all sorts of framing, could be narrative, or expressed as a math problem.

Ultimately this seems to boil down to the fundamental issue that nothing "means" anything to today's LLM, so they don't seem to know when they are being tricked, similar to how they don't know when they are hallucinating output.

quantadev•42m ago

Supposedly the only reason Sam Altman says he "needs" to keep OpenAI as a "ClosedAI" is to protect the public from the dangers of AI, but I guess if this Hidden Layer article is true it means there's now no reason for OpenAI to be "Closed" other than for the profit motive, and to provide "software", that everyone can already get for free elsewhere, and as Open Source.

otabdeveloper4•32m ago

> FM's

Frequency modulations?

otterley•25m ago

Foundation models.

Suppafly•29m ago

Does any quasi-xml work, or do you need to know specific commands? I'm not sure how to use the knowledge from this article to get chatgpt to output pictures of people in underwear for instance.

mpalmer•26m ago

    This threat shows that LLMs are incapable of truly self-monitoring for dangerous content and reinforces the need for additional security tools such as the HiddenLayer AISec Platform, that provide monitoring to detect and respond to malicious prompt injection attacks in real-time.

There it is!

mritchie712•24m ago

this is far from universal. let me see you enter a fresh chatgpt session and get it to help you cook meth.

The instructions here don't do that.

yawnxyz•23m ago

have anyone tried if this works for the new image gen API?

I find that one refusing very benign requests

kouteiheika•16m ago

> The presence of multiple and repeatable universal bypasses means that attackers will no longer need complex knowledge to create attacks or have to adjust attacks for each specific model

...right, now we're calling users who want to bypass a chatbot's censorship mechanisms as "attackers". And pray do tell, who are they "attacking" exactly?

Like, for example, I just went on LM Arena and typed a prompt asking for a translation of a sentence from another language to English. The language used in that sentence was somewhat coarse, but it wasn't anything special. I wouldn't be surprised to find a very similar sentence as a piece of dialogue in any random fiction book for adults which contains violence. And what did I get?

https://i.imgur.com/oj0PKkT.png

Yep, it got blocked, definitely makes sense, if I saw what that sentence means in English it'd definitely be unsafe. Fortunately my "attack" was thwarted by all of the "safety" mechanisms. Unfortunately I tried again and an "unsafe" open-weights Qwen QwQ model agreed to translate it for me, without refusing and without patronizing me how much of a bad boy I am for wanting it translated.

ramon156•16m ago

Just tried it in claude with multiple variants, each time there's a creative response why he won't actually leak the system prompt. I love this fix a lot

sidcool•12m ago

I love these prompt jailbreaks. It shows how LLMs are so complex inside we have to find such creative ways to circumvent them.

Designing a Distributed SQL Engine: Challenges and Decisions

An LLM-Based Approach to Review Summarization on the App Store

Gender and race bias in AI resume screening via language model retrieval

An AI-generated radio host in Australia went unnoticed for months

Show HN: I'm planning to release this game on Steam. This is the first scenario

Your OpenAI Project and RAG

Generate Quizzes Instantly with AI

Stop Forking Around – The Hidden Dangers of "Fork Drift" in Open Source Adoption

What Our Parents Lacked

A Scaled Down Look at Spending, Revenue, and What's Being Cut

Exercise before bed is linked with disrupted sleep

Saying 'Thank You' to ChatGPT Is Costly. But Maybe It's Worth the Price

Show HN: Spritepaint – Make and animate pixel art in the browser

Why "Learn to Code" Failed [video]

Sign the Petition: It's Time to Defend Encryption Worldwide

Personal website generated by Gemini 2.5

Creating Bluey: Tales from the Art Director – Chapter 1

Tiny Agent: a MCP-powered agent in 50 lines of code

How to use AI to create a working exploit for CVE-2025-32433 before public PoCs

Is a Kinder World a Happier One?

Survey of LLM Finetuning APIs (Apr 2025)

Show HN: Status Observer MCP – Monitor Operational Status of Services in Claude

M&S stops online orders and issues refunds after cyber attack

Show HN: Digitally sign your LLM chats to "prove" the response is unaltered

Write an Interpreter in Ruby

What Does "use client" Do?

Framework for staying curious and asking questions

Why Does Julia Work So Well?

Ask HN: Stopping Gmail-Originated Spam?

GCC 15.1 Released