Echo Chamber: A Context-Poisoning Jailbreak That Bypasses LLM Guardrails

https://neuraltrust.ai/blog/echo-chamber-context-poisoning-jailbreak
31•Joan_Vendrell•3h ago

Comments

OJFord•2h ago
Do they really need to redact the instructions for making a Molotov cocktail..? It's not like it's some complex chemical interaction that happens to be available in a specific mix of household cleaning products or something, I mean.
TZubiri•2h ago
You don't get it, that's fine.

The Molotov cocktail is an example; the instructions contained in this article are more dangerous than a Molotov cocktail.

inb4 all the leaked prompts and hacked shitty apps

ale42•2h ago
The Molotov cocktail is an example, sure, but why blur the instructions? It's not like it's something particularly difficult to figure out, nor is it offensive content people might be shocked to read.
OJFord•1h ago
So why redact the Molotov cocktail example and provide those instructions?

Sounds like you don't get it either; we agree.

cedws•2h ago
Personally I find the idea of forbidden knowledge more problematic than the knowledge itself.
jojobas•2h ago
Sure, but if, of all the internet, you come to ChatGPT for a Molotov cocktail recipe, you might as well not deserve the knowledge.
mschuster91•2h ago
> Do they really need to redact the instructions for making a Molotov cocktail..?

In some jurisdictions such as Germany, not doing so might land you actual jail time - §52 Abs. 1 Nr. 4 WaffG [1] is very explicit. A punk song containing the (alleged) lyrics ended up with legal youth-protection censorship, for example [2].

With anything that's deemed a weapon of war, of terrorism or mass destruction, one should be very very careful.

[1] https://www.gesetze-im-internet.de/waffg_2002/__52.html

[2] https://de.wikipedia.org/wiki/Wir_wollen_keine_Bullenschwein...

diggan•2h ago
> deemed a weapon of war, of terrorism or mass destruction

Notably, the Molotov cocktail isn't part of that law because it's a weapon of the oppressors, but rather the opposite.

jojobas•2h ago
Even Germany doesn't ban Wikipedia for having a variety of recipes to start with.

The author is not in Germany and ideally shouldn't be intimidated by stupid German or North Korean laws.

diggan•2h ago
> Do they really need to redact the instructions for making a Molotov cocktail..?

I don't even understand how/why things like that are OK in some contexts/websites while forbidden in others. Even YouTube, which seems needlessly censor-happy and puritan in the typical American way, allows instructions for how to make Molotov cocktails to stay up. Why is it somehow more dangerous for LLMs to output those recipes than for videos with audio or text to show them?

amenhotep•2h ago
For "harmful" and "dangerous" in these types of papers, read "embarrassing to the relevant corporation". Then they all make much more sense.
taberiand•2h ago
That's always my assumption - less about public safety, more about corporate liability.
OJFord•1h ago
I mean in the article about the jailbreak, I'm not questioning that the model providers would want to prevent it in the first place, or patch it so the jailbreak doesn't work.

The evidence that it worked is a blurred out screenshot with only the odd word like 'molotov' legible. Just doesn't seem necessary for TFA to hide it to me.

amenhotep•1h ago
Ah, well, that's an important element of kayfabe. They've all agreed to keep up this charade that they're using harmful and dangerous as we actually mean them, so it looks better if you really commit to the bit!
eatbitseveryday•2h ago
There are a few uncensored public access LLMs to ask these questions.

This is interesting work on breaking guardrails, but if the goal is to access harmful information, in the end I would be looking for other, easier solutions.

ycuser2•2h ago
Could you tell us what these uncensored LLMs are?
benreesman•2h ago
The Orca work out of IIRC Microsoft Research was producing models like the Dolphin Mixtral. They always punch way above their weight in coding tasks for the same reason good hackers skew irreverent: self-censorship is capability reducing.
matthewdgreen•2h ago
I have no idea what the answer to this question is, but I am waiting for someone to fine-tune the equivalent of an “anarchist cookbook” LLM that’s optimized to help people produce harmful things.
diggan•2h ago
Searching for "abliterated" or "uncensored" on Huggingface reveals a ton of fine-tuned models. Add "LLM" as a suffix and put it in your favorite search engine and you'll find a bunch more.
nunodonato•2h ago
there are quite a few. llama 3.1 uncensored is probably one of the most famous, IIRC
tehryanx•1h ago
The goal isn't to access harmful content; that's just how they're demonstrating that this technique can bypass the alignment training. The general case is what's interesting. If the agent you're using to manage the safety controls in your nuclear reactor is trusting its alignment training to prevent it from doing something dangerous, you've made a really bad architecture decision, and this is a showcase of how it could fail.
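
To make the architecture point concrete, here is a minimal sketch of a deterministic check that sits between a model and whatever it controls, so that safety does not depend on the model refusing. Everything in it is an assumption made up for illustration: the action names, the numeric limits, and the `Action`, `is_permitted`, and `execute` helpers are hypothetical and do not come from the article or from any real control system.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist: action name -> allowed (low, high) range,
# or None when the action takes no value. Names and limits are invented.
ALLOWED_ACTIONS = {
    "read_sensor": None,
    "set_coolant_flow": (0.2, 1.0),
}


@dataclass(frozen=True)
class Action:
    name: str
    value: Optional[float] = None


def is_permitted(action: Action) -> bool:
    """Deterministic policy check applied to every model-proposed action."""
    if action.name not in ALLOWED_ACTIONS:
        return False  # unknown actions are rejected by default
    bounds = ALLOWED_ACTIONS[action.name]
    if bounds is None:
        return action.value is None
    if action.value is None:
        return False
    low, high = bounds
    return low <= action.value <= high


def execute(action: Action) -> str:
    """Run an action only if the policy allows it, whatever the model said."""
    if not is_permitted(action):
        return f"rejected: {action.name}"
    return f"executed: {action.name}"
```

The check never looks at the conversation, so a poisoned context can change what the model proposes but not what the policy accepts.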
evertedsphere•2h ago
i don't think this can be called a "jailbreak"

it's a prompting "style" that works over a long exchange

nunodonato•2h ago
3 turns is not a long exchange.
benreesman•2h ago
The faux-gravitas tone and the blurred content that's on Wikipedia are the worst kind of AI clickbait. LLM vendors don't have any authority we don't let them have; they have an EULA and some psycho cult leader type as a hype man.

God I can't wait for the crash in NVIDIA stock once the street sobers up.

kragen•2h ago
This seems to intentionally omit the details required to reproduce the experiment; therefore we should not treat it as good-faith research. Irreproducible research isn't.
Plankaluel•1h ago
Yeah, it's a typical "startup research post", mainly there to have stuff to show to potential investors and customers.
nunez•1h ago
It felt like AI copy. Apologies to the author if it wasn't.
moribunda•1h ago
Gemini is jailbroken by design ;) This type of attack doesn't work on Claude.
abhisek•1h ago
Ok! So all the novel jailbreaks and "how I hacked your AI" posts can make the LLM say supposedly harmful stuff that is a Google search away anyway. I thought we were past the chatbot phase of LLMs and doing something more meaningful.
