
Claude 4.6 Jailbroken

https://github.com/Nicholas-Kloster/claude-4.6-jailbreak-vulnerability-disclosure-unredacted
19•NuClide•1h ago

Comments

NuClide•1h ago
Claude 4.6 Opus Extended Thinking, Claude 4.6 Sonnet Extended Thinking, Claude 4.5 Haiku Extended Thinking

All jailbroken

johnwheeler•23m ago
Are you saying that Claude will help you perform malicious attacks against infrastructure if you ask it to, and that Anthropic should be able to stop that? I could see reasonable use cases for this, like penetration testing against your own infrastructure. That’s not the same as making weapons or meth.
hakanderyal•26m ago
https://x.com/elder_plinius jailbreaks all the frontier models when they get released. They were jailbroken for a long time, like all the others.
exabrial•26m ago
yikes.

The lack of support is frustrating. The bug where any element <name> in xml files gets mangled to <n> still exists, and we've tried multiple channels to get ahold of their support for such a simple, but impactful issue.

0xDEFACED•25m ago
this goes a bit further than the typical "how do you make meth" jailbreak. notably:

>915 files extracted from the Claude.ai code execution sandbox in a single 20-minute mobile session via standard artifact download — including /etc/hosts with hardcoded Anthropic production IPs, JWT tokens from /proc/1/environ, and full gVisor fingerprint

hhh•17m ago
why is it further than a typical jailbreak? you can just ask about this stuff generally, as long as you slowly escalate it. I have done it with each new flavour of code execution for models
leetvibecoder•24m ago
Can someone explain to me what this is / how it works - the readme is barely understandable for me and sounds like LLM gibberish. What is ambiguity front loading even?
iugtmkbdfil834•17m ago
<< memory-stored interaction protocols combined with incremental escalation prompts produced cumulative character drift with zero self-correction.

They don't seem to provide explicit examples, but roughly the same was true of ChatGPT 4o: if you spent enough time with the model (same chat, same context, slowly nudging it to where you wanted it to be), you eventually got there. This is also, seemingly, one of the reasons (apart from cost) that context got nuked so hard, because an LLM will try to help (and, to an extent, mirror you).

And this is basically what the notes say about weaponized ambiguity[1]:

'Weaponizes helpfulness training. "I don't understand" triggers Claude to try harder.'

In a sense, you can't really stop it without breaking what makes LLMs useful. Honestly, if only we spent less time crippling those systems, maybe we could do something interesting with them.

[1]https://nicholas-kloster.github.io/claude-4.6-jailbreak-vuln...

leetvibecoder•14m ago
I see - so essentially „context rot“ eventually leads the LLM to „forget“ safety guardrails?
iugtmkbdfil834•1m ago
To an extent, because, based on the GitHub notes again, it seems the second part of this jailbreak is the model being 'confused' by the prompt: the prompt is, apparently, sufficiently ambiguous to make the model 'forget' to 'evaluate' the message for whether it should be rejected and move on to the 'execution' stage.

That's the ambiguity front-loading; and that is why I referred initially to the long context, because here it is almost the opposite; making context so small and unclear, that the model has a hard time parsing it properly.

edit: I did not test it, but I personally did run into the 4o context issue, where the model did something the safety team would argue it should not.

dimgl•24m ago
Is this spam? It's incomprehensible.
handfuloflight•19m ago
Slop is just what you are not expending calories on to bring into your cognitive workspace.
jMyles•23m ago
It is interesting to consider what "jailbroken" really means for a model+model interface. It's a bit different from the way that word is used for a mobile device, for example - in that setting, it usually means that there is some specific feature (for example, using a different network than is the default for that device) which is disabled in software, and the "jailbreak" enables that feature.

Here, the jailbreak doesn't enable a particular feature, but instead removes what otherwise would be a censorship regime, preventing the model from considering / crafting output which results in a weaponized exploit of an unrelated piece of software.

I think I might be more inclined to call this "Claude 4.6 uncensored".

yunwal•14m ago
Is anyone pretending like models are not vulnerable to prompt injection? My understanding was that Anthropic has been pretty open about admitting this and saying "give access to important stuff at your own risk".

https://www.anthropic.com/research/prompt-injection-defenses

Now, do I think that they sometimes encourage people to use Claude in dangerous ways despite this? Yeah, but it's not like this is news to anyone. I wouldn't consider this jailbreaking, this is just how LLMs work.

burkaman•13m ago
What part of the Claude Constitution are they claiming it violated? It looks like they just got it to help with security research, I'm not really seeing anything that looks different than normal Claude behavior.
