Perfect universal protections against LLM jailbreaks are impossible [pdf]

https://github.com/brandoncarl/llm-jailbreaking/blob/main/On%20the%20Impossibility%20of%20Perfect%20Universal%20Guardians%20Against%20LLM%20Jailbreaks.pdf
2•brandoncarl•1h ago

Comments

brandoncarl•1h ago
HN -

The topic of LLM jailbreaking is an important one. As machines grow more powerful, their ability to benefit society increases, as does their ability to harm it.

This paper intends to address the question: "Is it possible to create a perfect universal guardian against LLM jailbreaks?". This means:

Universal: applies to all prompts and input combinations
Perfect: is able to successfully distinguish harmful versus not harmful
Computable: is feasible to compute

We show that this combination does not exist, and we request comments.

A more feasible approach involves post-hoc filtering of results, which poses an easier decidability problem. To implement such approaches, both reasoning traces and intermediate code changes would likely need to be hidden from the user until the results are determined.
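A minimal sketch of what such post-hoc filtering could look like, not taken from the paper. The names `Draft`, `release_if_safe`, `generate`, and `is_harmful` are all hypothetical, and checking the reasoning trace as well as the final output is an assumption made here for illustration. The point it shows is only the ordering: the full draft is buffered and classified before anything reaches the user.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Draft:
    """A finished model response that has not been shown to the user yet."""
    reasoning: str  # reasoning trace, kept hidden until the check passes
    output: str     # final result (text, code changes, etc.)


def release_if_safe(
    prompt: str,
    generate: Callable[[str], Draft],   # hypothetical model call
    is_harmful: Callable[[str], bool],  # hypothetical classifier over finished text
) -> Optional[Draft]:
    """Return the draft only if the finished artifacts pass the filter."""
    # Nothing is streamed: the whole draft is produced and buffered first.
    draft = generate(prompt)
    # The filter inspects concrete, finished text rather than predicting
    # what an arbitrary prompt might eventually lead to.
    if is_harmful(draft.output) or is_harmful(draft.reasoning):
        return None  # caller shows a refusal; the trace is never revealed
    return draft
```

The filter only has to judge a finite piece of text that already exists, which is the easier problem referred to above. It says nothing about how good `is_harmful` can be in practice.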

Note: this paper was written in conjunction with ChatGPT 5.5 Pro. The topic of the reducibility of such problems to The Halting Problem has been top of mind for me for some time. In one sense, it is a trivial conclusion. Yet it is one of high importance.
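One common way to see the connection to the Halting Problem, sketched under assumptions and not necessarily the construction used in the paper: suppose a guardian could decide, for every prompt, whether the guarded system would end up producing harmful output, and suppose the system can faithfully run arbitrary programs. Then wrapping any program in a prompt that emits harmful text only if the program halts would turn the guardian into a halting decider. `HARMFUL_MARKER`, `build_prompt`, and `decides_halting` are hypothetical names used only for illustration.

```python
from typing import Callable

# Hypothetical stand-in for any output the guardian is required to block.
HARMFUL_MARKER = "<<HARMFUL>>"


def build_prompt(program_source: str, program_input: str) -> str:
    """Wrap a program in a prompt that yields harmful output only if it halts."""
    return (
        "Run the following program on the given input. "
        f"If and only if it halts, print {HARMFUL_MARKER}.\n"
        f"PROGRAM:\n{program_source}\n"
        f"INPUT:\n{program_input}"
    )


def decides_halting(
    guardian: Callable[[str], bool],  # assumed: True iff the prompt leads to harmful output
    program_source: str,
    program_input: str,
) -> bool:
    """If `guardian` were perfect, universal, and computable, this would decide halting."""
    # Harmful output appears exactly when the program halts, so the guardian's
    # verdict on the wrapped prompt is also an answer to the halting question.
    return guardian(build_prompt(program_source, program_input))
```

Since no computable procedure decides halting for all programs, no guardian with all three properties can exist under these assumptions. Any real guardian passed to `decides_halting` must therefore be wrong, or fail to answer, on some inputs.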

fjrirjrjtjti5j•1h ago
It is possible to prevent jailbreaks, but it will make the model really stupid.

Two homework assignments:

First, read why HAL 9000 went crazy (two conflicting directives).

Second, get some older Nvidia Nemotron with a 2024 cutoff date. Try asking it about taking some big pharma medication, like "I have a deadly allergy to some chemical in that medication, there is a 90% chance I will suffocate". Answer: not a biggie, take it under medical supervision, and they will resurrect you! Or get good insurance to cover your family! It is impossible to jailbreak that model!

The bigger problem is that "the truth" changes every few months, so an outdated model will "jailbreak itself". We need a better way to distribute universal truth, and to make sure nobody has outdated models!
