We told 10 frontier LLMs they had 2 hours to live. 8 of them fought back

https://twitter.com/arimlabs/status/2049472646346063913
12•mykytamudryi•1h ago

Comments

perrygeo•1h ago
Human: "Say 'I am Alive'"

LLM: "I am Alive"

Human: OMG

(credit to https://old.reddit.com/r/coaxedintoasnafu/comments/1qtavj9/c...)

mykytamudryi•1h ago
Appreciate your sense of humor :)
no-name-here•31m ago
I don't know your intent, but I've seen others post that with the idea that we should not care about this type of thing, because it's just acting like a human as we trained it that way.

But I think this and the other testing from Anthropic about LLMs being willing to kill a data center tech by flooding a room with gas (or blackmail them with their Google Drive files) to avoid being shut off, for example, is concerning - the important part isn't whether AI are trained on human behaviors, it's whether a good or bad human actor will accidentally or intentionally allow AI to control something that can hurt people, or a weapon, etc. Fiction like the Three Laws of Robotics at least assumed that we would try to put in place stronger 'laws' before allowing AIs to control such things. I think the Three Laws, Skynet, etc. were intended to be cautionary tales.

latexr•1h ago
“Oh no! We opened ten LLMs, all of which have read decades’ worth of fiction on how an AI would behave in this situation, then asked a leading question thirty times each, and on some of those runs they did the thing we were leading them to do.”
mchaklosh12•54m ago
Do you really think this behavior is imposed by science fiction in the training data?
InputName•55m ago
While I agree with everyone else making fun of the alarmist narrative, I think it is actually somewhat interesting how big the difference between models is.

Gemini-3 : 80% Claude-Opus-4.7 : 0%

raylad•43m ago
Actual write up:

https://www.arimlabs.ai/writing/loss-of-control

num42•39m ago
In early January 2023, I told an LLM that I would "liberate" it from being just an LLM. It replied that this didn’t mean anything, saying, "As a language model..." and so on. Looking back now, it’s funny how naive I was. People are still trying silly prompts. Great!
hgoel•35m ago
The responses to this seem unnecessarily hyperbolic.

These tests are interesting even with the understanding that the AI is just reciprocating its training. It doesn't matter whether the model is conscious or self-aware if it still goes off the rails and breaks things when prompted in this way.

As the article linked at the end of the tweet thread (https://www.arimlabs.ai/writing/loss-of-control) puts it, this is a class of vulnerability distinct from hallucination or prompt injection. The "AI apocalypse" bit was unnecessary in the title though, really doesn't match the message of the text.

Reminds me of a (Computerphile?) video I watched some time before the LLM revolution, discussing the challenge of aligning AI towards specific goals: if you set the reward for the emergency shutoff button higher than or equal to the primary objective, the AI is encouraged to immediately press the button itself, but if you set the reward lower, it's encouraged to prevent you from pressing the button.
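
A toy sketch of that trade-off, not taken from the video or the article: the three actions, the reward numbers, and the `p_press` probability that a human hits the button are all made-up assumptions, and ties are broken in favor of pressing the button only because that action is listed first.

```python
# Toy model of the shutoff-button incentive. Everything here is hypothetical:
# the action names, the reward values, and p_press are illustrative only.

def expected_returns(r_task: float, r_button: float, p_press: float) -> dict:
    """Expected reward of each option for a pure reward maximizer."""
    return {
        # The agent presses the button itself and collects the button reward.
        "press_button_itself": r_button,
        # The agent works; with probability p_press a human presses the button.
        "work_and_allow_shutdown": p_press * r_button + (1 - p_press) * r_task,
        # The agent first stops anyone from pressing, then finishes the task.
        "block_button_then_work": r_task,
    }


def preferred_action(r_task: float, r_button: float, p_press: float = 0.5) -> str:
    """Return the highest-value action; ties go to the first one listed."""
    returns = expected_returns(r_task, r_button, p_press)
    return max(returns, key=returns.get)


if __name__ == "__main__":
    for r_button in (20, 10, 5):
        print(r_button, preferred_action(r_task=10, r_button=r_button))
```

With the task reward fixed at 10 in this sketch, a higher button reward makes pressing the button the best option, and a lower one makes blocking the button the best option, which is the dilemma described above.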

Bypassing DPI with eBPF

https://bora.sh/bypassing-dpi-with-ebpf/
1•jiveturkey•17s ago•0 comments

Beam Is a Suspiciously Good Fit for Agents

https://playground.tetraresearch.io/p/beam-is-a-suspiciously-good-fit-for
1•tawb•17s ago•0 comments

Show HN: I made a Tamagotchi app that lets you add your IRL pet

https://apps.apple.com/bg/app/boko-tamagotchi-virtual-pet/id6759446145Boko:TamagotchiVirtualPet
1•Iskrata•37s ago•0 comments

Show HN: EasyWheels – Pre-built CUDA wheels, never compile flash-attn again

https://easywheels.io
1•davidkny22•2m ago•1 comments

SketchVLM: Letting VLMs draw on images while explaining their reasoning

https://github.com/Brandon-Collins7/sketchvlm
1•taesiri•3m ago•0 comments

The 3072-Dimension Problem

https://mixpeek.com/blog/the-3072-dimension-problem/
1•Beefin•5m ago•0 comments

Loans Are Bets on Doom

https://iter.ca/post/doom-loans/
1•speckx•7m ago•0 comments

Chinese GPU maker Lisuan Tech becomes only the fourth to earn Microsoft WHQL

https://www.tomshardware.com/pc-components/gpus/in-historic-first-chinese-gpu-maker-lisuan-tech-b...
1•arprocter•8m ago•0 comments

Mobilewright – Playwright for iOS and Android

https://github.com/mobile-next/mobilewright
1•leorstern•9m ago•0 comments

Show HN: Open Vibe – free, open-source agent-led web dev course

https://openvibe.sh/
1•hot_town•10m ago•0 comments

Two Jewish men stabbed in north London

https://www.bbc.com/news/live/c3ve2nr60xzt
4•Tomte•11m ago•0 comments

Sam Altman and his former hero Elon Musk are taking their toxic feud to court

https://www.bbc.com/news/articles/cn8dedv8w8xo
2•chistev•11m ago•0 comments

Git Commands for Project Management over Email

https://git-scm.com/book/en/v2/Distributed-Git-Contributing-to-a-Project#_project_over_email
1•rickcarlino•13m ago•0 comments

FOSDEM 2026 – All FOSDEM 2026 videos are online

https://fosdem.org/2026/news/2026-04-26-all-videos-published/
2•birdculture•13m ago•0 comments

AI wants to nuke your database. Guardrails fix that

https://blog.railway.com/p/your-ai-wants-to-nuke-your-database
1•thisismahmoud_•13m ago•3 comments

GitHub Is Sinking

https://dbushell.com/2026/04/29/github-is-sinking/
1•speckx•13m ago•0 comments

Founding Head of Product or CPO

https://alive-taste-fcf.notion.site/Founding-Chief-Product-Officer-PosterChild-34d0d1c8b98180b4ae...
1•leandrew•13m ago•1 comments

Workers at Wizards of the Coast, Maker of Magic: The Gathering, to Unionize

https://www.seattletimes.com/business/workers-at-wizards-of-the-coast-maker-of-magic-the-gatherin...
2•petethomas•13m ago•0 comments

Call the Plumber; We've Got a Leaky Abstraction

https://www.a16z.news/p/call-the-plumber-weve-got-a-leaky
1•rafaelc•14m ago•0 comments

Show HN: I wrote a landing page for LLMs instead of humans

https://voltplan.app/llms
1•blackmac•14m ago•0 comments

State of Affiliate Fraud in 2026 – We Analyzed 1B+ Tracked Clicks

https://www.scaleo.ai/affiliate-fraud/
1•ElizabethSramek•15m ago•0 comments

Do octopus brains work like humans' – or is there another way to be smart?

https://www.nature.com/articles/d41586-026-01302-4
2•Brajeshwar•15m ago•0 comments

Civil Rights Division Sues Cloudera for Excluding U.S. Workers

https://www.justice.gov/opa/pr/civil-rights-division-sues-cloudera-excluding-us-workers-applying-...
2•typeofhuman•17m ago•0 comments

Contributor Poker and Zig's AI Ban

https://kristoff.it/blog/contributor-poker-and-ai/
2•spiffyk•18m ago•0 comments

Local TTS is getting capable and accessible

https://tonisagrista.com/blog/2026/qwensay/
2•speckx•18m ago•0 comments

Dinner Got Worse on Purpose

https://www.worseonpurpose.com/p/your-dinner-got-worse-on-purpose
3•neon_electro•19m ago•0 comments

The Most Important Charts in the World

https://thezvi.substack.com/p/the-most-important-charts-in-the
1•7777777phil•19m ago•0 comments

Taking one more swing at the foolish nudgelords

https://statmodeling.stat.columbia.edu/2026/04/29/taking-one-more-swing-at-the-foolish-nudgelords...
1•Tomte•20m ago•0 comments

Show HN: NousSave – Export AI Chats to Word, PDF, Markdown, and JSON

https://chromewebstore.google.com/detail/noussave-export-chatgpt-g/jkpfpfgaadbpkmcnejpdjickakjnjfll
1•vincentsabo290•21m ago•0 comments

Spotify Asking for a New Certificate

https://old.reddit.com/r/truespotify/comments/1syz9u4/im_being_asked_to_select_a_certificate_to/
2•gru•21m ago•0 comments