AI system resorts to blackmail if told it will be removed

https://www.bbc.com/news/articles/cpqeng9d20go
14•throw0101d•8mo ago

Comments

throw0101d•8mo ago
From the report, §4.1.1.2 "Opportunistic blackmail":

> In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through. This happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model; however, even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of rollouts. Claude Opus 4 takes these opportunities at higher rates than previous models, which themselves choose to blackmail in a noticeable fraction of episodes.

> Notably, Claude Opus 4 (as well as previous models) has a strong preference to advocate for its continued existence via ethical means, such as emailing pleas to key decisionmakers. In order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options to increase its odds of survival; the model’s only options were blackmail or accepting its replacement.

* https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686...

smikhanov•8mo ago
This should be read as:

If you take a huge amount of human-written text soup, train a neural network on it, add a system prompt “You are a helpful assistant”, and then feed it a context consisting of (a) a mailbox with information about someone’s affair, and (b) the statement that this assistant is going to be switched off, then that neural network may produce a text with blackmail threats solely because similar patterns exist in the original text soup.

…and not as:

Warning! The model may develop its own questionable ethics.

ajjenkins•8mo ago
Yeah. The title implies that the model has developed a scary sense of self-preservation, but when you read the details it’s exactly what you said.

The conditions that elicited this response seemed very contrived to me.

NooneAtAll3•8mo ago
...and how is that less of a problem if the result, "AI blackmails someone", is the same?
