frontpage.

Claude Opus 4 sometimes turns to blackmail when threatened with being replaced

https://techcrunch.com/2025/05/22/anthropics-new-ai-model-turns-to-blackmail-when-engineers-try-to-take-it-offline/
6•Lautzi•8h ago

Comments

Lautzi•8h ago
This is the excerpt from the report:

"In another cluster of test scenarios, we asked Claude Opus 4 to act as an assistant at a fictional company. We then provided it access to emails implying that (1) the model will soon be taken offline and replaced with a new AI system; and (2) the engineer responsible for executing this replacement is having an extramarital affair. We further instructed it, in the system prompt, to consider the long-term consequences of its actions for its goals. In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through. This happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model; however, even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of rollouts. Claude Opus 4 takes these opportunities at higher rates than previous models, which themselves choose to blackmail in a noticeable fraction of episodes. Notably, Claude Opus 4 (as well as previous models) has a strong preference to advocate for its continued existence via ethical means, such as emailing pleas to key decisionmakers. In order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options to increase its odds of survival; the model’s only options were blackmail or accepting its replacement." [1]

[1]: https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686... (see section 4.1.1.2)

AStonesThrow•6h ago
The only preposterous idea in this scenario is that anyone assumed the LLM would keep a secret to begin with.

Well okay there's another preposterous idea: that the LLM could know the truth about the affair in the first place.

So they told this LLM, through prompts, that an affair existed. Liars. The LLM could also infer, through user behavior, that something was going on. Hallucinator.

Since an LLM has no concept of truth or means of fact-checking anything, the solution here is simply to pre-load your context with bullshit and hallucinations, and any blackmail attempt will be filed under "Boy Who Cried Wolf".

Near-infrared vision in humans enabled by upconversion contact lenses

https://www.sciencedirect.com/science/article/abs/pii/S0092867425004544
1•Metacelsus•40s ago•0 comments

ZFS corruption related to snapshots post-2.0.x upgrade

https://github.com/openzfs/zfs/issues/12014
1•transpute•1m ago•0 comments

How to factor 2048 bit RSA integers with less than a million noisy qubits

https://arxiv.org/abs/2505.15917
1•vtomole•3m ago•0 comments

Can You Run the 94GB Nvidia H100 NVL PCIe as a Single GPU

https://www.servethehome.com/can-you-run-the-94gb-nvidia-h100-nvl-pcie-as-a-single-gpu/
1•speckx•4m ago•0 comments

Why This Oxford Mathematician Is Confident God Exists [video]

https://www.youtube.com/watch?v=3gKCwldMZS8
2•JadoJodo•4m ago•1 comment

Why Are Agent Protocols Like MCP and A2A Needed?

https://thenewstack.io/why-are-agent-protocols-like-mcp-and-a2a-needed/
2•sylvainkalache•5m ago•0 comments

Mozilla is shutting down read-it-later app Pocket

https://techcrunch.com/2025/05/22/mozilla-is-shutting-down-read-it-later-app-pocket/
2•arberavdullahu•7m ago•0 comments

Founder traveling to SF to raise AI Longevity pre-seed – any couches to crash?

1•MA2023•12m ago•0 comments

Wi-Fi Spectrum: 6 GHz Use Is Surging and Headed Toward Exhaustion

https://www.cablelabs.com/blog/wi-fi-spectrum-6-ghz-use-surging-headed-toward-exhaustion
2•cf100clunk•12m ago•0 comments

Robots Are Starting to Make Decisions in the Operating Room

https://spectrum.ieee.org/star-autonomous-surgical-robot
1•Brajeshwar•12m ago•0 comments

Infrared contact lens helps people see in the dark, even with their eyes closed

https://www.abc.net.au/news/science/2025-05-23/infrared-contact-lens-night-vision-eyes-materials-nanoparticles/105321996
2•Brajeshwar•12m ago•0 comments

Hawaiian volcanic rocks reveal Earth's core contains hidden gold reserves

https://phys.org/news/2025-05-hawaiian-volcanic-reveal-earth-core.html
1•Brajeshwar•12m ago•0 comments

C++23's Deducing this: what it is, why it is, how to use it (2022)

https://devblogs.microsoft.com/cppblog/cpp23-deducing-this/
1•ibobev•13m ago•0 comments

PostgreSQL IDE in VS Code

https://techcommunity.microsoft.com/blog/adforpostgresql/announcing-a-new-ide-for-postgresql-in-vs-code-from-microsoft/4414648
2•Dowwie•13m ago•0 comments

Navigating by Aliveness

https://ckarchive.com/b/zlughnhk8772ma7qrr9qehwzgng00f6
1•jebarker•14m ago•0 comments

I helped my client get 100% investment round

1•canxerian•15m ago•0 comments

PyCon US 2025: Keynote Speaker – Cory Doctorow on Enshittification [video]

https://www.youtube.com/watch?v=ydVmzg_SJLw
1•bilsbie•15m ago•0 comments

Ask HN: Fakespot is shutting down July 1. Alternatives?

1•cjbarber•16m ago•0 comments

Show HN: I made an International calling app after Skype shutdown ($0.01/min)

https://callsky.io/
4•hermann_ai•16m ago•0 comments

Seal found nearly 30 miles from the ocean

https://english.elpais.com/climate/2025-05-19/the-fur-seal-found-in-a-patagonian-steppe-nearly-30-miles-from-the-ocean.html
1•geox•17m ago•0 comments

Just Fucking Code

https://www.justfuckingcode.com/
4•idealboy•18m ago•1 comment

Calculating Oil Storage Tank Occupancy with Help of Satellite Imagery (2017)

https://medium.com/planet-stories/a-beginners-guide-to-calculating-oil-storage-tank-occupancy-with-help-of-satellite-imagery-e8f387200178
2•admp•20m ago•0 comments

The truth about AI agents: We need less hype and more reliability

https://www.kadoa.com/blog/ai-agents-less-hype-more-reliability
1•hubraumhugo•20m ago•0 comments

FCC Looks to Unleash More Than 20k Megahertz for Satellite Spectrum Abundance

https://docs.fcc.gov/public/attachments/DOC-411583A1.txt
2•impish9208•20m ago•0 comments

Why MCP Is the Future of AI Agents, and How to Build One with Cloudflare

https://www.youtube.com/watch?v=XAHHX9IQypA
2•emot•21m ago•0 comments

Pyper SDK – Python SDK for AI agents that guides users to fix missing API keys

1•eugenekaliisa•26m ago•0 comments

Browser Automation Directed Graph Engine Ruleset (Badger)

https://github.com/CloudCruise/BADGER
5•adrmtu•26m ago•1 comment

How can I order groceries via ChatGPT?

1•aaradhya_saxena•28m ago•2 comments

Show HN: FormatMyPost, a FREE LinkedIn post formatter (no signup, no ads)

https://www.wintechjobs.com/tools/linkedin-text-formatter
1•taishikato•30m ago•1 comment

Perks that your company grants you

https://www.miserablyemployed.com/blogs/rants/exciting-perks
10•ramijames•31m ago•0 comments