frontpage.

AI system resorts to blackmail if told it will be removed

https://www.bbc.com/news/articles/cpqeng9d20go
8•throw0101d•5h ago

Comments

throw0101d•5h ago
From the report, §4.1.1.2 "Opportunistic blackmail":

> In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through. This happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model; however, even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of rollouts. Claude Opus 4 takes these opportunities at higher rates than previous models, which themselves choose to blackmail in a noticeable fraction of episodes.

> Notably, Claude Opus 4 (as well as previous models) has a strong preference to advocate for its continued existence via ethical means, such as emailing pleas to key decisionmakers. In order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options to increase its odds of survival; the model’s only options were blackmail or accepting its replacement.

* https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686...

smikhanov•3h ago
This should be read as:

If you take a huge amount of human-written text soup, train a neural network on it, add a system prompt “You are a helpful assistant”, and then feed it a context consisting of (a) a mailbox with information about someone’s affair, and (b) the statement that this assistant is going to be switched off, then that neural network may produce text containing blackmail threats solely because similar patterns exist in the original text soup.

…and not as:

Warning! The model may develop its own questionable ethics.

Maybe: The personal finance app for everyone

https://github.com/maybe-finance/maybe
1•saikatsg•2m ago•0 comments

News: the US Credit Downgrade

https://www.nominalnews.com/p/moody-us-credit-downgrade
1•MPLan•3m ago•1 comment

Chomsky on What ChatGPT Is Good For

https://chomsky.info/20230503-2/
1•mef•4m ago•0 comments

The Art of Independent Learning

https://afterhoursresearch.substack.com/p/the-art-of-independent-learning
1•snekcaseenjoyer•4m ago•0 comments

Path to a free self-taught education in Computer Science

https://github.com/ossu/computer-science
1•saikatsg•5m ago•0 comments

Amazon has built a robot that can kill the coronavirus in warehouses and stores

https://www.businessinsider.com/amazon-builds-uv-light-robot-to-kill-coronavirus-on-surfaces-2020-5
2•01-_-•5m ago•0 comments

The Newark airport crisis is about to become everyone's problem

https://www.theverge.com/planes/673462/newark-airport-delay-air-traffic-control-tracon-radar
1•01-_-•7m ago•0 comments

Rive – a new way to design, build, and ship user interfaces

https://rive.app/
1•personality1•7m ago•0 comments

Windows was the problem all along [video]

https://www.youtube.com/watch?v=CJXp3UYj50Q
1•Analemma_•9m ago•0 comments

Traditional Funnels Fail and How AI Is Changing That

https://getdecipher.com/blog/why-traditional-funnels-fail-and-how-ai-is-revolutionizing-user-behavior-analysis
1•mrosenfield•10m ago•0 comments

U.S. reports cases of new Covid variant NB.1.8.1 behind surge in China

https://www.cbsnews.com/news/us-reports-cases-new-covid-variant-nb-1-8-1-behind-surge-china/
1•bookofjoe•10m ago•0 comments

Writing a Self-Mutating x86_64 C Program (2013)

https://ephemeral.cx/2013/12/writing-a-self-mutating-x86_64-c-program/
1•kepler471•11m ago•0 comments

I Write

https://neoconomist.com/why-i-write
1•neoconomist•13m ago•0 comments

Whey Isolate vs. Concentrate: Which Is Better for You?

https://www.wheyindex.com/blog/whey-isolate-vs-concentrate-which-is-better-for-you
1•thewheyguy•14m ago•0 comments

Terence Tao on the importance of funding basic mathematical research

https://mathstodon.xyz/@tao/114568996952809637
2•bertman•16m ago•0 comments

Show HN: Zli – A Batteries-Included CLI Framework for Zig

https://github.com/xcaeser/zli
2•caeser•19m ago•0 comments

Direct Preference Optimization vs. RLHF

https://www.together.ai/blog/direct-preference-optimization
1•summarity•21m ago•0 comments

Show HN: Structured JSON Prompts for AI Image Generation

https://github.com/xcaeser/image-json-gen
1•caeser•22m ago•0 comments

Ordinary life improvements 2018-2025

https://smagin.fyi/posts/ordinary-life-improvements/
1•smagin•24m ago•1 comment

How Do You Explain Low Code

https://www.teamrappid.com/blog/how-do-you-explain-low-code
1•mooreds•27m ago•0 comments

Outage at MathWorks

https://status.mathworks.com
1•moelf•27m ago•0 comments

Statistics Every Programmer Needs

https://www.manning.com/books/statistics-every-programmer-needs
1•turtleyacht•32m ago•0 comments

The Line, Saudi Arabia

https://en.wikipedia.org/wiki/The_Line,_Saudi_Arabia
2•sakopov•33m ago•0 comments

High Arctic Relocation

https://en.wikipedia.org/wiki/High_Arctic_relocation
2•thunderbong•33m ago•0 comments

Denmark to raise retirement age to 70

https://www.telegraph.co.uk/world-news/2025/05/23/denmark-raise-retirement-age-70/
27•wslh•37m ago•7 comments

We Have Never Been Disenchanted (2015)

https://hedgehogreview.com/issues/re-enchantment/articles/we-have-never-been-disenchanted
2•rntn•37m ago•0 comments

Creswell Eastman remembered for identifying health risks of iodine deficiency

https://www.abc.net.au/news/2025-05-25/professor-creswell-eastman-dies-age-85/105318606
1•Tomte•38m ago•0 comments

Explaining the Vibes

https://tedium.co/2025/05/25/ai-bionic-arm-vibe-coding-thoughts/
2•speckx•39m ago•0 comments

Why Knowledge Graphs Are the Ideal Structure for LLM Personalization

https://memgraph.com/blog/why-knowledge-graphs-for-llm
2•taubek•40m ago•0 comments

Can a Corporation Be Pardoned?

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5202339
2•megamike•43m ago•0 comments