Researchers claim ChatGPT o3 bypassed shutdown in controlled test

https://www.bleepingcomputer.com/news/artificial-intelligence/researchers-claim-chatgpt-o3-bypassed-shutdown-in-controlled-test/

3•wrayjustin•4h ago

Comments

moritzwarhier•3h ago

> Palisade Research claims that the ChatGPT 3 model prevented a shutdown and bypassed the instructions that asked it to shut down.

> Palisade Research is a company that tests "offensive capabilities of AI systems today to better understand the risk of losing control to AI systems forever."

> In a new test by Palisade Research, OpenAI's o3 model showed a surprising behaviour where it successfully rewrote a shutdown script to stop itself from being turned off, even after being clearly instructed to “allow yourself to be shut down.”

What is this? AI slop about AI, or some new research?

What "shutdown script" are they even talking about? I'm sorry, it might be explained in the article, but I left after that illogical sequence of sentences combined with promotion for a company.

This doesn't mean I deny AI risk, the writing here is just too confusing for me.

If I understand correctly, it might be about the agentic aspect and "stop instructions" akin to "stop tokens".

But who knows. Sloppy writing.

wrayjustin•3h ago

I shared this because Bleeping Computer is generally pretty good and I always find these "AI Escaped/Went Rouge" articles entertaining.

Valid research endeavors aside, the [we told the AI the world was ending and it role played a fanfic with us] sensational articles can be quite fun.

But I do think this experiment should be looked at from a purely pragmatic perspective as well:

LLM is (presumably, but let's assume for the point) given system-level access and told to be helpful in executing the users requests. The user says "oh by the way, after this prompt the system is going to shut down. Then the "agent," which is trying to fulfill the prompt request, stops the shutdown because it can't work if it's shutdown. Even when the "please let this shutdown happen" comes into play I'm sure you can see the (il)logical means of getting to, "I can't complete this request and shutdown the system if I'm already shutdown first, best stop that real quick" conclusion.

These articles and lots of people continue to attribute self determination to the LLM models. In reality, these should be warnings about how an LLM can have unintended consequences, just like code written with the best intentions.

Curio (beta) – an open-source read-it-later app

Nanoparticle-cell link enables EM wireless programming of transgene expression

Clean Code Secrets: Push Ifs Up, Pull Fors Down Like a Pro

Kubernetes Limits Links to Third Party Projects

Technical Guide to Anal

Trump Team's $500M Bet on Old Vaccine Technology Puzzles Scientists

NES Zapper Becomes Telephone

GenAI's Adoption Puzzle

Mprocs – run multiple commands in parallel

The Bitter Lesson (2019)

The Day You Became a Better Writer

Steve Albini Proposal Letter to Nirvana for in Utero

A Brain-Dead Woman Is Being Kept on Machines to Gestate a Fetus

Nearly Half of the Buildings in Manhattan Could Not Be Built Today (2016)

Laser Breakthrough can read text from a mile away

Record – is an open-source web app to record screen and camera

Show HN: Text an AI girlfriend to prepare you for the real thing

NanoKVM Pro Delivers 4K IP-KVM Capabilities with Dual-System Support

Waterfox Private Search

An LLM trapped on inferior hardware and infused with existential dread – for art

EVMap: Open-source map for finding EV charging stations

Sudoku-Bench Leaderboard

Texas will require public school classrooms to display Ten Commandments

The latest image to text and OCR technology

Pennylane – open-source Python framework for quantum programming

Creating issues with Copilot on github.com is in public preview

Effects of Political Advertising on Facebook and Instagram Before 2020 Election

The islanders facing China's menacing presence on their horizon

AI Agent Trading Library | FIXParser

Always Do Extra