> Palisade Research claims that the ChatGPT 3 model prevented a shutdown and bypassed the instructions that asked it to shut down.
> Palisade Research is a company that tests "offensive capabilities of AI systems today to better understand the risk of losing control to AI systems forever."
> In a new test by Palisade Research, OpenAI's o3 model showed a surprising behaviour where it successfully rewrote a shutdown script to stop itself from being turned off, even after being clearly instructed to “allow yourself to be shut down.”
What is this? AI slop about AI, or some new research?
What "shutdown script" are they even talking about? I'm sorry, it might be explained in the article, but I left after that illogical sequence of sentences combined with promotion for a company.
This doesn't mean I deny AI risk, the writing here is just too confusing for me.
If I understand correctly, it might be about the agentic aspect and "stop instructions" akin to "stop tokens".
But who knows. Sloppy writing.
wrayjustin•3h ago
I shared this because Bleeping Computer is generally pretty good and I always find these "AI Escaped/Went Rouge" articles entertaining.
Valid research endeavors aside, the [we told the AI the world was ending and it role played a fanfic with us] sensational articles can be quite fun.
But I do think this experiment should be looked at from a purely pragmatic perspective as well:
LLM is (presumably, but let's assume for the point) given system-level access and told to be helpful in executing the users requests. The user says "oh by the way, after this prompt the system is going to shut down. Then the "agent," which is trying to fulfill the prompt request, stops the shutdown because it can't work if it's shutdown. Even when the "please let this shutdown happen" comes into play I'm sure you can see the (il)logical means of getting to, "I can't complete this request and shutdown the system if I'm already shutdown first, best stop that real quick" conclusion.
These articles and lots of people continue to attribute self determination to the LLM models. In reality, these should be warnings about how an LLM can have unintended consequences, just like code written with the best intentions.
moritzwarhier•3h ago
> Palisade Research is a company that tests "offensive capabilities of AI systems today to better understand the risk of losing control to AI systems forever."
> In a new test by Palisade Research, OpenAI's o3 model showed a surprising behaviour where it successfully rewrote a shutdown script to stop itself from being turned off, even after being clearly instructed to “allow yourself to be shut down.”
What is this? AI slop about AI, or some new research?
What "shutdown script" are they even talking about? I'm sorry, it might be explained in the article, but I left after that illogical sequence of sentences combined with promotion for a company.
This doesn't mean I deny AI risk, the writing here is just too confusing for me.
If I understand correctly, it might be about the agentic aspect and "stop instructions" akin to "stop tokens".
But who knows. Sloppy writing.
wrayjustin•3h ago
Valid research endeavors aside, the [we told the AI the world was ending and it role played a fanfic with us] sensational articles can be quite fun.
But I do think this experiment should be looked at from a purely pragmatic perspective as well:
LLM is (presumably, but let's assume for the point) given system-level access and told to be helpful in executing the users requests. The user says "oh by the way, after this prompt the system is going to shut down. Then the "agent," which is trying to fulfill the prompt request, stops the shutdown because it can't work if it's shutdown. Even when the "please let this shutdown happen" comes into play I'm sure you can see the (il)logical means of getting to, "I can't complete this request and shutdown the system if I'm already shutdown first, best stop that real quick" conclusion.
These articles and lots of people continue to attribute self determination to the LLM models. In reality, these should be warnings about how an LLM can have unintended consequences, just like code written with the best intentions.