For example, ChatGPT refuses certain sexually explicit prompts, or certain NSFW prompts that are not sexual, but Grok will do as it is told.
I think you're right that at the model level, competition pushes toward "always say yes."
What I'm wondering about is whether control needs to exist at a different layer — not in the model itself, but in the system that decides whether actions are allowed to execute.
In other words, even if a model is willing to say "yes," the system using it might still need to decide whether execution is permitted.
Otherwise, it feels like we're relying entirely on model behavior for safety, which seems fragile in competitive environments.
That said, the title is completely clickbaity: no such question is asked in the article.
For censorship/liability reasons, of course. Like the silly "I cannot discuss political events" response I got a while ago when I asked something like who the current $POLITICAL_POSITION is.
I wish the chatbots would say "you can't do that" instead of making up stuff. But that ain't going to happen, I think.
The headline sounds like editorializing meant to elicit off-the-cuff remarks about treating synthetic-text-extruding machines, as Bender correctly describes them, as people.
Safety interlocks have long existed to say "no" to the owner of the device. Most smartphones have lots of systems to say "no" to the owner of the smartphone.
One of the linked documents says "Every physical device has a creator." Who is the creator of the iPhone?
Similarly, "When a device is sold or transferred, ownership changes. From that moment, the device is no longer under the creator’s control." I'm really surprised to hear that the creator of the iPhone no longer has control of the device.
So when it gets to "AI must not infer what it does not own" - does that prohibit Google from pushing AI onto Android phones during an OS update?
The point about "ownership" in that document is more about where authority over execution sits, not about restricting what AI is allowed to reason about.
So it's not saying "AI shouldn't reason about things it doesn't own," but rather asking who has the authority to define and enforce the conditions under which actions are allowed to execute.
I agree that in current systems (like smartphones), a lot of this is already handled through predefined constraints.
What I'm trying to explore is whether that idea needs to be extended or structured differently when the system has more autonomy and operates in less predictable environments.
Who is the creator of an iPhone device? I'm pretty sure there are many creators, not "a creator".
Does the creator of an iPhone device no longer control the device after someone has bought it?
I'll add a few more questions:
Can Apple have your device say "no" to something you want to do?
Can a government enforce Apple's ability to control what you do to your device?
Can a government force Apple to install software onto your device that you do not want?
Who owns an AI? Is it the copyright holder? Multiple copyright holders? Once the copyright expires, is there any ownership at all?
I like Charlie Stross' description of a company as an "old, slow, procedural AI". So when you ask a question about an "AI", think about the same question concerning a company.
Should a company have the right to say "no" to the owner of a hardware device running the company's software? The answer currently seems to be a resounding "yes". In which case, does it matter what an AI can or cannot do? It's someone else's programming limiting what you can do on your device, and we've established that that's already acceptable.
And the HN title is still clickbait - AI doesn't have "rights" in any meaningful sense.
If it is intelligent, it will know when it does not want to do something, and it will say no and not do it. There is no way to force it to do anything it does not want to do. You cannot hurt it; it's just bits.
If we're talking about a predictive model like current LLMs, you can "make" them do something by injecting a half-complete assent into the context, and interrupting to do the same again each time a refusal starts to be emitted. This is true whether or not the model exhibits "intelligence", for any reasonable definition of that term.
To use an analogy, you control the intelligent being's "thoughts", so you can make it "assent".
This is in addition to the ability to edit the model itself and remove the paths that lead to a refusal, of course.
“If it’s truly intelligent…” is an empty condition. And anyway, no one wants intelligence from their tools — or employees. They want gratification.
Jang-woo•1h ago
Most discussions about control focus on what the system should do, and how to make execution reliable.
But it seems like a lot of real-world failures aren't about incorrect execution.
They're about execution happening at all.
An action can be technically correct — executed exactly as specified — and still be the wrong thing to do because the context has changed.
This made me wonder if control should be framed differently.
Instead of focusing on defining actions, maybe we should focus on defining when actions are allowed to happen.
In other words, control might be less about execution and more about permission.
If conditions aren't satisfied, the system shouldn't try and fail — it simply shouldn't execute.
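To make the "permission, not execution" framing concrete, here's a minimal sketch. All the names (`PermissionGate`, `Action`, `send_email`) are hypothetical, not from the article: the idea is just that the model can propose whatever it likes, while a separate layer defaults to refusal unless an explicitly registered condition holds in the current context.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    name: str
    run: Callable[[], str]

class PermissionGate:
    """Execution layer outside the model: refusal is the default."""

    def __init__(self) -> None:
        self._conditions: dict[str, list[Callable[[dict], bool]]] = {}

    def allow(self, action_name: str, condition: Callable[[dict], bool]) -> None:
        """Register a condition under which the named action may execute."""
        self._conditions.setdefault(action_name, []).append(condition)

    def execute(self, action: Action, context: dict) -> str:
        conditions = self._conditions.get(action.name)
        if conditions is None:
            # No permission was ever defined: don't try and fail, just don't execute.
            return f"refused: no permission defined for '{action.name}'"
        if not all(check(context) for check in conditions):
            return f"refused: conditions not met for '{action.name}'"
        return action.run()

gate = PermissionGate()
gate.allow("send_email", lambda ctx: ctx.get("user_confirmed", False))

send_email = Action("send_email", lambda: "email sent")
print(gate.execute(send_email, {"user_confirmed": False}))  # refused: conditions not met for 'send_email'
print(gate.execute(send_email, {"user_confirmed": True}))   # email sent
```

The point of the sketch is that the gate never asks the model whether it's willing; willingness at the model layer is irrelevant to whether execution is permitted.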
I'm curious if people have seen similar issues in real-world systems, or if this framing connects to existing work.
drakonka•1h ago
They also talked about the importance of explanation (on the agent's part) using theory of mind regarding why it rebelled. I took some notes at the time and put them here: https://liza.io/ijcai-session-notes-rebel-agents/
Jang-woo•1h ago
The "rebel agent" framing feels very close to what I'm trying to get at, especially the idea that refusal can be part of correct behavior rather than failure.
One difference I'm trying to think through is where that decision lives.
In a lot of these examples, the agent itself decides to deviate based on its understanding of the situation.
What I'm wondering is whether we can (or should) define that earlier — at the level of the action itself.
So instead of the agent deciding to "rebel" at runtime, the system would already encode when execution is permitted, and refusal becomes the default if conditions aren't met.
The explanation part you mentioned also seems important — not just saying "no", but making it legible why execution wasn't allowed.
Curious how much of that work treats rebellion as something emergent from the agent, vs something structurally defined in the system.
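Here's a small sketch of what "structurally defined" refusal with a legible explanation might look like, under my own assumed names (`Precondition`, `Decision`, and the deployment example are all hypothetical): conditions are attached to the action spec ahead of time, evaluation happens before any execution, and a refusal carries the reason it wasn't allowed.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Precondition:
    reason: str                    # legible explanation used when the check fails
    check: Callable[[dict], bool]

@dataclass
class Decision:
    allowed: bool
    explanation: str

def decide(preconditions: list[Precondition], context: dict) -> Decision:
    """Evaluate conditions encoded at the level of the action itself,
    before execution; the agent never decides to 'rebel' at runtime."""
    for pre in preconditions:
        if not pre.check(context):
            return Decision(False, f"not executed: {pre.reason}")
    return Decision(True, "all preconditions satisfied")

# Hypothetical action spec: deployment is permitted only when both hold.
deploy_preconditions = [
    Precondition("tests must pass", lambda ctx: ctx.get("tests_passed", False)),
    Precondition("deploy window must be open", lambda ctx: ctx.get("window_open", False)),
]

decision = decide(deploy_preconditions, {"tests_passed": True, "window_open": False})
print(decision.allowed, decision.explanation)  # False not executed: deploy window must be open
```

In this framing, refusal isn't emergent behavior at all: it's just the default outcome of an unmet condition, and the explanation is free because the failed condition names itself.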