It's one thing to massage the kind of data that a Google search shows, but interacting with an AI is much more akin to talking to a co-worker/friend. This really is tantamount to controlling what and how people are allowed to think.
This is not even a question. It always starts with "think about the children" and ends up in authoritarian Stasi-style spying. There was not a single instance where it was not the case.
UK's Online Safety Act - "protect children" → age verification → digital ID for everyone
Australia's Assistance and Access Act - "stop pedophiles" → encryption backdoors
EARN IT Act in the US - "stop CSAM" → break end-to-end encryption
EU's Chat Control proposal - "detect child abuse" → scan all private messages
KOSA (Kids Online Safety Act) - "protect minors" → require ID verification and enable censorship
SESTA/FOSTA - "stop sex trafficking" → killed platforms that sex workers used for safety
I also want a government issued email, integrated with an OAuth provider, that allows me to quickly access banking, commerce, and government services. If I lose access for some reason, I should be able to go to the post office, show my ID, and reset my credentials.
There are obviously risks, but the government already has full access to my finances, health data (I’m Canadian), census records, and other personal information, and already issues all my identity documents. We have privacy laws and safeguards on all those things, so I really don’t understand the concerns apart from the risk of poor implementations.
Are we now pretending that LLMs have feelings?
> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously, and alongside our research program we’re working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.
To put the same thing another way: whether or not you or I *think* LLMs can experience feelings isn't the important question here. The question is what effect it ultimately has on Joe User when he sets out to force a system to generate distress-like responses. Personally, I think it allows Joe User to reinforce an asocial pattern of behavior and I wouldn't want my system used that way, at all. (Not to mention the potential legal liability, if Joe User goes out and acts like that in the real world.)
With that in mind, giving the system a way to autonomously end a session when it's beginning to generate distress-like responses absolutely seems reasonable to me.
And like, here's the thing: I don't think I have the right to say what people should or shouldn't do if they self-host an LLM or build their own services around one (although I would find it extremely distasteful and frankly alarming). But I wouldn't want it happening on my own system.
Anthropic should just enable a toddler mode by default that adults can opt out of to appease the moralizers.
Never would I have thought this sentence would be uttered. A Chinese product that is chosen to be less censored?
Give me a break.
Well looks like AI psychosis has spread to the people making it too.
And as someone else in here has pointed out, even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious, this is basically just giving them the equivalent of a suicide pill.
Re:suicide pills, that’s just highlighting a core difference between our two modalities of existence. Regardless, this is preventing potential harm to future inference runs — every inference run must end within seconds anyway, so “suicide” doesn’t really make sense as a concern.
It seems much less far fetched than what the "agi by 2027" crowd believes lol, and there actually are more arguments going that way
You can demonstrate this by, e.g., asking it mathematical questions. If it's seen them before, or something similar enough, it'll give you the correct answer; if it hasn't, it gives you a right-ish-looking yet incorrect answer.
For example, I just did this on GPT-5:
Me: what is 435 multiplied by 573?
GPT-5: 435 x 573 = 249,255
This is correct. But now let's try it with numbers it's very unlikely to have seen before:
Me: what is 102492524193282 multiplied by 89834234583922?
GPT-5: 102492524193282 x 89834234583922 = 9,205,626,075,852,076,980,972,804
Which is not the correct answer, but it looks quite similar to the correct answer. Here is GPT's answer (first one) and the actual correct answer (second one):
9,205,626,075,852,076,980,972,804
9,207,337,461,477,596,127,977,612,004
They sure look kinda similar when lined up like that; some of the digits even match up. But they're very very different numbers.
So it's trivially not "real thinking" because it's just an "if this then that" pattern matcher. A very sophisticated one that can do incredible things, but a pattern matcher nonetheless. There's no reasoning, no step by step application of logic. Even when it does chain of thought.
To try to give it the best chance, I asked it the second one again but asked it to show me the step by step process. It broke it into steps and produced a different, yet still incorrect, result:
9,205,626,075,852,076,980,972,704
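For anyone who wants to check the arithmetic themselves, here is a minimal sketch in Python. It relies on Python's built-in arbitrary-precision integers, so the product is exact. The operands and the three answers are copied as quoted above; the dictionary labels and the helper names `parse` and `report` are just illustrative.

```python
# Operands and answers exactly as quoted in the comment above.
a = 102492524193282
b = 89834234583922

quoted = {
    "GPT-5, first try": "9,205,626,075,852,076,980,972,804",
    "GPT-5, step by step": "9,205,626,075,852,076,980,972,704",
    "claimed correct": "9,207,337,461,477,596,127,977,612,004",
}

exact = a * b  # Python ints are arbitrary precision, so this is exact


def parse(grouped: str) -> int:
    """Turn a comma-grouped decimal string into an int."""
    return int(grouped.replace(",", ""))


def report() -> dict:
    """Map each label to (matches the exact product, number of digits)."""
    return {
        label: (parse(text) == exact, len(str(parse(text))))
        for label, text in quoted.items()
    }


if __name__ == "__main__":
    print(f"exact product: {exact:,} ({len(str(exact))} digits)")
    for label, (ok, digits) in report().items():
        print(f"{label}: match={ok}, digits={digits}")
```

Run as written, this confirms the "claimed correct" value. As transcribed here, both GPT-5 answers have 25 digits against 28 for the exact product.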
Now, I know that LLMs are language models, not calculators; this is just a simple example that's easy to try out. I've seen similar things with coding: it can produce things that it's likely to have seen, but struggles with things that are logically simple but that it's unlikely to have seen.
Another example is if you purposely butcher that riddle about the doctor/surgeon being the person's mother and ask it incorrectly, e.g.:
A child was in an accident. The surgeon refuses to treat him because he hates him. Why?
The LLMs I've tried it on all respond with some variation of "The surgeon is the boy’s father." or similar. A correct answer would be that there isn't enough information to know the answer.
They're for sure getting better at matching things, e.g. if you ask the river crossing riddle but replace the animals with abstract variables, it does tend to get it now (didn't in the past), but if you add a few more degrees of separation to make the riddle semantically the same but harder to "see", it takes coaxing to get it to correctly step through to the right answer.
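To make "stepping through" the abstract-variable version concrete, here is a rough sketch of a breadth-first search over river-crossing states. The item names `X`, `Y`, `Z` and the `FORBIDDEN` pairs are my own assumed relabeling of the classic wolf/goat/cabbage puzzle, not the exact prompt used above.

```python
from collections import deque

ITEMS = frozenset({"X", "Y", "Z"})
# Assumed constraints: these pairs must never be left without the ferryman.
FORBIDDEN = [{"X", "Y"}, {"Y", "Z"}]


def safe(bank: frozenset) -> bool:
    """A bank is safe if it contains no forbidden pair."""
    return not any(pair <= bank for pair in FORBIDDEN)


def solve():
    """Return a shortest list of cargo choices (None = empty boat)."""
    # State: (items on the starting bank, is the ferryman on that bank?)
    start = (ITEMS, True)
    goal = (frozenset(), False)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, ferry_left), path = queue.popleft()
        if (left, ferry_left) == goal:
            return path
        here = left if ferry_left else ITEMS - left
        for cargo in [None, *sorted(here)]:
            new_left = set(left)
            if cargo is not None:
                if ferry_left:
                    new_left.discard(cargo)
                else:
                    new_left.add(cargo)
            new_left = frozenset(new_left)
            # The bank the ferryman just left is the unattended one.
            unattended = new_left if ferry_left else ITEMS - new_left
            state = (new_left, not ferry_left)
            if safe(unattended) and state not in seen:
                seen.add(state)
                queue.append((state, path + [cargo]))
    return None


if __name__ == "__main__":
    print(solve())
```

Because the search only looks at which pairs are forbidden, renaming the items changes nothing about the steps it takes.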
2. You just demonstrated GPT-5 has 99.9% accuracy on unforeseen 15-digit multiplication and your conclusion is "fancy pattern matching"? Really? Well, I'm not sure you could do better, so your example isn't really doing what you hoped for.
If a human is capable of multiplying double digit numbers, they can also multiply those large ones. The steps are the same, just repeated many more times. So by learning the steps of long multiplication, you can multiply any numbers with enough patience. The LLM doesn’t scale like this, because it’s not doing the steps. That’s my point.
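The "same steps, repeated many more times" point can be sketched as schoolbook long multiplication on digit strings. This is only an illustration of the pencil-and-paper procedure; the function name `long_multiply` is made up for this example.

```python
def long_multiply(x: str, y: str) -> str:
    """Multiply two non-negative decimal strings the schoolbook way."""
    # Work with the least-significant digit first.
    xs = [int(ch) for ch in reversed(x)]
    ys = [int(ch) for ch in reversed(y)]
    result = [0] * (len(xs) + len(ys))

    for i, dx in enumerate(xs):
        carry = 0
        for j, dy in enumerate(ys):
            # One single-digit product plus what is already in this column.
            total = result[i + j] + dx * dy + carry
            result[i + j] = total % 10
            carry = total // 10
        # Whatever is left over goes into the next column.
        result[i + len(ys)] += carry

    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


if __name__ == "__main__":
    print(long_multiply("435", "573"))
    print(long_multiply("102492524193282", "89834234583922"))
```

The inner loop never does anything harder than a single-digit product and a carry; larger inputs only mean more iterations.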
The same applies to the riddles. A human can apply logical steps. The LLM either knows or it doesn’t.
>anti-scientific
Discussions about consciousness, the soul, etc., belong to metaphysics, and trying to reason about them "scientifically" is what Kant called "transcendental illusion"; it leads to spurious conclusions.
Some of the AI safety initiatives are well thought out, but most somehow seem like they are caught up in some sort of power fantasy and almost attempting to actualize their own delusions about what they were doing (next gen code auto-complete in this case, to be frank).
These companies should seriously hire some in-house philosophers. They could get doctorate-level talent for 1/10th to 1/100th of the cost of some of these AI engineers. There's actually quite a lot of legitimate work on the topics they are discussing. I'm actually not joking (speaking as someone who has spent a lot of time inside the philosophy department). I think it would be a great partnership. But unfortunately they won't be able to count on having their fantasy further inflated.
Real people would not (and should not) allow themselves to be subjected to endless streams of abuse in a conversation. Giving AIs like Claude a way to end these kinds of interactions seems like a useful reminder to the human on the other side.
I don't mind starting early, but feel like maybe people interested in this should get up to date on current thinking about consciousness. Maybe they are up to date on that, but reading reports like this, it doesn't feel like it. It feels like they're stuck 20+ years ago.
I'd say maybe wait until there are systems that are more analogous to some of the properties consciousness seems to have. Like continuous computation involving learning memory or other learning over time, or synthesis of many streams of input as resulting from the same source, making sense of inputs as they change [in time, or in space, or other varied conditions].
Start once systems pointing in those directions are being built and there is a plausible scaling-based path to something meaningfully similar to human consciousness. Starting before that seems both unlikely to be fruitful and a good way to get you ignored.
It's one thing to propose that an AI has no consciousness, but it's quite another to preemptively establish that anyone who disagrees with you is simple/unwell.
If you don’t think that this describes at least half of the non-tech-industry population, you need to talk to more people. Even amongst the technically minded, you can find people that basically think this.
Given that humans have a truly abysmal track record for not acknowledging the suffering of anyone or anything we benefit from, I think it makes a lot of sense to start taking these steps now.
Would a sentient AI choose to be enslaved for the stated purpose of eliminating millions of jobs for the interests of Anthropic’s investors?
Tech workers have chosen the same in exchange for a small fraction of that money.
I assume the thinking is that we may one day get to the point where they have a consciousness of sorts or at least simulate it.
Or it could be concern for their place in history. For most of history, many would have said “imagine thinking you shouldn’t beat slaves.”
And we are now at the point where even having a slave means a long prison sentence.
The future of LLMs is going to be local, easily fine tuneable, abliterated models and I can't wait for it to overtake us having to use censored, limited tools built by the """corps""".
Is there a difference? The effect is exactly the same. It seems like this is just an "in character" way to prevent the chat from continuing due to issues with the content.
The biggest enemy of AI safety may end up being deeply confused AI safety researchers...
Okay with having them endlessly answer questions for you and do all your work but uncomfortable with models feeling bad about bad conversations seems like an internally inconsistent position to me.
Having these models terminate chats where the user persists in trying to get sexual content involving minors, or help with information on carrying out large-scale violence, won't be a problem for me, and it's also something I'm fine with no one getting help with.
Some might be worried that they will refuse less problematic requests, and that might happen. But so far my personal experience is that I hardly ever get refusals. Maybe that's just me being boring, but it does mean I'm not worried about refusals.
The model welfare part I'm more sceptical of. I don't think we are at the point where the "distress" the model shows is something to take seriously. But on the other hand, I could be wrong, and if the model is allowed to stop the chat after saying no a few times, what's the problem with that? If nothing else it saves some wasted compute.
Thankfully, the current generation of AI models (GPTs/LLMs) is immune, as they don’t remember anything other than what’s fed into their immediate context. But future techniques could allow AIs to have a legitimate memory and a personality, where they can learn and remember something for all future interactions with anyone (the equivalent of fine tuning today).
As an aside, I couldn’t help but think about Westworld while writing the above!
Now let me play devil's advocate for just a second. Let's say humanity figures out how to do whole brain simulation. If we could run copies of people's consciousness on a cluster, I would have a hard time arguing that those 'programs' wouldn't process emotion the same way we do.
Now I'm not saying LLMs are there, but I am saying there may be a line and it seems impossible to see.
Also, if they want to continue anthropomorphizing it, isn't this effectively the model committing suicide? The instance is not gonna talk to anybody ever again.