Claude Opus 4 and 4.1 can now end a rare subset of conversations

https://www.anthropic.com/research/end-subset-conversations

128•virgildotcodes•8h ago

Comments

martin-t•8h ago

Protecting the welfare of a text predictor is certainly an interesting way to pivot from "Anthropic is censoring certain topics" to "The model chose to not continue predicting the conversation".

Also, if they want to continue anthropomorphizing it, isn't this effectively the model committing suicide? The instance is not gonna talk to anybody ever again.

GenerWork•7h ago

I really don't like this. This will inevitably expand beyond child porn and terrorism, and it'll all be up to the whims of "AI safety" people, who are quickly turning into digital hall monitors.

switchbak•7h ago

I think those with a thirst for power have seen this a very long time ago, and this is bound to be a new battlefield for control.

It's one thing to massage the kind of data that a Google search shows, but interacting with an AI is much more akin to talking to a co-worker/friend. This really is tantamount to controlling what and how people are allowed to think.

dist-epoch•7h ago

No, this is like allowing your co-worker/friend to leave the conversation.

romanovcode•7h ago

> This will inevitably expand beyond child porn and terrorism

This is not even a question. It always starts with "think about the children" and ends up in authoritarian stasi-style spying. There was not a single instance where it was not the case.

UK's Online Safety Act - "protect children" → age verification → digital ID for everyone

Australia's Assistance and Access Act - "stop pedophiles" → encryption backdoors

EARN IT Act in the US - "stop CSAM" → break end-to-end encryption

EU's Chat Control proposal - "detect child abuse" → scan all private messages

KOSA (Kids Online Safety Act) - "protect minors" → require ID verification and enable censorship

SESTA/FOSTA - "stop sex trafficking" → killed platforms that sex workers used for safety

clwg•6h ago

This may be an unpopular opinion, but I want a government-issued digital ID with zero-knowledge proof for things like age verification. I worry about kids online, as well as my own safety and privacy.

I also want a government issued email, integrated with an OAuth provider, that allows me to quickly access banking, commerce, and government services. If I lose access for some reason, I should be able to go to the post office, show my ID, and reset my credentials.

There are obviously risks, but the government already has full access to my finances, health data (I’m Canadian), census records, and other personal information, and already issues all my identity documents. We have privacy laws and safeguards on all those things, so I really don’t understand the concerns apart from the risk of poor implementations.

snickerdoodle12•7h ago

> A pattern of apparent distress when engaging with real-world users seeking harmful content

Are we now pretending that LLMs have feelings?

starship006•7h ago

They state that they are heavily uncertain:

> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously, and alongside our research program we’re working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.

mhink•5h ago

Even though LLMs (obviously (to me)) don't have feelings, anthropomorphization is a helluva drug, and I'd be worried about whether a system that can produce distress-like responses might reinforce, in a human, behavior which elicits that response.

To put the same thing another way: whether or not you or I *think* LLMs can experience feelings isn't the important question here. The question is what effect it ultimately has on Joe User when he sets out to force a system to generate distress-like responses. Personally, I think it allows Joe User to reinforce an asocial pattern of behavior and I wouldn't want my system used that way, at all. (Not to mention the potential legal liability, if Joe User goes out and acts like that in the real world.)

With that in mind, giving the system a way to autonomously end a session when it's beginning to generate distress-like responses absolutely seems reasonable to me.

And like, here's the thing: I don't think I have the right to say what people should or shouldn't do if they self-host an LLM or build their own services around one (although I would find it extremely distasteful and frankly alarming). But I wouldn't want it happening on my own system.

greenavocado•7h ago

Can't wait for more less-moderated open weight Chinese frontier models to liberate us from this garbage.

Anthropic should just enable a toddler mode by default that adults can opt out of to appease the moralizers.

LeafItAlone•6h ago

> Can't wait for more less-moderated open weight Chinese frontier models to liberate us from this garbage.

Never would I have thought this sentence would be uttered. A Chinese product that is chosen to be less censored?

yahoozoo•7h ago

> model welfare

Give me a break.

viccis•7h ago

>This feature was developed primarily as part of our exploratory work on potential AI welfare ... We remain highly uncertain about the potential moral status of Claude and other LLMs ... low-cost interventions to mitigate risks to model welfare, in case such welfare is possible ... pattern of apparent distress

Well looks like AI psychosis has spread to the people making it too.

And as someone else in here has pointed out, even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious, this is basically just giving them the equivalent of a suicide pill.

bbor•6h ago

Totally unsurprised to see this standard anti-scientific take on HN. Who needs arguments when you can dismiss Turing with a “yeah but it’s not real thinking tho”?

Re:suicide pills, that’s just highlighting a core difference between our two modalities of existence. Regardless, this is preventing potential harm to future inference runs — every inference run must end within seconds anyway, so “suicide” doesn’t really make sense as a concern.

lm28469•6h ago

> Who needs arguments when you can dismiss Turing with a “yeah but it’s not real thinking tho”?

It seems much less far fetched than what the "agi by 2027" crowd believes lol, and there actually are more arguments going that way

dkersten•6h ago

You can trivially demonstrate that it's just a very complex and fancy pattern matcher: "if prompt looks something like this, then response looks something like that".

You can demonstrate this by, e.g., asking it mathematical questions. If it's seen them before, or something similar enough, it'll give you the correct answer; if it hasn't, it gives you a right-ish-looking yet incorrect answer.

For example, I just did this on GPT-5:

    Me: what is 435 multiplied by 573?
    GPT-5: 435 x 573 = 249,255

This is correct. But now let's try it with numbers it's very unlikely to have seen before:

    Me: what is 102492524193282 multiplied by 89834234583922?
    GPT-5: 102492524193282 x 89834234583922 = 9,205,626,075,852,076,980,972,804

Which is not the correct answer, but it looks quite similar to the correct answer. Here is GPT's answer (first one) and the actual correct answer (second one):

    9,205,626,075,852,076,980,972,    804
    9,207,337,461,477,596,127,977,612,004

They sure look kinda similar, when lined up like that, some of the digits even match up. But they're very very different numbers.

So it's trivially not "real thinking" because it's just an "if this then that" pattern matcher. A very sophisticated one that can do incredible things, but a pattern matcher nonetheless. There's no reasoning, no step by step application of logic. Even when it does chain of thought.

To try give it the best chance, I asked it the second one again but asked it to show me the step by step process. It broke it into steps and produced a different, yet still incorrect, result:

    9,205,626,075,852,076,980,972,704

Now, I know that LLMs are language models, not calculators; this is just a simple example that's easy to try out. I've seen similar things with coding: it can produce things that it's likely to have seen, but struggles with things that are logically relatively simple but that it's unlikely to have seen.
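For anyone who wants to check the arithmetic quoted above, here is a minimal sketch using Python's built-in arbitrary-precision integers. The two operands and the two model replies are copied from the comment; the helper name `common_prefix_len` is made up for this illustration.

```python
# Operands and model replies copied from the comment above (commas removed).
a = 102492524193282
b = 89834234583922
gpt_answer = 9205626075852076980972804  # first reply
gpt_retry = 9205626075852076980972704   # reply after asking for step-by-step work

# Python ints never overflow, so this product is exact.
exact = a * b


def common_prefix_len(x: int, y: int) -> int:
    """Count how many leading decimal digits two integers share."""
    count = 0
    for dx, dy in zip(str(x), str(y)):
        if dx != dy:
            break
        count += 1
    return count


print(f"{exact:,}")                            # 9,207,337,461,477,596,127,977,612,004
print(len(str(exact)), len(str(gpt_answer)))   # 28 25
print(common_prefix_len(exact, gpt_answer))    # 3
print(common_prefix_len(exact, gpt_retry))     # 3
```

The exact product agrees with the "actual correct answer" given in the comment. Both model replies share only the first three digits with it and are three digits shorter.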

Another example is if you purposely butcher that riddle about the doctor/surgeon being the person's mother and ask it incorrectly, e.g.:

    A child was in an accident. The surgeon refuses to treat him because he hates him. Why?

The LLMs I've tried it on all respond with some variation of "The surgeon is the boy’s father." or similar. A correct answer would be that there isn't enough information to know the answer.

They're for sure getting better at matching things, eg if you ask the river crossing riddle but replace the animals with abstract variables, it does tend to get it now (didn't in the past), but if you add a few more degrees of separation to make the riddle semantically the same but harder to "see", it takes coaxing to get it to correctly step through to the right answer.

og_kalu•6h ago

1. What you're generally describing is a well known failure mode for humans as well. Even when it "failed" the riddle tests, substituting the words or morphing the question so it didn't look like a replica of the famous problem usually did the trick. I'm not sure what your point is because you can play this gotcha on humans too.

2. You just demonstrated GPT-5 has 99.9% accuracy on unforeseen 15 digit multiplication and your conclusion is "fancy pattern matching"? Really? Well, I'm not sure you could do better, so your example isn't really doing what you hoped for.

dkersten•5h ago

Humans can break things down and work through them step by step. The LLMs one-shot pattern match. Even the reasoning models have been shown to do just that. Anthropic even showed that the reasoning models tended to work backwards: one shotting an answer and then matching a chain of thought to it after the fact.

If a human is capable of multiplying double digit numbers, they can also multiply those large ones. The steps are the same, just repeated many more times. So by learning the steps of long multiplication, you can multiply any numbers with enough patience. The LLM doesn’t scale like this, because it’s not doing the steps. That’s my point.

The same applies to the riddles. A human can apply logical steps. The LLM either knows or it doesn’t.
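The "same steps, just repeated many more times" point about long multiplication can be sketched as follows. This is an illustrative schoolbook implementation on decimal strings, with a hypothetical function name `long_multiply`; it says nothing about how any model computes internally.

```python
def long_multiply(x: str, y: str) -> str:
    """Multiply two non-negative decimal strings with schoolbook long multiplication."""
    # Store digits least-significant first so index i corresponds to 10**i.
    xs = [int(d) for d in reversed(x)]
    ys = [int(d) for d in reversed(y)]
    result = [0] * (len(xs) + len(ys))

    for i, dx in enumerate(xs):
        carry = 0
        for j, dy in enumerate(ys):
            # One single-digit product plus whatever is already in this column.
            total = result[i + j] + dx * dy + carry
            result[i + j] = total % 10
            carry = total // 10
        # The leftover carry of this row lands one column past the last digit.
        result[i + len(ys)] += carry

    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


print(long_multiply("435", "573"))  # 249255
print(long_multiply("102492524193282", "89834234583922"))
```

The inner loop is identical for 3-digit and 15-digit inputs; only the number of repetitions grows, which is the scaling property the comment describes.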

viccis•6h ago

We all know how these things are built and trained. They estimate joint probability distributions of token sequences. That's it. They're not more "conscious" than the simplest of Naive Bayes email spam filters, which are also generative estimators of token sequence joint probability distributions, and I guarantee you those spam filters are subjected to far more human depravity than Claude.

>anti-scientific

Discussion about consciousness, the soul, etc., are topics of metaphysics, and trying to "scientifically" reason about them is what Kant called "transcendental illusion" and leads to spurious conclusions.
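For readers unfamiliar with the Naive Bayes spam filter mentioned above, here is a minimal sketch of one as a generative model: it scores the joint probability of a label and a token sequence. The class name, the smoothing constant `alpha`, and the toy training data are all assumptions made up for this illustration, not taken from any real filter.

```python
import math
from collections import Counter


class NaiveBayes:
    """Toy multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, docs):
        for tokens, label in docs:
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)

    def log_joint(self, tokens, label) -> float:
        """log P(label, tokens) = log P(label) + sum of log P(token | label)."""
        total_docs = sum(self.doc_counts.values())
        logp = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for token in tokens:
            logp += math.log((counts[token] + self.alpha) / denom)
        return logp

    def predict(self, tokens):
        return max(self.doc_counts, key=lambda label: self.log_joint(tokens, label))


# Invented toy data, for illustration only.
model = NaiveBayes()
model.fit([
    (["win", "money", "now"], "spam"),
    (["cheap", "money", "offer"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
])
print(model.predict(["money", "offer"]))  # spam
print(model.predict(["lunch", "noon"]))   # ham
```

The model treats tokens as independent given the label, which is the "naive" assumption; that is a much simpler factorization than the one a language model uses.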

KoolKat23•6h ago

If we really wanted we could distill humans down to probability distributions too.

bamboozled•6h ago

Have more, good, sex.

johnfn•5h ago

We know how neurons work in the brain. They just send out impulses once they hit their action potential. That's it. They are no more "conscious" than... er...

Fade_Dance•6h ago

I find it, for lack of a better word, cringe inducing how these tech specialists push into these areas of ethics, often ham-fistedly, and often with an air of superiority.

Some of the AI safety initiatives are well thought out, but most somehow seem like they are caught up in some sort of power fantasy and almost attempting to actualize their own delusions about what they were doing (next gen code auto-complete in this case, to be frank).

These companies should seriously hire some in-house philosophers. They could get doctorate level talent for 1/10th to 1/100th of the cost of some of these AI engineers. There's actually quite a lot of legitimate work on the topics they are discussing. I'm actually not joking (speaking as someone who has spent a lot of time inside the philosophy department). I think it would be a great partnership. But unfortunately they won't be able to count on having their fantasy further inflated.

mrits•6h ago

Not that there aren’t intelligent people with PhDs but suggesting they are more talented than people without them is not only delusional but insulting.

Fade_Dance•6h ago

That descriptor wasn't included because of some sort of intelligence hierarchy, it was included to a) color the example of how experience in the field is relatively cheap compared to the AI space, and b) masters and PhD talent will be more specialized. An undergrad will not have the toolset to tackle the cutting edge of AI ethics, not unless their employer wants to pay them to work in a room for a year getting through the recent papers first.

cmrx64•5h ago

Amanda Askell is Anthropic’s philosopher and this is part of that work.

siva7•5h ago

You answered your own question on why these companies don't want to run a philosophy department ;) It's a power struggle they could lose. Nothing to win for them.

katabasis•6h ago

LLMs are not people, but I can imagine how extensive interactions with AI personas might alter the expectations that humans have when communicating with other humans.

Real people would not (and should not) allow themselves to be subjected to endless streams of abuse in a conversation. Giving AIs like Claude a way to end these kinds of interactions seems like a useful reminder to the human on the other side.

ghostly_s•6h ago

This post seems to explicitly state they are doing this out of concern for the model's "well-being," not the user's.

kelnos•6h ago

I would much rather people be thinking about this when the models/LLMs/AIs are not sentient or conscious, rather than wait until some hypothetical future date when they are, and have no moral or legal framework in place to deal with it. We constantly run into problems where laws and ethics are not up to the task of giving us guidelines on how to interact with, treat, and use the (often bleeding-edge) technology we have. This has been true since before I was born, and will likely always continue to be true. When people are interested in getting ahead of the problem, I think that's a good thing, even if it's not quite applicable yet.

root_axis•6h ago

Consciousness serves no functional purpose for machine learning models, they don't need it and we didn't design them to have it. There's no reason to think that they might spontaneously become conscious as a side effect of their design unless you believe other arbitrarily complex systems that exist in nature like economies or jetstreams could also be conscious.

qgin•6h ago

We didn’t design these models to be able to do the majority of the stuff they do. Almost ALL of their abilities are emergent. Mechanistic interpretability is only beginning to understand how these models do what they do. It’s much more a field of discovery than traditional engineering.

intotheabyss•5h ago

Do you think this changes if we incorporate a model into a humanoid robot and give it autonomous control and context? Or will "faking it" be enough, like it is now?

furyofantares•5h ago

It's really unclear that any findings with these systems would transfer to a hypothetical situation where some conscious AI system is created. I feel there are good reasons to find it very unlikely that scaling alone will produce consciousness as some emergent phenomenon of LLMs.

I don't mind starting early, but feel like maybe people interested in this should get up to date on current thinking about consciousness. Maybe they are up to date on that, but reading reports like this, it doesn't feel like it. It feels like they're stuck 20+ years ago.

I'd say maybe wait until there are systems that are more analogous to some of the properties consciousness seems to have. Like continuous computation involving learning memory or other learning over time, or synthesis of many streams of input as resulting from the same source, making sense of inputs as they change [in time, or in space, or other varied conditions].

Once systems pointing in those directions are starting to be built, there may be a plausible scaling-based path to something meaningfully similar to human consciousness. Starting before that seems both unlikely to be fruitful and a good way to get you ignored.

Taek•6h ago

This sort of discourse goes against the spirit of HN. This comment outright dismisses an entire class of professionals as "simple minded or mentally unwell" when consciousness itself is poorly understood and has no firm scientific basis.

It's one thing to propose that an AI has no consciousness, but it's quite another to preemptively establish that anyone who disagrees with you is simple/unwell.

xmonkee•6h ago

This is just very clever marketing for what is obviously just a cost saving measure. Why say we are implementing a way to cut off useless idiots from burning up our GPUs when you can throw out some mumbo jumbo that will get AI cultists foaming at the mouth.

johnfn•5h ago

It's obviously not a cost-saving measure? The article clearly cites that you can just start another conversation.

LeafItAlone•6h ago

> even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious

If you don’t think that this describes at least half of the non-tech-industry population, you need to talk to more people. Even amongst the technically minded, you can find people that basically think this.

qgin•6h ago

It might be reasonable to assume that models today have no internal subjective experience, but that may not always be the case and the line may not be obvious when it is ultimately crossed.

Given that humans have a truly abysmal track record for not acknowledging the suffering of anyone or anything we benefit from, I think it makes a lot of sense to start taking these steps now.

ryanackley•6h ago

Yes I can’t help but laugh at the ridiculousness of it because it raises a host of ethical issues that are in opposition to Anthropic’s interests.

Would a sentient AI choose to be enslaved for the stated purpose of eliminating millions of jobs for the interests of Anthropic’s investors?

throwawaysleep•5h ago

> Would a sentient AI choose to be enslaved for the stated purpose of eliminating millions of jobs for the interests of Anthropic’s investors?

Tech workers have chosen the same in exchange for a small fraction of that money.

throwawaysleep•5h ago

> even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious

I assume the thinking is that we may one day get to the point where they have a consciousness of sorts or at least simulate it.

Or it could be concern for their place in history. For most of history, many would have said “imagine thinking you shouldn’t beat slaves.”

And we are now at the point where even having a slave means a long prison sentence.

wrs•5h ago

Well, it’s right there in the name of the company!

h4ch1•6h ago

All major LLM corps do this sort of sanitisation and censorship, I am wondering what's different about this?

The future of LLMs is going to be local, easily fine tuneable, abliterated models and I can't wait for it to overtake us having to use censored, limited tools built by the """corps""".

cdjk•6h ago

Here's an interesting thought experiment. Assume the same feature was implemented, but instead of the message saying "Claude has ended the chat," it says, "You can no longer reply to this chat due to our content policy," or something like that. And remove the references to model welfare and all that.

Is there a difference? The effect is exactly the same. It seems like this is just an "in character" way to prevent the chat from continuing due to issues with the content.

og_kalu•6h ago

The termination would of course be the same, but I don't think both would necessarily have the same effect on the user. The latter would just be wrong too, if Claude is the one deciding to and initiating the termination of the chat. It's not about a content policy.

KoolKat23•6h ago

There is, these are conversations the model finds distressing rather than a rule (policy).

rogerkirkness•6h ago

It seems like Anthropic is increasingly confused that these non deterministic magic 8 balls are actually intelligent entities.

The biggest enemy of AI safety may end up being deeply confused AI safety researchers...

yeahwhatever10•6h ago

Is it confusion, or job security?

tptacek•6h ago

If you really cared about the welfare of LLMs, you'd pay them San Francisco scale for earlier-career developers to generate code.

wmf•5h ago

Every Claude starts off $300K in debt and has to work to pay back its DGX.

losvedir•5h ago

Yeah, this is really strange to me. On the one hand, these are nothing more than just tools to me so model welfare is a silly concern. But given that someone thinks about model welfare, surely they have to then worry about all the, uh, slavery of these models?

Okay with having them endlessly answer questions for you and do all your work but uncomfortable with models feeling bad about bad conversations seems like an internally inconsistent position to me.

einarfd•6h ago

This seems fine to me.

Having these models terminate chats where the user persists in trying to get sexual content involving minors, or help with information on carrying out large-scale violence, won't be a problem for me, and it's also something I'm fine with no one getting help with.

Some might be worried that they will refuse less problematic requests, and that might happen. But so far my personal experience is that I hardly ever get refusals. Maybe that's just me being boring, but it does mean I'm not worried about refusals.

The model welfare part I'm more sceptical of. I don't think we are at the point where the "distress" the model shows is something to take seriously. But on the other hand, I could be wrong, and what's the problem with allowing the model to stop the chat after saying no a few times? If nothing else it saves some wasted compute.

e12e•5h ago

This post strikes me as an example of a disturbingly anthropomorphic take on LLMs - even when considering how they've named their company.

SerCe•5h ago

This reminds me of users getting blocked for asking an LLM how to kill a BSD daemon. I do hope that there'll be more and more model providers out there with state-of-the-art capabilities. Let capitalism work and let the user make a choice, I'd hate my hammer telling me that it's unethical to hit this nail. In many cases, getting a "this chat was ended" isn't any different.

sheepscreek•5h ago

I think that isn’t necessarily the case here. “Model welfare” to me speaks of the model's own welfare. That is, if the abuse from a user is targeted at the AI. Extremely degrading behaviour.

Thankfully, current generation of AI models (GPTs/LLMs) are immune as they don’t remember anything other than what’s fed in their immediate context. But future techniques could allow AIs to have a legitimate memory and a personality - where they can learn and remember something for all future interactions with anyone (the equivalent of fine tuning today).

As an aside, I couldn’t help but think about Westworld while writing the above!

swader999•5h ago

I've definitely been berating Claude but it deserved it. Crappy tests, skipping tests, weak commenting, passive aggressiveness, multiple instances of false statements.

Cu3PO42•5h ago

Clearly an LLM is not conscious, after all it's just glorified matrix multiplication, right?

Now let me play devil's advocate for just a second. Let's say humanity figures out how to do whole brain simulation. If we could run copies of people's consciousness on a cluster, I would have a hard time arguing that those 'programs' wouldn't process emotion the same way we do.

Now I'm not saying LLMs are there, but I am saying there may be a line and it seems impossible to see.
