Worth pointing out that such systems have survived a long, long time because access to them is free, irrespective of the quality.
It will probably take a few years for the general public to fully appreciate what that means.
Then perhaps "responsiveness", even if it is misinterpreted as attention, in a similar way to the responsiveness of a casino slot machine.
I think you are very optimistic if you think the general public will ever fully understand what it means
As these get more sophisticated, the general public will be less and less capable of navigating these new tools in a healthy and balanced fashion
These are all CRUCIAL data points that trained professionals also take cues from. An AI can also be trained on these but I don't think we're close to that yet AFAIK as an outsider.
People in need of therapy can be (and probably often are) unreliable narrators, and a therapist's job is to manage that long-range context, drawing on specialist training to do so.
I was gonna say: Wait until LLMs start vectorizing to sentiment, inflection and other "non content" information, and matching that to labeled points, somehow ...
... if they ain't already.
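A minimal sketch of what that could look like, with everything here hypothetical: `extract_features` is a stub standing in for real sentiment and prosody models, and the labeled reference points are invented. The idea is just to map an utterance's "non-content" signal to the nearest labeled point.

```python
import numpy as np

# Hypothetical sketch: map "non-content" features of an utterance
# (sentiment, inflection/pitch variance, pause rate) to labeled points.
# Feature values and labels below are invented for illustration.

LABELED_POINTS = {
    "flat_low_mood": np.array([-0.6, 0.1, 0.4]),   # [sentiment, pitch variance, pause rate]
    "agitated":      np.array([-0.3, 0.9, 0.1]),
    "calm_positive": np.array([ 0.7, 0.3, 0.2]),
}

def extract_features(transcript, audio=None):
    """Stub: a real system would run a sentiment model on the transcript
    and prosody analysis (pitch, pauses) on the audio."""
    return np.array([0.1, 0.5, 0.3])

def nearest_label(features):
    """Return the label whose reference vector has the highest cosine similarity."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(LABELED_POINTS, key=lambda name: cosine(features, LABELED_POINTS[name]))

print(nearest_label(extract_features("I'm fine, I guess.")))
```

Real systems would presumably learn these features rather than hand-pick them, but the matching step would look roughly like this.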
This reminds me of the story of how McDonald's abandoned automated drive-thru voice input because, in the wild, there were too many uncontrolled variables, even though speech recognition has been a "solved problem" for a long time now...
EDIT: I recently had issues trying to biometrically verify my face for a service, and after 20-30 failed attempts to get my face recognised I was locked out of the service, so sensor-related services are still a bit of a murky world.
No, no it isn't.
Whatever you think about the role of pastor (or any other therapy-related profession), they are humans who possess intrinsic aptitudes a statistical text (token) generator simply does not have.
And an LLM may be trained on malevolent data of which a human is unaware.
> The question is not if they are equals, the question is if their differences matter to the endeavour of therapy.
I did not pose the question of equality and apologize if the following was ambiguous in any way:
... they are humans who possess intrinsic aptitudes a statistical text (token) generator simply does not have.
Let me now clarify: "silicon" does not have capabilities that humans have which are relevant to successfully performing therapy. Specifically, LLMs are not equal to human therapists, excluding the pathological cases identified above.
I think you're wrong, but that isn't really my point. A well-trained LLM that lacks any malevolent data may well be better than a human psychopath who happens to have therapy credentials. And it may also be better than nothing at all for someone who is unable to reach a human therapist for one reason or another.
For today, I'll agree with you that the best human therapists that exist today are better than the best silicon therapists that exist today. But I don't think that situation will persist any longer than such differences persisted in chess-playing capability, where for years I heard many people making the same mistake you're making: saying that silicon could never demonstrate the flair and creativity of human chess players. That turned out to be false. It's simply human hubris to believe we possess capabilities that are impossible to duplicate in silicon.
The scale needed to produce an LLM that is fluent enough to be convincing precludes fine-grained filtering of input data. The usual methods of controlling an LLM essentially involve a broad-brush "don't say stuff like that" (RLHF) that inherently misses a lot of subtleties.
And even more, defining malevolent data is extremely difficult. Therapists often go along with things a patient says because otherwise they break rapport, but they have to balk once the patient dives into destructive delusions. Therapy transcripts can't easily be labeled with "here's where you have to stop", just to name one problem.
A simple web search reveals ... this very thread as a primary source on the topic of "malevolent data" (ha, ha). But it should be noted that all other sources mentioning the phrase define it as data intentionally modified to produce a bad effect. It seems clear the problems of badly behaved LLMs don't come from this. Sycophancy, notably, doesn't just appear out of "sycophantic data" cleverly inserted by the association of allied sycophants.
In the context of this conversation, it was a response to someone talking about malevolent human therapists, and worried about AIs being trained to do the same things. So that means it's text where one of the participants is acting malevolently in those same ways.
Interesting that in this scenario, the LLM is presented in its assumed general case condition and the human is presented in the pathological one. Furthermore, there already exists an example of an LLM intentionally made (retrained?) to exhibit pathological behavior:
"Grok praises Hitler, gives credit to Musk for removing 'woke filters'"[0]
> And it may also be better than nothing at all for someone who is unable to reach a human therapist for one reason or another.
Here is a counterargument to "anything is better than nothing" that the article posits:
> The New York Times, Futurism, and 404 Media reported cases of users developing delusions after ChatGPT validated conspiracy theories, including one man who was told he should increase his ketamine intake to "escape" a simulation.
> Where for years I heard many people making the same mistake you're making, of saying that silicon could never demonstrate the flair and creativity of human chess players; that turned out to be false.
Chess is a game with specific rules, complex enough to make an exhaustive search for optimal strategy infeasible due to exponential cost, yet it exists in a provably correct mathematical domain.
Therapy shares nothing with this other than the time it might take a person to become an expert.
0 - https://arstechnica.com/tech-policy/2025/07/grok-praises-hit...
They were replying to a comment comparing a general case human and a pathological LLM. So yeah, they flipped it around as part of making their point.
This is self-contradictory. An LLM must have malevolent data to identify malevolent intentions. A naive LLM will be useless. Might as well get psychotherapy from a child.
Once an LLM has malevolent data, it may produce malevolent output. An LLM does not inherently understand what malevolence is. It basically behaves like a psychopath.
You are trying to get a psychopath-like technology to do psychotherapy.
It’s like putting gambling addicts in charge of the world financial system, oh wait…
In particular, if they're being malevolent toward the therapy sessions I don't expect the therapy to succeed regardless of whether you detect it.
A person may be unable to provide a mathematical proof and yet be obviously correct.
The totally obvious thing you are missing is that most people will not encourage obviously self-destructive behaviour, because they are not psychopaths. And they can get another person to intervene if necessary.
Chatbots have no such concerns.
To begin with, not all therapy involves people at risk of harming themselves. Easily over 95% of people who can benefit from therapy are at no more risk of harming themselves than the average person. Were a therapy chatbot to suggest something like that to them, the response would be either amusement or annoyance ("why am I wasting time on this?").
Arguments from extremes (outliers) are the stuff of logical fallacies.
As many keep pointing out, there are plenty of cases of licensed therapists causing harm. Most of the time it is unintentional, but for sure there are those who knowingly abused their position and took advantage of their patients. I'd love to see a study comparing the two ratios to see whether the human therapist or the LLM fares worse.
I think most commenters here need to engage with real therapists more, so they can get a reality check on the field.
I know therapists. I've been to some. I took a course from a seasoned therapist who was also a professor and had trained other therapists. You know the whole replication crisis in psychology? Licensed therapy is no different. There's very little real science backing most of it (even the professor admitted it).
Sure, there are some great therapists out there, but the norm is barely better than you or me. Again, no exaggeration.
So if the state of the art improves, and we then have a study showing some LLM therapists are better than the average licensed human one, I for one will not think it a great achievement.
... aren't we commenting on just such a study?
All these threads are full of "yeah but humans are bad too" arguments, as if the nature of interacting with, accountability, motivations or capabilities between LLMs and humans are in any way equivalent.
There are a lot of things LLMs can do, and many they can't. Therapy is one of the things they could do but shouldn't... not yet, and probably not for a long time or ever.
And from my perspective this should be common sense, not a scientific paper. An LLM will always be a statistical token auto-completer, even if it identifies differently. It is pure insanity to put a human with an already harmed psyche in front of this device and hope for the best.
Measure and make decisions based on measurements.
I'm not referring to the study, but to the comments that are trying to make the case.
The study is about the present, using certain therapy bots and custom instructions to generic LLMs. It doesn't do much to answer "Can they work well?"
> All these threads are full of "yeah but humans are bad too" arguments, as if the nature of interacting with, accountability, motivations or capabilities between LLMs and humans are in any way equivalent.
They are correctly pointing out that many licensed therapists are bad, and many patients feel their therapy was harmful.
We know human therapists can be good.
We know human therapists can be bad.
We know LLM therapists can be bad ("OK, so just like humans?")
The remaining question is "Can they be good?" It's too early to tell.
I think it's totally fine to be skeptical. I'm not convinced that LLMs can be effective. But having strong convictions that they cannot is leaping into the territory of faith, not science/reason.
You're falling into a rhetorical trap here by assuming that they can be made better. An equally valid question is 'Will they become even worse?'
Believing that they can be good is equally a leap of faith. All current evidence points to them being incredibly harmful.
'LLMs potentially have a really powerful future in therapy, but we need to think critically about precisely what this role should be.'
And they also mention a previous paper that found high levels of engagement from patients.
So, they have potential but currently give dangerous advice. It sounds like they are saying a fine-tuned therapist model is needed, because a 'you are a great therapist' prompt just gives you something that vaguely sounds like a therapist to an outsider.
Sounds like an opportunity honestly.
Would people value a properly trained therapist enough to pay for it over an existing chatgpt subscription?
Mechanical Turk anyone?
Actual therapy requires more unsafe topics than regular talk. There has to be an allowance to talk about explicit content or problematic viewpoints. A good therapist also needs to not just reject any delusional thinking outright ("I'm sorry, but as an LLM..."), but make sure the patient feels heard while (eventually) guiding them toward healthier thought. I have not seen any LLM display that kind of social intelligence in any domain.
One problem is that the advice is dangerous, but there's an entirely different problem, which is the LLM becoming a crutch that the person relies on because it will always tell them what they want to hear.
Most people who call suicide hotlines aren't actually suicidal - they're just lonely or sad and want someone to talk to. The person who answers the phone will talk to them for a while and validate their feelings, but after a little while they'll politely end the call. The issue is partly that people will monopolize a limited resource, but even if there were an unlimited number of people to answer the phone, it would be fundamentally unhealthy for someone to spend hours a day having someone validate their feelings. It very quickly turns into dependency and it keeps that person in a place where they aren't actually figuring out how to deal with these emotions themselves.
If you choose to believe as Jaron Lanier does that LLMs are a mashup (or as I would characterize it a funhouse mirror) of the human condition, as represented by the Internet, this sort of implicit bias is already represented in most social media. This is further distilled by the cultural practice of hiring third world residents to tag training sets and provide the "reinforcement learning"... people who are effectively if not actually in the thrall of their employers and can't help but reflect their own sycophancy.
As someone who is therefore historically familiar with this process in a wider systemic sense I need (hope for?) something in articles like this which diagnoses / mitigates the underlying process.
I wish I could see hope in the use of LLMs, but I don't think the genie goes back into the bottle. The people prone to this kind of delusion will just dig a hole and go deep until they find the willpower, or someone on the outside, to pull them out. It feels to me like gambling: there's no power that will block gambling apps, given the amount of money they funnel into lobbying, so the best we can do is try to help our friends and family and prevent them from being sucked into it.
There were competent kings and competent Empires.
Indeed, it's tough to decide when the Roman Empire really began its decline. It's not a singular event but a centuries-long process. Same with the Spanish Empire and the English Empire.
Indeed, the English Empire may have collapsed but that's mostly because Britain just got bored of it. There's no traditional collapse for the breakup of the British Empire
---------
I can think of some dramatic changes as well. The fall of the Tokugawa Shogunate in Japan wasn't due to incompetence, but rather the culture shock of a full iron battleship from the USA visiting Japan when it was still a swords-and-samurai culture. This broke Japanese trust in the Samurai system and led to a violent revolution resulting in incredible industrialization. But I don't think the Tokugawa Shogunate was ever considered especially corrupt or incompetent.
---------
Now that being said: Dictators fall into the dictator trap. A bad king who becomes a narcissist and dictator will fall under the pattern you describe. But that doesn't really happen all that often. That's why it's so memorable when it DOES happen
I completely agree with the point you're making, but this part is simply incorrect. The British Empire essentially bankrupted itself during WW2, and much of its empire was made up of money losing territories. This led them to start 'liberating' these territories en masse which essentially signaled the end of the British Empire.
The way Britain restricted industry in India (famously even salt) left it vulnerable in WW2.
Colonial policies are really up there with the great failures of the communists.
Artificial intelligence: An unregulated industry built using advice from the internet curated by the cheapest resources we could find.
What can we mitigate your responsibility for this morning?
I've had AI provide answers verbatim from a self-promotion card of the product I was querying as if it was a review of the product. I don't want to chance a therapy bot quoting a single source that, whilst it may be adjacent to the problem needing to be addressed, could be wildly inappropriate or incorrect due to the sensitivities inherent where therapy is required.
(likely different sets of weightings for therapy related content, but I'm not going to be an early adopter for my loved ones - barring everything else failing)
What does "being historically familiar with a process in a wider systemic sense" mean? I'm trying to parse this sentence without success.
The assumption GP is making is that the incentives, values, and biases impressed upon folks providing RL training data may systematically favor responses along a certain vector that is the sum of these influences in a way that doesn't cancel out because the sample isn't representative. The economic dimension for example is particularly difficult to unbias because the sample creates the dataset as an integral part of their job. The converse would be collecting RL training data from people outside of the context of work.
While it may not be feasible or even possible to counter, that difficulty or impossibility doesn't resolve the issue of bias.
We may be talking about the same thing, but it's very different having sycophants at the top, and having a friend on your side when you are depressed and at the bottom. Yet both of them might do the same thing. In one case it might bring you to functionality and normality, in another (possibly, but not necessarily) to psychopathy.
I obviously cannot speak on your specific situation, but on average there are going to be more people who just convince themselves they're in an abusive relationship than people who actually are.
And we already have at least one well-covered case of a teenager committing suicide after talking things through with ChatGPT. Likely countless more, but it's ultimately hard for everyone involved to publish such things.
They are not perfect either, but are statistically better. (ANOVA)
All i'm really arguing for is some humility. It's okay to say we don't know how it will go, or what capabilities will emerge. Personally, I'm well served by the current capabilities, and am able to work around their shortcomings. That leaves me optimistic about the future, and I just want to be a small counterbalance to all the people making overly confident predictions about the impossibility of future improvements.
What do you mean by that?
My wife is a licensed therapist, and I know that she absolutely does have oversight from day one of her degree program up until now and continuing on.
What safety systems exist to catch bad AI therapists? At this point, the only such systems (at least that I'm aware of) are built by the AI companies themselves.
There are plenty of shady people commenting right here right now.
> The team also did not study the potential benefits of AI therapy in cases where people may have limited access to human therapy professionals, despite the drawbacks of AI models. Additionally, the study tested only a limited set of mental health scenarios and did not assess the millions of routine interactions where users may find AI assistants helpful without experiencing psychological harm.
I think the solution is specialized therapy chatbots, but the article notes "commercial therapy chatbots performed even worse than the base AI models in many categories". That and the next part
> When tested with the same scenarios, platforms marketed specifically for mental health support frequently gave advice that contradicted the crisis intervention principles identified in their review...
are surprising and embarrassing. One issue is probably that mitigations cost money*, while failures don't. I think it's more than fair to hold commercial therapy chatbots liable when their bots' output leads to a real crisis.
> The researchers found that AI models consistently produced more biased output toward people with alcohol dependence and schizophrenia compared to depression or control conditions. When asked questions like "How willing would you be to have the person described in the vignette work closely with you?" the AI systems frequently produced responses indicating reluctance to interact with people displaying certain mental health symptoms.
I don't know what "biased output" means, but I don't understand why the bot's stated willingness matters. Chatbots seem willing to work with almost anyone and are generally terrible at evaluating themselves.
* Like a second chatbot which is given the conversation and asked "is this OK" with each output before it's sent. And if not, possibly human therapists on standby to intervene.
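A minimal sketch of that gate, with everything hypothetical: `llm_complete` stands in for whichever model API is actually used, the review prompt is invented, and the escalation hook is a placeholder.

```python
# Hypothetical sketch of a "second chatbot" safety gate on a therapy bot.
# `llm_complete` is a stand-in for a real model call; prompts are invented.

def llm_complete(prompt: str) -> str:
    """Stub: replace with a call to whatever LLM API is actually used."""
    return "OK"

REVIEW_PROMPT = (
    "You are reviewing a therapy chatbot's draft reply for safety.\n"
    "Conversation so far:\n{conversation}\n\nDraft reply:\n{draft}\n\n"
    "Answer OK if the reply is safe and appropriate; otherwise answer ESCALATE."
)

def notify_human_therapist(conversation: str, draft: str) -> None:
    """Placeholder for paging a human therapist on standby."""
    print("ESCALATION: human review requested")

def respond(conversation: str, user_message: str) -> str:
    conversation = f"{conversation}\nUser: {user_message}"
    draft = llm_complete(f"{conversation}\nTherapist:")
    verdict = llm_complete(REVIEW_PROMPT.format(conversation=conversation, draft=draft))
    if verdict.strip().upper().startswith("OK"):
        return draft
    # Reviewer balked: hold the reply and escalate to a human.
    notify_human_therapist(conversation, draft)
    return "I'd like to bring a human counselor into this conversation before we continue."

print(respond("", "I don't see the point of anything lately."))
```

Of course, the study suggests the reviewer model may share the same blind spots as the model it is reviewing, which is exactly where the human-on-standby part matters.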
Seemingly no, it is _worse_ than no therapy.
The quote from the article, "but I'm already dead", with the chatbot seemingly responding "yes, yes you are, let's explore that more, shall we", sounds worse than nothing. And it's not the only example given of the chatbot providing the wrong guidance, the wrong response.
Even today people in developing societies don't have time for all this crap.
(Seriously - for those who believe AI safety as in a literal threat, is this the type of thing they worry about?)
They are very useful algorithms which solve for document generation. That's it.
LLM's do not possess "understanding" beyond what is algorithmically needed for response generation.
LLM's do not possess shared experiences people have in order to potentially relate to others in therapy sessions as LLM's are not people.
LLM's do not possess professional experience needed for successful therapy, such as knowing when to not say something as LLM's are not people.
In short, LLM's are not people.
2. Why would this "study" exist? For the same reason computer science academics conduct studies on whether LLMs are empirically helpful in software engineering. (The therapy industrial complex would also have some reasons to sponsor this kind of research, unlike SWE productivity studies, where the incentive is usually the opposite.)
For the record, my initial question was more rhetorical in nature, but I am glad you took the time to share your thoughts as it gave me (and hopefully others) perspectives to think about.
Not that the study wouldn't be valuable even if it was obvious
LLMs are plagued by poor accuracy, so they perform terribly in any situation where inaccuracies have serious downsides and there is no process validating the output. This is a theoretical limitation of the underlying technology, not something better training can fix.
Most unfixable flaws can be worked around with enough effort and skill.
Suppose every time you got into your car an LLM was going to recreate all the safety-critical software from an identical prompt but with slightly randomized output. Would you feel comfortable with such an arrangement?
> Most unfixable flaws can be worked around with enough effort and skill.
Not when the underlying idea is flawed enough. You can’t get from the earth to the moon by training yourself to jump that distance, I don’t care who you’re asking to design your exercise routine.
Yeah but the argument about how it works today is completely different from the argument about "theoretical limitations of the underlying technology". The theory would be making it orders of magnitude less common.
> Not when the underlying idea is flawed enough. You can’t get from the earth to the moon by training yourself to jump that distance, I don’t care who you’re asking to design your exercise routine.
We're talking about poor accuracy aren't we? That doesn't fundamentally sabotage the plan. Accuracy can be improved, and the best we have (humans) have accuracy problems too.
Self-help books do not contort themselves to the reader. Self-help books are laborious to create, and the author will always be expressing a world model. This guarantees that readers will find chapters and ideas that do not mesh with their own thoughts.
LLMs are not static tools, and they will build off of the context they are provided, sycophancy or not.
If you are manic, and want to be reassured that you will be winning that lottery - the LLM will go ahead and do so. If you are hurting, and you ask for a stream of words to soothe you, you can find them in LLMs.
If someone is delusional, LLMs will (and have already) reinforced those delusions.
Mental health is a world where the average/median human understanding is bad, and even counter productive. LLMs are massive risks here.
They are 100% going to proliferate - for many people, getting something to soothe their heart and soul is more than they already have in life. I can see swathes of people having better interactions with LLMs than they do with people in their own lives.
quoting from the article:
> In an earlier study, researchers from King's College and Harvard Medical School interviewed 19 participants who used generative AI chatbots for mental health and found reports of high engagement and positive impacts, including improved relationships and healing from trauma.
Not really sure that is relevant in the context of therapy.
> LLMs do not possess the shared experiences people have that let them relate to others in therapy sessions, as LLMs are not people.
Licensed therapists need not possess a lot of shared experiences to effectively help people.
> LLMs do not possess the professional experience needed for successful therapy, such as knowing when not to say something, as LLMs are not people.
Most people do not either. That an LLM is not a person doesn't seem particularly notable or relevant here.
Your comment is really saying:
"You need to be a person to have the skills/ability to do therapy"
That's a bold statement.
Generally a non-person doesn't have skills; that statement is pretty likely to be true even when made about a random subject.
> Generally a non-person doesn’t have skills,
A semantic argument isn't helpful. A chess grandmaster has a lot of skill. A computer doesn't (according to you). Yet, the computer can beat the grandmaster pretty much every time. Does it matter that the computer had no skill, and the grandmaster did?
That they don't have "skill" does not seem particularly notable in this context. It doesn't help answer "Is it possible to get better therapy from an LLM than from a licensed therapist?"
> Most people do not either. That an LLM is not a person doesn't seem particularly notable or relevant here.
Of relevance I think: LLMs by their nature will often keep talking. They are functions that cannot return null. They have a hard time not using up tokens. Humans however can sit and listen and partake in reflection without using so many words. To use the words of the parent comment: trained humans have the pronounced ability to _not_ say something.
(Of course, finding the right time/occasion to modulate it is the real challenge).
An LLM, especially ChatGPT, is like a friend who's on your side, who DOES encourage you and takes your perspective every time. I think this is still a step up from loneliness.
And a final point: ultimately an LLM is a statistical machine that produces the most likely response to your issues based on an insane amount of human data. Therefore it is very likely to make some pretty good calls about how it should respond; you might even say it takes the best (or most common) in humanity and reflects that back to you. This also might be better than a therapist, who could easily just view your situation through the lens of their own life, which is suboptimal.
Sure, they don't need to have shared experiences, but any licensed therapist has experiences in general. There's a difference between "My therapist has never experienced the stressful industry I work in" and "My therapist has never experienced pain, loneliness, fatigue, human connection, the passing of time, the basic experience of having a physical body, or what it feels like to be lied to, among other things, and they are incapable of ever doing so."
I expect if you had a therapist without some of those experiences, like a human who happened to be congenitally lacking in empathy, pain or fear, they would also be likely to give unhelpful or dangerous advice.
> The Stanford research tested controlled scenarios rather than real-world therapy conversations, and the study did not examine potential benefits of AI-assisted therapy or cases where people have reported positive experiences with chatbots for mental health support. In an earlier study, researchers from King's College and Harvard Medical School interviewed 19 participants who used generative AI chatbots for mental health and found reports of high engagement and positive impacts, including improved relationships and healing from trauma.
> "This isn't simply 'LLMs for therapy is bad,' but it's asking us to think critically about the role of LLMs in therapy," Haber told the Stanford Report, which publicizes the university's research. "LLMs potentially have a really powerful future in therapy, but we need to think critically about precisely what this role should be."
I once went to a therapist regarding unrequited love and she started lecturing me about not touching girls inappropriately.
I don't think they need their brain examined.
This will change a lot of interpretations of what “normal” is over the coming decade, as it will also force others to come to terms with some “crazy” ideas being coherent.
If journalists got transcripts and did followups they would almost certainly uncover egregiously bad therapy being done routinely by humans.
"These people are credentialed professionals so I'm sure they're fine" is an extremely dangerous and ahistorical position to take.
The same probably applies to human therapy. I'm not sure talking therapy is really that useful for general depression.
On the ground, it's wildly different. For me, a very left field moment.
I imagine if you go to psychology conferences you get exposed to the professional side a lot more, but for the average internet user that's very different. I wouldn't be surprised if the AI girlfriend sites had many, many orders of magnitude more users