1. ChatGPT o1 significantly outperformed any combination of doctor + resources (median score of 86% vs. 34%-42% for doctors). Hence superhuman results, at least compared against average physicians.
2. ChatGPT + Doctor performs worse than just ChatGPT alone.
This means the situation is getting similar to chess, where adding Magnus Carlsen as a helper to Stockfish (a strong open-source chess engine) could only make Stockfish worse.
Check out the Fitts list (HABA-MABA) for more on this.
We remain a very long way from “ChatGPT will see you now”.
In the meantime, in the real world, I suspect the infamous "Dr Google" is being supplanted by "Dr LLM". It will be difficult to ethically study whether even this leads to generally better patient outcomes.
_________
edit: clarity
Absolutely.
As for everything else, as pointed out, these programs are insufficient. As with programmers and other white-collar professions, it seems ideal to integrate these tools into the workplace rather than try to replace the human completely.
Businesspeople probably dream of huge profits from replacing their workforce with AI models, and the marketers and proprietors of AI are likely to overpromise what their products can do, as is the Silicon Valley tradition: promise the moon in order to extract maximum funding.
https://docs.google.com/spreadsheets/d/1kc262HZSMAWI6FVsh0zJ...
Cases like:
- The AI replaces a salesperson, but the sales are not binding or final, in case a client wrangles a bargain at $0 out of the chatbot.
- It replaces drivers, but it disengages one second before hitting a tree so the human gets the blame.
- Support wants you to press "cancel" so the reports say "client cancelled" and not "self-driving is doing laps around a patch of grass".
- AI is better than doctors at diagnosis, but in any case of misdiagnosis the blame is shifted to the doctor, because "AI is just a tool".
- AI is better at coding than old meat devs, but when the unmaintainable security hole goes to production, the downtime and breaches can't be blamed on the AI company producing the code; it was the old meat devs' fault.
AI companies want to have their cake and eat it too. Until I see them eating the liability, I know, and I know they know, that it's not ready for the things they say it is.
We're getting there!
The point is that clinicians mostly don't get sued for misdiagnoses anyway. With AI, all one has to do is open a new chat, tell the AI that its last diagnosis isn't really helping, and it will eagerly give an updated assessment. Compared to a clinician, the AI dramatically lowers the barrier to iteratively working with it to address an issue.
As for drug prescriptions, they get run through an interactions checker anyway.
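To illustrate what that screening amounts to, here is a naive sketch of the kind of pairwise check such a tool performs (the interaction table and drug names are tiny illustrative stand-ins, not clinical data):

    # Naive pairwise interaction screen; the table is an illustrative stand-in, not clinical data.
    from itertools import combinations

    KNOWN_INTERACTIONS = {
        frozenset({"warfarin", "aspirin"}): "increased bleeding risk",
        frozenset({"sildenafil", "nitroglycerin"}): "severe hypotension",
    }

    def check_interactions(prescribed):
        """Return a warning for every interacting pair in a prescription list."""
        warnings = []
        for a, b in combinations(prescribed, 2):
            reason = KNOWN_INTERACTIONS.get(frozenset({a, b}))
            if reason:
                warnings.append(f"{a} + {b}: {reason}")
        return warnings

    print(check_interactions(["warfarin", "aspirin", "metformin"]))
    # ['warfarin + aspirin: increased bleeding risk']

Real checkers sit on top of curated interaction databases, but the basic shape is this kind of exhaustive pair check.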
The reason is simple. They are trained as plausibility engines. It's more plausible that a bad diagnostician gives you a worse outcome than a good one, and you have literally just prompted it that it's bad at diagnosis.
Sure, you might get another text completion. Will it be correct, actionable, reliable, safe? Even a stopped clock. Good luck rolling those dice with your health.
In summary, do not iterate by prompting the model toward declining competence.
The former is reasonable to include when iterating. The latter is a recipe for outcome degradation, and the GP above gave the latter form. It activates the parts of the model that steer towards confabulation and loss of faithfulness.
The model doesn't know what is true, only what is plausible to emit. The hypothesis that plausibility converges with scale towards truth and faithfulness remains very far from proven. Bear in mind that the training data includes large swaths of arbitrary text from the Internet, real life, and fiction, which includes plenty of examples of people being wrong, stupid, incompetent, repetitive, whimsical, phony, capricious, manipulative, disingenuous, argumentative, and mendacious. In the right context these are plausible human-like textual interactions, and the only things really holding the model back from completing in such directions are careful training and the system prompt. Worst case scenario, perhaps the corpus included parliamentary proceedings from around the world. "Suppose you were an idiot. And suppose you were a member of Congress. But I repeat myself." - Mark Twain
spwa4•19h ago
1) they can't take measurements themselves.
2) they don't adapt on the job. Illnesses do. In other words, if there is a contagious health emergency, an LLM would see the patients ... and ignore the emergency.
3) they are very bad at figuring out if a patient is lying to them (which is a required skill: combined with 2, people would figure out how to get the LLM to prescribe them morphine and ...)
4) they are generally socially problematic. A big part of being a doctor is gently convincing a patient their slightly painful toe does not in fact justify a diagnosis of bone cancer ... WITHOUT doing tests (that would be unethical, as there's zero chance of those tests yielding positive results)
5) they will not adapt to people. LLMs will not adapt, people will. This means patients will exploit LLMs to achieve a whole bunch of aims (like getting drugs, getting days off, getting free hospital stays, ...) and it doesn't matter how good LLMs are. An adaptive system vs a non-adaptive system ... it's a matter of time.
6) they are not themselves patients. This is a fundamental problem: it will be very hard for an LLM to collect new information about "the human condition" and the new problems it may generate. There are many examples of this, from patients drinking radium solution (it lights up in the dark, so surely it must give extra energy, right? Even sexual energy, right?) to rivers or ponds that turn out to have serious diseases lurking in them. Meaning a doctor needs to be able to make the decision to go after problems in society when society finds a new, catastrophically dumb, way to hurt itself.
Now you might say "but they would still be good in the developing world, wouldn't they?". Yes, but as the tuberculosis vaccine efforts sadly showed: the developing world is developing partially because it invests nothing whatsoever in (poor) people's health. Nothing. Zero. Rien. Which means making health services cheaper (e.g. providing a cheap tuberculosis vaccine) ... has the problem that it does not increase the value of zero. They won't pay for healthcare ... and they won't pay for cheaper healthcare. And while Bill Gates and the US government do pay for a bit of this, those aren't sustainable solutions. If, however, you train a local with basic medical skills, there's a lot they can do for free, which actually helps.
timschmidt•18h ago
derbOac•18h ago
1 is often (usually?) not done today by physicians per se anyway.
2 is kind of a strawman about LLMs.
6 is maybe the most challenging critique but is also kind of an empirical one, in the sense that if LLMs routinely outperform physicians in decision making (at least under certain circumstances) it will be hard to make the case that it matters.
I have my biases but in general I think at least in the US there needs to be a serious rethinking about how medical decisions can be made and how care can be provided.
I'm skeptical about this paper — the real test will be something like widespread preregistered replication across a wide variety of care settings — but that would happen anyway before it would be adopted. If it works it works and if it doesn't it won't.
My guess is under the best of circumstances it won't get rid of humans, it will just change what they're doing and maybe who is doing that.
spwa4•7h ago
Empathy, by the way, is something that happens between 2 people. It is not "inside" one person, and it goes in both directions. So that can't be fixed. Empathy towards you only works if you honestly believe that the other person is genuinely worried about you (and, of course, empowered, able and willing to do something about your situation). If you start out with an LLM, you start out 100% convinced (and correct in that, btw) that they aren't worried about you. So it won't work. That even works in reverse. Patients try to deceive doctors ... but within reason, because there's empathy the other way too. When patients have to deceive LLMs instead for a week off from work, there will be zero empathy, zero shame and zero limitations on behavior from the patient side. This is not solvable.

Hell, it'll be a problem of trust. A doctor is trying to help you ... probably ... with maybe 10% helping the company. An LLM is 100% trying to help the company ... do you take the medicine (or accept nothing's wrong)? Is it what's best for you? Let's face it: the whole point of LLM medicine is that it's not what's best for you.
Also ... so your critique is that even doctors sometimes cater to interests that are not in the patient's best interest? Ok. True.
But you do realize that if you apply that as a critique of LLMs, which are corporate-controlled, it's going to be 1000x worse? So I don't understand the critique. Yes doctors aren't perfect, flawed in many ways. That is not a good reason to introduce something 1000x worse. The whole point of LLM medicine is stealing that profit!
timschmidt•5h ago
> [LLMs] don't learn on the job
Funny, the ones I work with get updated with new information all the time. No reason it couldn't happen after every single encounter.
> Empathy, by the way, is something that happens between 2 people.
People will anthropomorphize anything and that includes empathizing with a rock. A rock cannot empathize back. QED.
> If you start out with an LLM, you start out 100% convinced (and correct in that, btw) that they aren't worried about you. So it won't work.
See above. Your point is also undermined by the sheer volume of people currently using LLMs as therapists.
> Patients try to deceive doctors ... but within reason, because there's empathy the other way too. When patients have to deceive LLMs instead for a week off from work, there will be zero empathy, zero shame and zero limitations on behavior from the patient side.
You seem to think people have limitations on their behavior now. I assure you they do not. Talk to a doctor or social worker about it. It wears them out. LLMs don't get worn out and are much better at calmly and kindly interacting for as long as someone needs.
> A doctor is trying to help you ... probably ... with maybe 10% helping the company. An LLM is 100% trying to help the company
Some doctors are. Some are more worried about their next golf game or Maserati. You are fully pessimistic about LLMs but incredibly naive about people.
> Let's face it: the whole point of LLM medicine is that it's not what's best for you.
Hard disagree. Last I was in the hospital for a life threatening condition (pancreatitis caused by gall stones) I got to talk to a doctor for 5 minutes. I educated myself about my condition thanks to Wikipedia and medical journals. An LLM would have been incredibly helpful.
> do realize that if you apply that as a critique of LLMs, that are corporate controlled, it's going to be 1000x worse?
Good thing there are academic, open-source, open-weight, jailbroken, fine-tuned, local LLMs I can run myself, and I can even cross-check between multiple competing models, including models from different countries with entirely different economic and political systems, at nearly no cost, for additional assurance.
> So I don't understand the critique. Yes doctors aren't perfect, flawed in many ways.
And incredibly expensive, unavailable without appointment sometimes months or years in advance, incredibly limited when it comes to any specialization, sometimes distracted, exhausted, grumpy, dismissive, disagreeable, and subject to all manner of human failings.
I'm not sure what Black Mirror episode you're drawing your outlook on life from, but it bears no resemblance to the reality I live in every day.
I don't think doctors will be wholesale replaced by LLMs.
I am already seeing LLMs lowering barriers to acquiring first-line medical advice and second opinions.
Will dystopic things happen? Sure, they're already happening every day without any help from LLMs. That'll continue into the foreseeable future. But having an LLM to ask for second opinions might have saved my Dad's life, and would have allowed me to better educate myself about my own life threatening condition. Good luck prying them out of my hands.
msgodel•5h ago
What? Do you not know how any of this works at all?
Updating LLM weights means calculating the gradients and backpropagating. And that's only for unsupervised learning; for the reinforcement learning you want, the gradients need to be accumulated until you have some way to score the episode. For usefully large language models there are serious logistical problems with doing this at all: it's one of the most computationally intensive tasks you can use computers for today. It certainly can't be done for every interaction.
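To make the cost concrete, here is a minimal sketch of what a single weight update involves, assuming the Hugging Face transformers + PyTorch stack (the model name and example text are illustrative placeholders):

    # Sketch of one gradient update for a causal LM; model name and text are placeholders.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # stand-in; a clinically useful model would be orders of magnitude larger
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    batch = tokenizer("Patient encounter notes go here...", return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss  # forward pass over every layer
    loss.backward()        # backpropagate gradients through every parameter
    optimizer.step()       # update all weights
    optimizer.zero_grad()

Even this toy step touches every parameter; at frontier scale a single update means coordinating a multi-GPU cluster, which is the logistical problem above.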
timschmidt•5h ago
Also, it doesn't take a full retraining to add someone's medical history, or even recent events like a pandemic, to the system prompt.
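A minimal sketch of that kind of context injection, assuming the OpenAI Python client (the model name, patient record, and advisory text are illustrative placeholders):

    # Per-encounter context goes into the prompt, not the weights.
    # Assumes the OpenAI Python client; model name, history and advisory are placeholders.
    from openai import OpenAI

    client = OpenAI()
    patient_history = "58M, type 2 diabetes on metformin, penicillin allergy."
    advisory = "Regional advisory: measles cases reported locally this month."

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": f"You are a clinical decision-support assistant.\n"
                        f"Patient history: {patient_history}\n{advisory}"},
            {"role": "user",
             "content": "Fever and rash for three days. What should be ruled out?"},
        ],
    )
    print(response.choices[0].message.content)

No weights change here; the "new information" lives entirely in the prompt assembled for each encounter.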
CityOfThrowaway•18h ago
The foundation models don't adapt quickly, but you can definitely build systems that inject context to change behavior.
And if you build that system intentionally and correctly, then it's handled for all patients. With human doctors, each individual doctor has to be fed the context and change their behavior based on the information, which is stochastic to say the least.
howlin•18h ago
doug_durham•18h ago
inopinatus•18h ago
fnordpiglet•17h ago
1) Of course not; they would be fed information. But as we build multimodal models that can achieve more and more world integration, there's no reason why they couldn't.
2) They're very adaptive; by their abductive nature they adapt extraordinarily well to new situations. Perhaps too well - hence the challenge with hallucinations.
3) This isn't necessarily true, as can be seen from modern alignment in SOTA models becoming more and more difficult to evade. When a model is prompted and aligned with training against drug-seeking behavior, why would you assume it's bad at detecting it?
4) Again, I don't see why this is true. A general-purpose LLM might be, but one that's been aligned properly should do fine.
5) Why do you think LLMs are not adaptive? They adapt through reinforcement and alignment. As a larger corpus of interactions becomes available, they adapt and align towards the training goals. There is extensive research and experience in alignment to date, and models are often continuously adapted. You don't need to retrain the entire base model; you can just retrain a LoRA or embeddings. You can even adapt to specific situations by dynamically pulling in a LoRA or embedding set for that situation (see the sketch after this list).
6) They have human-like responses to human situations because they're trained on a corpus of human language. For a highly specialized model you can ensure specific types of human experience and behavior are well represented and reinforced. You can align the behavior to be what you need.
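As a sketch of the adapter-swapping idea in 5), assuming the Hugging Face transformers + peft stack (the model ID and adapter repos are hypothetical placeholders):

    # Swap in a task-specific LoRA adapter without retraining the base model.
    # Assumes transformers + peft; model ID and adapter repos are hypothetical placeholders.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained("example-org/base-llm")
    model = PeftModel.from_pretrained(base, "example-org/triage-lora")  # base weights stay frozen

    # Additional adapters can be loaded and switched per situation at runtime.
    model.load_adapter("example-org/outbreak-protocol-lora", adapter_name="outbreak")
    model.set_adapter("outbreak")

Only the small adapter weights are trained or swapped, which is why this kind of adaptation is feasible far more often than retraining the base model.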
All this said, I don't think anyone in this thread is proposing to take humans entirely out of the loop. But there are many situations where ML models or even heuristics outperform human experts in their own field. There's no reason to believe LLMs, especially when augmented with diagnostic expert-system agents, couldn't generally outperform a doctor at diagnosis. This doesn't mean the human doctor is irrelevant, but that their skills are enhanced and patient outcomes improve with the help of such systems.
Regardless though I feel these criticisms of the approach reflect a naïveté about the ways these models work and what they’re capable of.
timschmidt•5h ago