"As an AI agent, a possible diagnosis is [xxx]. Ask your doctor about / look into [yyy™] for a possible solution!"
https://www.axios.com/2024/12/03/openai-ads-chatgpt
> OpenAI CFO Sarah Friar told the Financial Times that OpenAI is weighing the inclusion of ads in its products but wants to be "thoughtful about when and where we implement them."
It hallucinated serious cancer, along with all the associated details you’d normally find on a lab report. It had an answer to every question I had pre-asked about the report.
The report said the opposite: no cancer detected.
4o, o4? I'm certain it wasn't 3.5
Sigh. This is a point in favor of not allowing free access to ChatGPT at all given that people are getting mad at GPT-4o-mini which is complete garbage for anything remotely complex... and garbage for most other things, too.
Just give 5 free queries of 4o/o3 or whatever and call it good.
I gave it a pdf of an engine manual recently and asked some questions, which it answered reasonably. It even pulled a schematic out for me, though it was the wrong one (it gave me a schematic for the CDI ignition variant that we first talked about, rather than the DSAI one we settled on later.)
I've found o3 & deep research to be very effective in guiding my health plan. One interesting anecdote - I got hit in the chest (right over the heart) quite hard a month or so ago. I prompted o3 with my ensuing symptoms and heart rate / oxygenation data from my Apple Watch, and it already knew my health history from previous conversations. It gave very good advice and properly diagnosed me with a costochondral sprain. It gave me a timeline to expect (which ended up being 100% accurate) and treatments / ointments to help.
IMO - it's a good idea to have a detailed prompt ready to go with your health history, height/weight, medications and supplements, etc. If anything happens to you, you've got it handy to give to o3 to help with a diagnosis.
The other stuff is good to have but ultimately a model that focuses on diagnosing medical conditions is going to be the most useful. Look - we aren't going to replace doctors anytime soon but it is good to have a second opinion from an LLM purely for diagnosis. I would hope it captures patterns that weren't observed before. This is exactly the sort of game that AI can beat a human at - large scale pattern recognition.
Finally I typed in my entire history into o3-deep-research and let it rip for a while. It came back with a theory for the injury that matched that one doctor, diagrams of muscle groups and even illustrations of proposed exercises. I'm not out of the woods yet, but I am cautiously optimistic for the first time in a long time.
Yes, they propose exercises.
No, they don't work.
For certain (common) conditions, PT seems to have it nailed - the exercises really help. For the others, it's just snake oil. Not backed by much research. The current state of the art is just not good when it comes to chronic pain.
So while I don't know if an LLM can be better than a battery of human experts, I do know that those human experts do not perform well. I'm guessing that in the OP's case, that battery of human experts does not lead to a consensus - you just end up with 10 different treatments/diagnoses (and occasionally, one is a lot more common than the others, but it's still wrong).
So what use case does this test setup reflect? Is there a relevant commercial use case here?
With healthcare prices increasing at breakneck speed, I am sure AI will take on a bigger and bigger role in diagnosing and treating people's common illnesses, and hopefully (doubt it) some of those savings will be passed on to the patients.
P.S. In contrast to the US system, in my home city (Rangoon, Burma/Myanmar), I have multiple clinics near my home and a couple of pharmacies within two bus stops' distance. I can either go buy most of the medications I need from the pharmacy (without a prescription) and take them on my own (why am I not allowed to take that risk?) OR I can go see a doctor at one of these clinics to confirm my diagnosis, pay him/her $10-$20 for the visit, and then head down to the pharmacy to buy the medication. Of course, some medications, such as those that include opioids, will only be sold to me with a doctor's prescription, but a good number of other meds are available as long as I can afford them.
Zaheer•2h ago
tough•2h ago
simianwords•1h ago
tough•1h ago
fewer people using them.
Insanity•1h ago
simianwords•1h ago
tough•1h ago
reissbaker•1h ago
I think the actually-relevant issue here is that until last month there wasn't API access for Grok 3, so no one could test or benchmark it, and you couldn't integrate it into tools that you might want to use it with. They only allowed Grok 2 in their API, and Grok 2 was a pretty bad model.
tough•1h ago
moralestapia•1h ago
Also, only one out of the ten models benchmarked has open weights, so I'm not sure what GP is arguing for.
tough•1h ago
not talking about TFA or benchmarks but the news coverage/user sentiment ...