The Path to Medical Superintelligence

https://microsoft.ai/new/the-path-to-medical-superintelligence/

10•brandonb•7mo ago

Comments

PaulHoule•7mo ago

I was doing a comparative analysis of the acquisition strategies of various "big tech" firms and was a little startled that I had missed Microsoft's 2022 acquisition of Nuance, largely for its speech recognition systems aimed at the medical sector:

https://news.microsoft.com/source/2022/03/04/microsoft-compl...

gm678•7mo ago

> Microsoft AI Diagnostic Orchestrator (MAI-DxO) correctly diagnoses up to 85% of NEJM case proceedings, a rate more than four times higher than a group of experienced physicians.

> Clinicians in our study worked without access to colleagues, textbooks, or even generative AI, which may feature in their normal clinical practice.

1. As I understand it, it's very common for doctors to fall back on reference material in their practice, especially for the most complex cases. If all access to resources was cut off (as the second quote seems to imply), the comparison seems somewhat unfair.

2. What were the publication dates of the case records? I can't find this information, and it makes a difference if the NEJM case studies were in the LLMs' training data.

miraculixx•7mo ago

Exactly. The study has been set up to produce this exact result. They essentially limited the human doctors to bare essentials, on specialist cases(!), while providing the LLMs with all sorts of help, including discussion among several AIs.

That's like giving one group of students a strict closed-book exam while another group takes the test as a group exercise with access to any material they like, then claiming that closed-book exams lead to worse outcomes.

In a nutshell the study is just slop designed to get attention. The headline result is what they really want people to hear, and that's all the media will be repeating.

miraculixx•7mo ago

As any AI researcher knows, if you have a model that does 4x better than the naive baseline (the humans, in this case), you are likely looking at overfitting, not real-life performance. This study is just slop, and you can tell by the mere fact that they did not submit a paper, but just published a PR article.

LargoLasskhyfv•7mo ago

They didn't? What am I looking at, then?

https://arxiv.org/abs/2506.22405

This appears when you click on 'View Publication' in the article near the end, right before Q&A.

brandonb•7mo ago

In the paper, they say they used the most recent 56 cases (from 2024–2025) as a holdout set. The majority of those cases happened after the o4 training cutoff of May 31, 2024.

miraculixx•7mo ago

Are these 56 cases distinct from all other cases in the data?

FlyingLawnmower•7mo ago

Yes. They are about entirely different patient reports.
