We built Medical Sphere, a public platform for objective evaluation of AI models in healthcare. You can think of it as LM Arena, but purpose-built for the medical domain, which we believe is large and specific enough to warrant its own dedicated evaluation effort.
Instead of relying on static multiple-choice benchmarks, we let people upload real medical cases (text + images), anonymously if they prefer, run frontier models (open and closed) side by side, and compare where they agree, fail, or hallucinate, all for free.
We also verify medical professionals who want to participate and connect them with research and industry collaboration opportunities.
We’ve already started releasing open benchmarks with PHI/PII carefully redacted, and we plan to keep releasing them frequently.
The goal is to enable transparent, real-world benchmarks and a data flywheel for more trustworthy medical AI. Live and free to use. Would love feedback from folks working on AI evaluation, healthcare ML, or safety.