Sequential Diagnosis with Language Models

https://arxiv.org/abs/2506.22405
2•FlyingLawnmower•3h ago

Comments

FlyingLawnmower•3h ago
Artificial intelligence holds great promise for expanding access to expert medical knowledge and reasoning. However, most evaluations of language models rely on static vignettes and multiple-choice questions that fail to reflect the complexity and nuance of evidence-based medicine in real-world settings. In clinical practice, physicians iteratively formulate and revise diagnostic hypotheses, adapting each subsequent question and test to what they've just learned, and weigh the evolving evidence before committing to a final diagnosis. To emulate this iterative process, we introduce the Sequential Diagnosis Benchmark, which transforms 304 diagnostically challenging New England Journal of Medicine clinicopathological conference (NEJM-CPC) cases into stepwise diagnostic encounters. A physician or AI begins with a short case abstract and must iteratively request additional details from a gatekeeper model that reveals findings only when explicitly queried. Performance is assessed not just by diagnostic accuracy but also by the cost of physician visits and tests performed. We also present the MAI Diagnostic Orchestrator (MAI-DxO), a model-agnostic orchestrator that simulates a panel of physicians, proposes likely differential diagnoses and strategically selects high-value, cost-effective tests. When paired with OpenAI's o3 model, MAI-DxO achieves 80% diagnostic accuracy, four times higher than the 20% average of generalist physicians. MAI-DxO also reduces diagnostic costs by 20% compared to physicians, and 70% compared to off-the-shelf o3. When configured for maximum accuracy, MAI-DxO achieves 85.5% accuracy. These performance gains with MAI-DxO generalize across models from the OpenAI, Gemini, Claude, Grok, DeepSeek, and Llama families. We highlight how AI systems, when guided to think iteratively and act judiciously, can advance diagnostic precision and cost-effectiveness in clinical care.
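
For anyone trying to picture the evaluation loop the abstract describes (an agent requests findings one at a time from a gatekeeper, and every request adds to a running cost), here is a rough Python sketch. It is not the paper's code: the `Gatekeeper` class, `toy_policy`, `run_encounter`, the findings, and the prices are all hypothetical names and values made up for illustration.

```python
# Toy case data. Everything here is invented for illustration and is not from the paper.
FINDINGS = {"chest x-ray": "lobar consolidation", "ct chest": "no additional findings"}
COSTS = {"chest x-ray": 50.0, "ct chest": 400.0}  # hypothetical prices in dollars


class Gatekeeper:
    """Holds the full case and reveals a finding only when it is explicitly requested."""

    def __init__(self, findings, costs):
        self._findings = findings
        self._costs = costs

    def query(self, request):
        """Return (result, cost) for one requested item."""
        result = self._findings.get(request, "not available")
        cost = self._costs.get(request, 0.0)
        return result, cost


def toy_policy(revealed):
    """Pick the next action from what has been revealed so far (hypothetical rule set)."""
    if "chest x-ray" not in revealed:
        return ("request", "chest x-ray")
    if revealed["chest x-ray"] == "lobar consolidation":
        return ("diagnose", "pneumonia")
    if "ct chest" not in revealed:
        return ("request", "ct chest")
    return ("diagnose", "unknown")


def run_encounter(gatekeeper, policy, max_steps=10):
    """Loop until the policy commits to a diagnosis or the step budget runs out."""
    revealed = {}
    total_cost = 0.0
    for _ in range(max_steps):
        kind, value = policy(revealed)
        if kind == "diagnose":
            return value, total_cost, revealed
        result, cost = gatekeeper.query(value)
        revealed[value] = result
        total_cost += cost
    return None, total_cost, revealed


diagnosis, total_cost, revealed = run_encounter(Gatekeeper(FINDINGS, COSTS), toy_policy)
print(diagnosis, total_cost, sorted(revealed))
```

With this toy data the policy commits after a single request, so only the chest x-ray is revealed and charged. Scoring an agent on both the returned diagnosis and `total_cost` mirrors the two axes the abstract mentions, accuracy and cost.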

The Simple Math Problem We Still Can't Solve

https://www.quantamagazine.org/why-mathematicians-still-cant-solve-the-collatz-conjecture-20200922/
1•chirau•2m ago•0 comments

They don't make 'em like that any more: Sony DTC-700 audio DAT player/recorder

https://kevinboone.me/dtc-700.html
1•naves•3m ago•0 comments

User-friendly and privacy-friendly LLM experience?

https://tildes.net/~comp/1orz/user_friendly_and_privacy_friendly_llm_experience
1•PaulHoule•3m ago•0 comments

Ask HN: Stock Android tablet free of bloatware?

1•miki_tyler•4m ago•0 comments

Alice's Adventures in a Differentiable Wonderland

https://arxiv.org/abs/2404.17625
1•henning•4m ago•0 comments

Why We Should Care About This War over the Future of Money

https://gizmodo.com/why-you-should-care-about-this-war-over-the-future-of-money-2000622009
1•rntn•6m ago•0 comments

Lua 5.5.0 (Beta) Released

https://www.lua.org/work/#5.5.0
1•dottrap•7m ago•1 comment

Proton joins suit against Apple for predatory practices

https://proton.me/blog/apple-lawsuit
1•moose44•8m ago•0 comments

Writing Code to Be Read at a Glance

https://jelv.is/blog/Writing-Code-To-Be-Read-at-a-Glance/
1•lawn•8m ago•0 comments

Fake AI-made videos about Diddy trial are raking in millions of views on YouTube

https://www.theguardian.com/technology/2025/jun/29/fake-diddy-ai-videos-youtube
1•sandebert•8m ago•0 comments

Operation Gold Rush, largest health care fraud bust in U.S. history

https://www.washingtonpost.com/health/2025/06/30/health-care-fraud-bust-largest-in-us-history/
2•brandonb•8m ago•0 comments

Satellites are fueling a space-based internet gold rush

https://restofworld.org/2025/satellites-space-based-internet/
1•tysone•9m ago•0 comments

Hit songs are getting shorter

https://www.economist.com/culture/2025/06/02/hit-songs-are-getting-shorter
1•gmays•9m ago•0 comments

Trump Vowed to Dismantle MS-13. His Deal with Bukele Threatens That Effort

https://www.nytimes.com/2025/06/30/us/politics/trump-bukele-ms-13-immigrants.html
3•JumpCrisscross•9m ago•0 comments

iOverlander's Pivot Shows the Cost of Community-Driven Tech

https://www.hereandthere.club/post/ioverlanders-pivot-shows-the-cost-of-community-driven-tech
1•dzogchen•11m ago•0 comments

Fil-C

https://github.com/pizlonator/llvm-project-deluge
1•mpweiher•11m ago•0 comments

A vision researcher's guide to some RL stuff: PPO and GRPO

https://yugeten.github.io/posts/2025/01/ppogrpo/
1•fzliu•14m ago•0 comments

Insider Trading on SEC Filings

https://www.bloomberg.com/opinion/newsletters/2025-06-30/insider-trading-on-sec-filings
2•ioblomov•14m ago•0 comments

High-Severity Vulnerability in Notepad++

https://www.csa.gov.sg/alerts-and-advisories/alerts/al-2025-063
1•onlinenotepad•17m ago•0 comments

Therapy dogs: stop crafting loopholes to fair, reasonable laws

https://dirtamericana.com/2025/04/therapy-dogs-business-interior-violations/
2•speckx•19m ago•0 comments

Show HN: FastPitchDeck – AI to generate VC-ready pitch decks

https://fastpitchdeckai.vercel.app/
1•ramyavarahagiri•21m ago•0 comments

Writing a Little Gosh

https://flak.tedunangst.com/post/writing-a-gosh
1•dpassens•24m ago•0 comments

Martech Engineer

1•smwbauer•24m ago•1 comment

Can we ever understand our dogs?

https://www.vox.com/explain-it-to-me/418008/dog-pets-perception-science-research-animal-smell
2•lr0•25m ago•0 comments

GenAI – Will Workers Disappear?

https://www.nominalnews.com/p/ai-labor-workers-economy
1•MPLan•26m ago•1 comment

Iran's Internet Blackout Accidentally Revealed Coordinated Narrative in the West

4•Memetic-tracer•27m ago•2 comments

Resources for Disaster Preparedness in Heritage (2024)

https://conserv.io/blog/7-resources-for-disaster-preparedness-in-heritage/
1•mooreds•27m ago•0 comments

Senator Chides FBI for Weak Advice on Mobile Security

https://krebsonsecurity.com/2025/06/senator-chides-fbi-for-weak-advice-on-mobile-security/
3•todsacerdoti•27m ago•0 comments

20 years on, Max Payne is as stylish as ever (2021)

https://www.eurogamer.net/20-years-on-max-payne-is-as-stylish-as-ever
3•Michelangelo11•28m ago•1 comment

Ask HN: Is "ethical AI" possible, or is there a catch?

1•mrdependable•32m ago•2 comments