Open-sourcing our clinical triage benchmark for evaluating LLMs

https://github.com/medaks/medask-benchmarks
3•klemenvod•5h ago

Comments

klemenvod•5h ago
In our context, medical triage means deciding whether symptoms need emergency care, need urgent care, or can be managed with self-care. This matters because LLMs are increasingly becoming the “digital front door” for health concerns, replacing the instinct to just Google it.

Getting triage wrong can be dangerous (missed emergencies) or costly (unnecessary ER visits).

We’ve open-sourced TriageBench, a reproducible framework for evaluating LLM triage accuracy. It includes:

- A standard clinical dataset (Semigran vignettes)

- Paired McNemar’s test to detect model performance differences on small datasets

- Full methodology and evaluation code
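For readers unfamiliar with the paired test mentioned in the list above, here is a minimal sketch of an exact two-sided McNemar's test in plain Python. It is not taken from the TriageBench repository: the function name `mcnemar_exact` and the per-vignette correctness lists are hypothetical, made up purely for illustration.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar's test on paired per-case correctness flags.

    Returns (b, c, p_value), where b counts cases only model A got right
    and c counts cases only model B got right.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same cases")

    # Only discordant pairs (exactly one model correct) carry information.
    b = sum(1 for a, m in zip(correct_a, correct_b) if a and not m)
    c = sum(1 for a, m in zip(correct_a, correct_b) if m and not a)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models agree on every case

    # Under H0 each discordant pair favours either model with probability 0.5,
    # so min(b, c) follows Binomial(n, 0.5); double the lower tail.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical results on 45 vignettes (NOT the real benchmark data).
    model_a = [True] * 30 + [True] * 8 + [False] * 1 + [False] * 6
    model_b = [True] * 30 + [False] * 8 + [True] * 1 + [False] * 6
    b, c, p = mcnemar_exact(model_a, model_b)
    print(f"discordant pairs: b={b}, c={c}, p={p:.4f}")
```

Because the test looks only at vignettes where the two models disagree, it can pick up a difference on a small paired dataset that comparing two independent accuracy figures would miss.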

GitHub: https://github.com/medaks/medask-benchmark

As a demonstration, we benchmarked our own model (MedAsk) against several OpenAI models:

- MedAsk: 87.6% accuracy

- o3: 75.6%

- GPT‑4.5: 68.9%

The main limitation is dataset size (45 vignettes). We're looking for collaborators to help expand it; the field needs larger, more diverse clinical datasets.
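To give a rough sense of why 45 vignettes is limiting, the sketch below computes a 95% Wilson score interval for a single accuracy figure. The count of 34 correct out of 45 is a hypothetical example, not a number from the benchmark, and `wilson_interval` is a name chosen for illustration.

```python
from math import sqrt


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical count: 34 of 45 vignettes triaged correctly.
    low, high = wilson_interval(34, 45)
    print(f"accuracy {34 / 45:.1%}, 95% interval {low:.1%} to {high:.1%}")
```

At this sample size the interval spans well over twenty percentage points, which is why a paired test on the same vignettes is more informative than comparing headline accuracies.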

Blog post with full results: https://medask.tech/blogs/medical-ai-triage-accuracy-2025-me...

NetRunnerSu•5h ago
On the other hand, we can also diagnose the LLM itself: its activation values are its EEG and its gradients are its BOLD. If you can afford the cost, you can even calculate its true variational free energy, that is, the KL divergence.

"Don't just train your model, understand its mind."

https://github.com/dmf-archive/
