Show HN: Cancer diagnosis makes for an interesting RL environment for LLMs

6•dchu17•1h ago
Hey HN, this is David from Aluna (YC S24). We work with diagnostic labs to build datasets and evals for oncology tasks.

I wanted to share a simple RL environment I built that gives frontier LLMs a set of tools that let them zoom and pan across a digitized pathology slide to find the regions relevant to making a diagnosis. Here are some videos of an LLM performing diagnosis on a few slides:

(https://www.youtube.com/watch?v=k7ixTWswT5c): traces of an LLM choosing different regions to view before making a diagnosis on a case of small-cell carcinoma of the lung

(https://youtube.com/watch?v=0cMbqLnKkGU): traces of an LLM choosing different regions to view before making a diagnosis on a case of benign fibroadenoma of the breast

Why I built this:

Pathology slides are the backbone of modern cancer diagnosis. Tissue from a biopsy is sliced, stained, and mounted on glass for a pathologist to examine for abnormalities.

Today, many of these slides are digitized into whole-slide images (WSIs) in TIF or SVS format, each several gigabytes in size.

While several pathology-focused AI models exist, I was curious to test whether frontier LLMs can perform well on pathology-based tasks. The main challenge is that WSIs are too large to fit into an LLM’s context window. The standard workaround, splitting them into thousands of smaller tiles, is inefficient for large frontier LLMs.
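To get a rough feel for the tile counts involved, here is a minimal sketch. The slide dimensions and the 512-pixel tile size are hypothetical values chosen for illustration, not measurements from any real slide.

```python
import math


def tile_count(width_px: int, height_px: int, tile_px: int = 512) -> int:
    """Number of non-overlapping square tiles needed to cover the slide."""
    return math.ceil(width_px / tile_px) * math.ceil(height_px / tile_px)


# Hypothetical slide size; real WSIs vary widely.
print(tile_count(40_000, 30_000))
```

Each of those tiles would have to be sent to the model as a separate image, which is where the cost comes from.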

Inspired by how pathologists zoom and pan under a microscope, I built a set of tools that let LLMs control magnification and coordinates, viewing small regions at a time and deciding where to look next.
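As a rough illustration of the idea (not the actual tool implementation), a zoom-and-pan interface can be sketched as a small piece of viewer state that maps the current magnification and center to a pixel region of the slide. The `Viewport` class, its field names, the 1000-pixel view size, and the 40x scan magnification are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    """Hypothetical viewer state; coordinates are full-resolution slide pixels."""
    slide_w: int
    slide_h: int
    view_px: int = 1000      # side of the square image shown to the model (assumed)
    base_mag: float = 40.0   # assumed scan magnification
    cx: float = 0.0          # center of the current view
    cy: float = 0.0
    mag: float = 1.0         # current magnification

    def _side(self) -> float:
        # At magnification `mag`, one view pixel spans base_mag / mag slide pixels.
        side = self.view_px * self.base_mag / self.mag
        return min(side, self.slide_w, self.slide_h)

    def zoom(self, mag: float) -> None:
        # Clamp to [0.1, base_mag]; zooming past the scan magnification adds no detail.
        self.mag = min(max(mag, 0.1), self.base_mag)

    def pan(self, dx: float, dy: float) -> None:
        # dx and dy are fractions of the current view, so steps shrink as we zoom in.
        side = self._side()
        self.cx = min(max(self.cx + dx * side, 0.0), float(self.slide_w))
        self.cy = min(max(self.cy + dy * side, 0.0), float(self.slide_h))

    def region(self) -> tuple[int, int, int, int]:
        """Return (left, top, width, height), kept inside the slide bounds."""
        side = self._side()
        left = min(max(self.cx - side / 2, 0.0), self.slide_w - side)
        top = min(max(self.cy - side / 2, 0.0), self.slide_h - side)
        return round(left), round(top), round(side), round(side)


vp = Viewport(slide_w=40_000, slide_h=30_000, cx=20_000, cy=15_000, mag=10)
print(vp.region())   # region covered at 10x
vp.zoom(40)
vp.pan(1.0, 0.0)     # move one full view to the right
print(vp.region())   # region covered at 40x after panning
```

In a setup like this, `zoom` and `pan` would be exposed to the model as tool calls, and the tuple from `region()` would be handed to whatever slide reader renders the image for the next turn.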

This led to some interesting behaviors and, with some prompt engineering, seemed to yield pretty good results:

- GPT-5: explored up to ~30 regions before deciding (concurred with an expert pathologist on 4 out of 6 cancer subtyping tasks and 3 out of 5 IHC scoring tasks)

- Claude 4.5: typically used 10–15 views, with accuracy similar to GPT-5 (concurred with the pathologist on 3 out of 6 cancer subtyping tasks and 4 out of 5 IHC scoring tasks)

- Smaller models (GPT-4o, Claude 3.5 Haiku): examined ~8 frames and were less accurate overall (1 out of 6 cancer subtyping tasks and 1 out of 5 IHC scoring tasks)
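For a quick combined view of the counts listed above, the tally below simply adds up agreements across both task types. The dictionary layout and the `overall` helper are just for illustration, and the "Smaller models" row is taken as reported.

```python
# (agreements, total cases) per task type, copied from the list above.
results = {
    "GPT-5": {"subtyping": (4, 6), "ihc": (3, 5)},
    "Claude 4.5": {"subtyping": (3, 6), "ihc": (4, 5)},
    "Smaller models": {"subtyping": (1, 6), "ihc": (1, 5)},
}


def overall(tasks: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum agreements and case counts across task types."""
    agree = sum(a for a, _ in tasks.values())
    total = sum(n for _, n in tasks.values())
    return agree, total


for model, tasks in results.items():
    agree, total = overall(tasks)
    print(f"{model}: {agree}/{total} ({agree / total:.0%})")
```

With only 11 cases per model, differences of one or two cases are well within noise.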

Obviously, this was a small sample set, so we are working on a larger benchmark suite with more cases and task types, but I thought it was cool that this worked at all, so I wanted to share it with HN!
