Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn

22•dchu17•6h ago

Hi HN,

My friend and I have been experimenting with using LLMs to reason about biotech stocks. Unlike many other sectors, biotech trading is largely event-driven: FDA decisions, clinical trial readouts, safety updates, or changes in trial design can cause a stock to 3x in a single day (https://www.biotradingarena.com/cases/MDGL_2023-12-14_Resmet...).

Interpreting these ‘catalysts,’ which come in the form of press releases, usually requires analysts with prior expertise in biology or medicine. A catalyst that sounds “positive” can still lead to a selloff if, for example:

- the effect size is weaker than expected,

- the results apply only to a narrow subgroup,

- the endpoints don’t meaningfully de-risk later phases, or

- the readout doesn’t materially change approval odds.

To explore this, we built BioTradingArena, a benchmark for evaluating how well LLMs can interpret biotech catalysts and predict stock reactions. Given only the catalyst and the information available before the date of the press release (trial design, prior data, PubMed articles, and market expectations), the benchmark measures how accurately the model predicts the stock's move when the catalyst is released.
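The post doesn't spell out the scoring metric, so here is a minimal sketch, assuming the target is the close-to-close percentage move from the day before the catalyst to the day of it, scored with two simple metrics (direction agreement and mean absolute error). The function names are hypothetical, not BioTradingArena's API.

```python
from typing import Sequence


def pct_move(close_day_before: float, close_day_of: float) -> float:
    """Close-to-close percentage move into the catalyst day."""
    return (close_day_of - close_day_before) / close_day_before * 100.0


def directional_accuracy(predicted: Sequence[float], realized: Sequence[float]) -> float:
    """Fraction of catalysts where the predicted direction matches the realized one.

    A move of exactly 0 is counted as "not up".
    """
    if not predicted or len(predicted) != len(realized):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum((p > 0) == (r > 0) for p, r in zip(predicted, realized))
    return hits / len(predicted)


def mean_abs_error(predicted: Sequence[float], realized: Sequence[float]) -> float:
    """Average absolute gap, in percentage points, between predicted and realized moves."""
    if not predicted or len(predicted) != len(realized):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, realized)) / len(predicted)


# A stock going from $10 to $30 on the catalyst is the "3x in a day" case: +200%.
realized = [pct_move(10.0, 30.0), pct_move(20.0, 15.0)]  # [200.0, -25.0]
predicted = [40.0, 5.0]
print(directional_accuracy(predicted, realized))  # prints 0.5
print(mean_abs_error(predicted, realized))        # prints 95.0
```

Direction and magnitude are kept separate on purpose: a model can call the sign of a move correctly and still be far off on a +200% day.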

The benchmark currently includes 317 historical catalysts. We also created subsets for specific indications (the largest is oncology), since different indications often show different patterns. We plan to add more catalysts to the public dataset over the next few weeks. The dataset spans companies of different sizes, and we compute an adjusted score because large-cap biotech tends to be much less volatile than small- and mid-cap names.
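The post doesn't say how the size adjustment is computed. One common way to make a +10% day comparable across a large-cap and a small-cap name is to divide the catalyst-day return by the stock's trailing daily volatility, which gives roughly a z-score. The sketch below assumes that approach (it is not necessarily what BioTradingArena does), and the choice of trailing window is left to the caller.

```python
import statistics
from typing import List, Sequence


def daily_returns(closes: Sequence[float]) -> List[float]:
    """Simple daily returns from a series of closing prices."""
    return [(b - a) / a for a, b in zip(closes, closes[1:])]


def vol_adjusted_move(
    trailing_closes: Sequence[float],
    close_day_before: float,
    close_day_of: float,
) -> float:
    """Catalyst-day return expressed in units of trailing daily volatility."""
    rets = daily_returns(trailing_closes)
    if len(rets) < 2:
        raise ValueError("need at least three trailing closes to estimate volatility")
    sigma = statistics.stdev(rets)  # sample standard deviation of daily returns
    if sigma == 0:
        raise ValueError("trailing prices are flat; volatility is zero")
    move = (close_day_of - close_day_before) / close_day_before
    return move / sigma


# Same +10% catalyst-day move, very different trailing volatility (synthetic prices).
large_cap = [100.0, 101.0, 100.0, 101.0, 100.0]  # ~1% daily swings
small_cap = [100.0, 110.0, 100.0, 110.0, 100.0]  # ~10% daily swings
print(vol_adjusted_move(large_cap, 100.0, 110.0))  # large: many "normal days" in one
print(vol_adjusted_move(small_cap, 100.0, 110.0))  # under 1: a routine day for this name
```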

Each row of data includes:

- Real historical biotech catalysts (Phase 1–3 readouts, FDA actions, etc.) and pricing data from the day before and the day of the catalyst

- Linked clinical trial data and PubMed PDFs
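One way to represent such a row, as a minimal sketch: the field names are hypothetical (the post doesn't publish a schema), and the `leaked_docs` helper illustrates the "only information available before the catalyst date" rule by flagging linked documents dated on or after the catalyst.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class LinkedDoc:
    """A linked clinical trial record or PubMed article (hypothetical structure)."""
    source: str        # e.g. "trial" or "pubmed"
    identifier: str    # placeholder IDs only in this sketch
    published: date


@dataclass
class CatalystRow:
    """One benchmark row: a catalyst, its prices, and supporting documents."""
    catalyst_date: date
    catalyst_type: str          # e.g. "Phase 3 readout", "FDA action"
    press_release: str          # de-identified text
    close_day_before: float
    close_day_of: float
    linked_docs: list[LinkedDoc] = field(default_factory=list)

    @property
    def realized_move_pct(self) -> float:
        """Close-to-close percentage move into the catalyst day."""
        return (self.close_day_of - self.close_day_before) / self.close_day_before * 100.0

    def leaked_docs(self) -> list[LinkedDoc]:
        """Documents that would not have been available before the catalyst."""
        return [d for d in self.linked_docs if d.published >= self.catalyst_date]


# Synthetic example row; all values are made up for illustration.
row = CatalystRow(
    catalyst_date=date(2024, 1, 15),
    catalyst_type="Phase 3 readout",
    press_release="[COMPANY] announced topline results.",
    close_day_before=10.0,
    close_day_of=30.0,
    linked_docs=[
        LinkedDoc("pubmed", "example-article", date(2022, 5, 1)),
        LinkedDoc("pubmed", "example-followup", date(2024, 2, 1)),
    ],
)
print(row.realized_move_pct)   # prints 200.0
print(len(row.leaked_docs()))  # prints 1
```

A check like `leaked_docs` only catches documents with a known publication date; it says nothing about what the model itself memorized during pretraining.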

Note that there may be some fairly obvious problems with our approach. First, many clinical trial press releases are likely already included in the LLMs’ pretraining data. While we try to reduce this by de-identifying each press release and providing only the data available up to the date of the catalyst, there is obviously some uncertainty about whether this is sufficient.
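The post doesn't describe how the de-identification is done. A minimal sketch, assuming a hand-built mapping from identifying terms to placeholders (the function and the company/drug names are hypothetical): it masks names only, so trial acronyms, dates, and distinctive numbers can still let a model recognize a release it saw in pretraining.

```python
import re
from typing import Mapping


def deidentify(text: str, replacements: Mapping[str, str]) -> str:
    """Replace identifying terms (company, ticker, drug names) with placeholders.

    Longer terms are replaced first so "Acme Therapeutics" wins over "Acme".
    Matching is case-insensitive and limited to whole words.
    """
    for term in sorted(replacements, key=len, reverse=True):
        placeholder = replacements[term]
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        # A function replacement avoids treating backslashes in the placeholder as escapes.
        text = pattern.sub(lambda _m, p=placeholder: p, text)
    return text


# Fictional company, ticker, and drug names.
release = "Acme Therapeutics (ACME) reports positive Phase 3 data for acmezumab."
terms = {"Acme Therapeutics": "[COMPANY]", "ACME": "[TICKER]", "acmezumab": "[DRUG]"}
print(deidentify(release, terms))
# prints: [COMPANY] ([TICKER]) reports positive Phase 3 data for [DRUG].
```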

We’ve been using this benchmark to test prompting strategies and model families. Results so far are mixed but interesting: the most reliable approach we found was to use LLMs to quantify qualitative features and then fit a linear regression on those features, rather than asking for a direct price prediction.
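The two-stage approach can be sketched as follows, assuming the LLM has already been prompted to score each press release on a few numeric features (the feature names in the comments are hypothetical, e.g. effect size versus expectations or how narrow the responding subgroup is). The regression is plain ordinary least squares with an intercept, solved through the normal equations in pure Python; for real use, `numpy.linalg.lstsq` is the more numerically robust choice.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept: solves (A^T A) b = A^T y.

    A is X with a leading column of ones, so the result is [intercept, w1, w2, ...].
    """
    A = [[1.0, *map(float, row)] for row in X]
    k = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(A, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(M[i][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("features are collinear or there are too few rows")
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for i in range(k):
            if i != col:
                factor = M[i][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[col])]
    return [M[i][k] for i in range(k)]


def predict(coefs: Sequence[float], features: Sequence[float]) -> float:
    """Predicted move for one catalyst from its LLM-derived feature scores."""
    return coefs[0] + sum(w * x for w, x in zip(coefs[1:], features))


# Each row: hypothetical LLM scores, e.g. [effect size vs. expectations, subgroup narrowness].
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
moves = [2.0 + 3.0 * a - 1.5 * b for a, b in features]  # synthetic, exactly linear
coefs = fit_ols(features, moves)
print([round(c, 6) for c in coefs])  # prints [2.0, 3.0, -1.5]
```

Keeping the LLM on the "read the science" side and a linear model on the "map to a price move" side also makes the fitted weights easy to inspect.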

Just wanted to share this with HN. I built a playground link for those of you who would like to try it out in a sandbox. Would love to hear your ideas, and hope people have fun with it!

Comments

austinwang115•4h ago

Interesting, biotech stocks have been notoriously hard to predict because their business model revolves around science, and it’s hard to know when the science is right. Depending on the situation, I think sentiment could potentially be a misleading/confounding variable here…

observationist•2h ago

Sentiment is crucial - if you know sentiment is incorrectly oriented, you can capitalize on it. If you know it's correct, you can identify mispricing, and strategize accordingly.

worik•2h ago

Why do you think that LLMs would do any better than monkeys throwing darts?

I am raining on your parade, but this is another in a long succession of ways to lose money.

The publicly available information in markets is priced very efficiently, us computer types do not like that and we like to think that our pattern analysis machines can do better than a room full of traders. They cannot.

The money to be made in markets comes from private information. Trading on it is a crime (insider trading), it is widespread, and any system like this is fighting it and will lose.

sjkoelle•1h ago

efficiency is not a given. also this is an eval set - they acknowledge the challenge themselves.

imho this is v cool

dchu17•1h ago

Our initial goal with this project actually wasn't to get an edge by evaluating information better, but rather to see whether an LLM can perform similarly to a human analyst at lower latency. The market is actually surprisingly slow to react to catalysts in biotech (at least in some cases) compared to other domains, so there may be some edge there.

Appreciate the comment though! I generally agree with your sentiment!

bjconlan•1h ago

I used to work for someone who did this (mostly on the classical therapeutics side). He actually started a business reviewing and auditing the submission processes behind approvals, and he had been around the game long enough to know where the next submission would put a company in the approvals process for a number of agencies.

https://maestrodatabase.com/

Looks like he's still on top of everything given the most recent blog post is from 6/2/2026.

I believe the insights here could be useful, given he has a sense of when the penultimate submission has occurred (but I'm not entirely sure what that means on a % basis, nor whether the company's stock reacts to it).

dchu17•1h ago

Yes we know of a few. Honestly, it was pretty hard to even find a good catalyst calendar for this space.

I'll give it a read to learn more. Thanks for the note!

genes_unknown_1•1h ago

I used to work at a private investment fund as a data engineer, building in-house models to evaluate drug programs and biotech companies. We took a pretty varied approach: catalysts, investment data, people data, trial data, but also analyses of the molecule and drug itself. It was a lot of work, and I really don't think we made a dent in understanding what succeeds and what doesn't. Also, investors in biotech are really underwriting the biology. It's why they mostly invest in fast-follow or me-too drugs rather than new technology or new therapies. The work was a bit sad and less exciting than I thought.

dchu17•1h ago

That's interesting. I am curious, what kind of analyses did you work with on the molecule and drug itself? Was it like mostly reading papers/patents or did your team do anything experimental?
