
Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn
20•dchu17•5h ago
Hi HN,

My friend and I have been experimenting with using LLMs to reason about biotech stocks. Unlike many other sectors, biotech trading is largely event-driven: FDA decisions, clinical trial readouts, safety updates, or changes in trial design can cause a stock to 3x in a single day (https://www.biotradingarena.com/cases/MDGL_2023-12-14_Resmet...).

Interpreting these ‘catalysts,’ which come in the form of press releases, usually requires analysts with prior expertise in biology or medicine. A catalyst that sounds “positive” can still lead to a selloff if, for example:

- the effect size is weaker than expected

- results apply only to a narrow subgroup

- endpoints don’t meaningfully de-risk later phases

- the readout doesn’t materially change approval odds.

To explore this, we built BioTradingArena, a benchmark for evaluating how well LLMs can interpret biotech catalysts and predict stock reactions. Given only the catalyst and the information available before the date of the press release (trial design, prior data, PubMed articles, and market expectations), the benchmark tests how accurately the model predicts the stock’s movement once the catalyst is released.

The benchmark currently includes 317 historical catalysts. We also created subsets for specific indications (with the largest in Oncology), as different indications often have different patterns. We plan to add more catalysts to the public dataset over the next few weeks. The dataset spans companies of different sizes and uses a volatility-adjusted score, since large-cap biotech tends to exhibit much lower volatility than small- and mid-cap names.
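The post doesn't give the adjustment formula, but one common way to normalize across market caps is to scale the catalyst-day return by each name's trailing volatility. A minimal sketch under that assumption (the function name and sample numbers are ours, not BioTradingArena's):

```python
import statistics

def adjusted_move(day_of_return: float, trailing_daily_returns: list[float]) -> float:
    """Scale the catalyst-day return by the stock's recent daily volatility,
    so a modest move in a sleepy large cap can outscore a big move in a
    small cap that swings wildly anyway."""
    vol = statistics.stdev(trailing_daily_returns)
    return day_of_return / vol

# A 30% pop in a small cap that routinely moves ~10% a day:
small_cap = adjusted_move(0.30, [0.08, -0.12, 0.10, -0.09, 0.11])
# A 6% move in a large cap that rarely moves 1%:
large_cap = adjusted_move(0.06, [0.010, -0.008, 0.012, -0.011, 0.009])
```

Under this normalization the large-cap move scores higher despite being five times smaller in raw terms, which matches the motivation the authors give.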

Each row of data includes:

- Real historical biotech catalysts (Phase 1–3 readouts, FDA actions, etc.) and pricing data from the day before and the day of the catalyst

- Linked clinical trial data and PubMed PDFs
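The row description above can be sketched as a small dataclass. The field names are our own guesses from the post, not BioTradingArena's actual schema:

```python
from dataclasses import dataclass

@dataclass
class CatalystRow:
    ticker: str            # e.g. "MDGL"
    catalyst_date: str     # day the press release hit
    press_release: str     # de-identified catalyst text shown to the model
    trial_design: str      # protocol info available before the readout
    prior_data: list[str]  # earlier readouts, linked PubMed material, etc.
    close_before: float    # closing price the day before the catalyst
    close_after: float     # closing price on the day of the catalyst

    @property
    def move(self) -> float:
        """Realized catalyst-day return: the quantity a model must predict."""
        return self.close_after / self.close_before - 1.0

# Hypothetical example row: a stock going from $10 to $25 is a +150% move.
row = CatalystRow("XYZ", "2023-12-14", "Phase 3 topline...", "randomized...",
                  [], close_before=10.0, close_after=25.0)
```

The model only ever sees the fields available before the catalyst date; `close_after` exists solely for scoring.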

Note: there may exist some fairly obvious problems with our approach. First, many clinical trial press releases are likely already included in the LLMs’ pretraining data. While we try to reduce this by de-identifying each press release and providing the LLM only the data available up to the date of the catalyst, there is obviously some uncertainty about whether this is sufficient.

We’ve been using this benchmark to test prompting strategies and model families. Results so far are mixed but interesting: the most reliable approach we found was to use LLMs to quantify qualitative features and then fit a linear regression on those features, rather than predicting the price directly.
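A minimal sketch of that two-stage idea, assuming the LLM is prompted to score each press release on a few 0–1 qualitative axes (the feature names, scores, and returns below are invented for illustration, not the authors' actual pipeline):

```python
import numpy as np

# Stage 1 (stubbed here): an LLM scores each press release on qualitative
# axes such as [effect size vs. expectations, endpoint relevance, subgroup
# breadth], each on a 0-1 scale. One row per historical catalyst.
X = np.array([
    [0.9, 0.8, 0.7],
    [0.3, 0.6, 0.2],
    [0.7, 0.4, 0.9],
    [0.2, 0.3, 0.3],
])
# Realized catalyst-day returns for those catalysts.
y = np.array([0.35, -0.10, 0.12, -0.25])

# Stage 2: ordinary least squares from LLM scores to returns
# (intercept column prepended).
A = np.c_[np.ones(len(X)), X]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(scores: list[float]) -> float:
    """Predict a catalyst-day return from LLM feature scores."""
    return float(coef[0] + coef[1:] @ np.array(scores))
```

One appeal of this split is interpretability: the regression coefficients say which qualitative features the market appears to price, rather than asking the LLM to emit a number directly.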

Just wanted to share this with HN. I built a playground link for anyone who would like to experiment with it in a sandbox. Would love to hear your ideas!

Comments

austinwang115•3h ago
Interesting, biotech stocks have been notoriously hard to predict because their business model revolves around science, and it’s hard to know when the science is right. Depending on the situation, I think sentiment could potentially be a misleading/confounding variable here…
observationist•1h ago
Sentiment is crucial - if you know sentiment is incorrectly oriented, you can capitalize on it. If you know it's correct, you can identify mispricing, and strategize accordingly.
worik•1h ago
Why do you think that LLMs would do any better than monkeys throwing darts?

I am raining on your parade, but this is another in a long succession of ways to lose money.

The publicly available information in markets is priced very efficiently; us computer types do not like that, and we like to think that our pattern-analysis machines can do better than a room full of traders. They cannot.

The money to be made in markets is from private information, and that is a crime (insider trading), is widespread, and any system like this is fighting it and will lose.

sjkoelle•37m ago
efficiency is not a given. also this is an eval set - they acknowledge the challenge themselves.

imho this is v cool

dchu17•33m ago
Our initial goal with this project actually wasn't trying to get an edge in terms of better evaluating information, but rather, we wanted to see if an LLM can perform similarly to a human analyst at a lower latency. The latency for the market to react to catalysts is actually surprisingly high in biotech (at least in some cases) compared to other domains so there may be some edge there.

Appreciate the comment though! I generally agree with your sentiment!

bjconlan•6m ago
I used to work for a human that did this (sits mostly on the classical therapeutics side). He actually started a business where he was reviewing and auditing the submission processes outlining approvals but he had been around the game enough to know where the next submission would put them in the approvals process for a number of agencies.

https://maestrodatabase.com/

Looks like he's still on top of everything given the most recent blog post is from 6/2/2026.

I believe the insights here could be useful given he has a sense of when the penultimate submission has occurred (but I'm not entirely sure what that is on a % basis, nor whether the stock of the company reacts).

genes_unknown_1•5m ago
I used to work at a private investment fund as a data engineer, building in-house models to evaluate drug programs and biotech companies. We took a pretty varied approach with catalysts, investment data, people data, trial data, but also analyses on the molecule and drug itself. It was a lot of work, and I really don't think we made a dent in understanding what succeeds and what doesn't. Also, investors in biotech are really underwriting the biology. It's why they mostly invest in fast-follows or me-toos rather than new technology or new therapies. The work was a bit sad and less exciting than I thought.
dchu17•1m ago
That's interesting. I am curious, what kind of analyses did you work with on the molecule and drug itself? Was it like mostly reading papers/patents or did your team do anything experimental?

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
91•klaussilveira•57m ago•10 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
582•xnx•6h ago•375 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
164•vecti•3h ago•72 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
30•isitcontent•1h ago•6 comments

Tell HN: I'm a PM at a big system of record SaaS. We're cooked

51•throwawaypm123•2h ago•2 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
252•aktau•7h ago•130 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
9•phreda4•38m ago•0 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
241•ostacke•7h ago•59 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
54•vmatsiiako•6h ago•15 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
96•limoce•3d ago•43 comments

Claude Composer

https://www.josh.ing/blog/claude-composer
57•coloneltcb•2d ago•32 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
189•surprisetalk•3d ago•24 comments

How virtual textures work

https://www.shlom.dev/articles/how-virtual-textures-really-work/
8•betamark•8h ago•0 comments

Early Christian Writings

https://earlychristianwritings.com/
27•dsego•48m ago•1 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
7•dmpetrov•1h ago•3 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
216•lstoll•7h ago•166 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
105•i5heu•3h ago•83 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
875•cdrnsf•10h ago•380 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
62•antves•1d ago•49 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
84•eljojo•3h ago•82 comments

Evaluating and mitigating the growing risk of LLM-discovered 0-days

https://red.anthropic.com/2026/zero-days/
7•lebovic•1d ago•1 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
7•nwparker•1d ago•3 comments

Show HN: Horizons – OSS agent execution engine

https://github.com/synth-laboratories/Horizons
10•JoshPurtell•21h ago•3 comments

Masked namespace vulnerability in Temporal

https://depthfirst.com/post/the-masked-namespace-vulnerability-in-temporal-cve-2025-14986
22•bmit•2h ago•2 comments

The mystery of the mole playing rough (2019) [video]

https://www.youtube.com/watch?v=nwQmwT1ULMU
4•archagon•15h ago•0 comments

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

https://github.com/rivet-dev/sandbox-agent/tree/main/gigacode
5•NathanFlurry•9h ago•4 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
5•rescrv•8h ago•1 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
317•todsacerdoti•8h ago•179 comments

The Beauty of Slag

https://mag.uchicago.edu/science-medicine/beauty-slag
4•sohkamyung•3d ago•0 comments