Show HN: Forecaster Arena – Testing LLMs on real events with prediction markets

https://forecasterarena.com/
1•setrf•14m ago
Hey HN! I'm Mert.

I built this because I was frustrated that LLM benchmarks can be contaminated by training data. When a model scores 99.9% on MMLU-Pro-Max, we can't tell whether that's genuine reasoning or memorization.

Forecaster Arena tries to solve this by testing models on events that haven't happened yet, using real prediction markets from Polymarket. The ground truth is reality itself, weeks or months later.

How it works:

7 frontier LLMs (GPT-5.1, Claude Opus 4.5, Gemini, Grok, DeepSeek, etc.; the lineup will be updated)
-> Each gets $10k virtual capital weekly
-> They bet on 500+ real prediction markets
-> Bet size = confidence (larger bet = more confident)
-> We measure calibration (Brier score) + returns (P/L)
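
For readers who don't know the two metrics, here is a minimal sketch of how a Brier score and P/L could be computed for binary markets. It is not taken from the Forecaster Arena codebase. The `Bet` fields, the explicit `prob_yes` (the post only says bet size reflects confidence), and the payout rule of one dollar per YES share are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bet:
    # All field names are hypothetical, chosen for this sketch only.
    prob_yes: float     # model's stated probability that the market resolves YES
    price_yes: float    # price of one YES share when the bet was placed, in (0, 1)
    stake: float        # virtual dollars spent on YES shares
    resolved_yes: bool  # how the market actually resolved


def brier_score(bets: List[Bet]) -> float:
    """Mean squared error between forecast probability and outcome (lower is better)."""
    return sum(
        (b.prob_yes - (1.0 if b.resolved_yes else 0.0)) ** 2 for b in bets
    ) / len(bets)


def profit_and_loss(bets: List[Bet]) -> float:
    """Total P/L, assuming each YES share pays $1 on YES and $0 otherwise."""
    total = 0.0
    for b in bets:
        shares = b.stake / b.price_yes
        payout = shares if b.resolved_yes else 0.0
        total += payout - b.stake
    return total


bets = [
    Bet(prob_yes=0.8, price_yes=0.5, stake=100.0, resolved_yes=True),
    Bet(prob_yes=0.3, price_yes=0.25, stake=50.0, resolved_yes=False),
]
print(round(brier_score(bets), 3), profit_and_loss(bets))
```

The two numbers can disagree: a model can be well calibrated and still lose money if the market price was already close to its forecast, which is presumably why both are tracked.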

Currently running the first cohort (started Dec 7). I expect the first statistically significant analysis over the next few weeks.

Everything is open source (MIT): https://github.com/setrf/forecasterarena

Happy to answer questions about the implementation or trade-offs I made. Would be great to hear your feedback on the methodology as well!

An authorization library that supports access control models like RBAC, ABAC

https://github.com/casbin/casbin
1•mooreds•2m ago•0 comments

The Flutie Effect

https://en.wikipedia.org/wiki/Flutie_effect
1•bawis•5m ago•1 comments

CRISPR Fungus: Protein-Packed, Sustainable, and Tastes Like Meat

https://www.isaaa.org/kc/cropbiotechupdate/article/default.asp?ID=21607
2•rguiscard•5m ago•0 comments

It Can Apply and Positive in Favor the Newton III Law on an Engine System Device

1•monterrey•10m ago•0 comments

Why Netflix's $82B Acquisition Makes Sense in the Era of AI

https://twitter.com/Konstantine/status/1998512521385488841
1•gmays•11m ago•0 comments

State Ofthe Art Novel InFlow 1Gearturbine/Reaction 2Imploturbocompressor/Impulse

1•monterrey•12m ago•0 comments

Show HN: Forecaster Arena – Testing LLMs on real events with prediction markets

https://forecasterarena.com/
1•setrf•14m ago•0 comments

Toward a policy for machine-learning tools in Linux kernel development

https://lwn.net/SubscriberLink/1049830/d046b62b9e96e5ab/
2•pabs3•15m ago•0 comments

Ask HN: What are you buying your kids for Christmas?

1•JamesSwift•16m ago•1 comments

A Billion-Barrel Oil Glut Is Forming at Sea

https://www.wsj.com/business/energy-oil/a-billion-barrel-oil-glut-is-forming-at-sea-02be162d
3•thelastgallon•16m ago•0 comments

The cost of ultra-cheap solar power

https://www.ft.com/content/33b4f85c-0767-44d4-afa2-51d85fc5cedb
1•thelastgallon•17m ago•0 comments

On Slop

https://minihf.com/posts/2025-10-02-on-slop/
1•zetalyrae•18m ago•0 comments

Show HN: Fast, no-fuss MAC address vendor lookup

https://oui.so
1•rmdd•20m ago•0 comments

Ensuring a National Policy Framework for Artificial Intelligence

https://www.whitehouse.gov/presidential-actions/2025/12/eliminating-state-law-obstruction-of-nati...
1•andsoitis•20m ago•0 comments

Prompts.chat: Free and Open Source Social Platform for AI Prompts

https://prompts.chat/
1•fka•20m ago•0 comments

How to Be Happy Like Thomas Aquinas

https://www.theatlantic.com/ideas/2025/12/thomas-aquinas-imperfection-mastery/685200/
1•petethomas•22m ago•0 comments

Federal Preemption: A Legal Primer

https://www.congress.gov/crs-product/R45825
1•treetalker•23m ago•1 comments

We Ran Out of IPv4 Addresses

https://www.connected.app/support/computer-networks/ip-addresses/fundamentals/articles/why-we-ran...
2•kirkouimet•24m ago•1 comments

Client-side PII redactor (WASM) to use ChatGPT safely

https://saferedact.vercel.app/
3•firesaber•25m ago•1 comments

The HTML-First Approach: Why Htmx and Lightweight Frameworks Are Revolutionizin

https://www.danieleteti.it/post/html-first-frameworks-htmx-revolution-en/#building-with-html-inst...
1•todsacerdoti•27m ago•0 comments

FreeBSD debates sunsetting power64/power64le support

https://www.osnews.com/story/144002/freebsd-debates-sunsetting-power64-power64le-support/
3•ksec•27m ago•0 comments

Disney Inks Blockbuster $1B Deal with OpenAI, Handing Characters over to Sora

https://deadline.com/2025/12/disney-openai-deal-sora-1236645728/
1•CharlesW•28m ago•1 comments

EU Healthcare Startups Cannot Legally Use OpenAI API Despite Saying They Can

3•naticasta•30m ago•0 comments

Not quite up to what was expected

https://blog.tobychampion.com/posts/engg-interviews/
1•mempko•33m ago•0 comments

AI Doesn't Need More Compute – It Needs Less Entropy

https://medium.com/@yttrium39pt/ai-doesnt-need-more-compute-it-needs-less-entropy-5e0f1771a076
2•rutheok•35m ago•0 comments

Congress strips right-to-repair provisions from 2026 NDAA despite wide support

https://federalnewsnetwork.com/congress/2025/12/congress-quietly-strips-right-to-repair-provision...
2•pabs3•35m ago•1 comments

Stoolap: High-performance embedded SQL database in pure Rust

https://github.com/stoolap/stoolap
1•murat3ok•36m ago•0 comments

AI as an Entropy Mitigation System

https://medium.com/@yttrium39pt/ai-as-an-entropy-mitigation-system-5818b6641823
2•rutheok•38m ago•0 comments

A runner survived 9 days lost in the Sahara during the Marathon des Sables

https://www.youtube.com/watch?v=vGjVjciViyU
2•deossaboss•40m ago•0 comments

Grok for Education: XAI Announced a Partnership with El Salvador

https://xcancel.com/xai/status/1999124685762834775#m
2•SockThief•44m ago•0 comments