SOTAVerified – the open verification layer for ML research

https://sotaverified.org
1•uberdavid•2h ago

Comments

uberdavid•2h ago
Hi HN, I'm David, an ML researcher at Meta. I built SOTAVerified as an independent project after Papers with Code shut down last year and took 575k papers' worth of benchmark data with it.

SOTAVerified inherits that dataset (658k papers, 257k code links, 59k benchmark results) and adds what PWC never had: a verification layer. Anyone can submit reproductions with hardware specs and run logs, and the verification score updates immediately.
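
To make the verification layer concrete, here is a minimal sketch of how a score could be derived from submitted reproductions. Everything in it is an assumption for illustration: the `Reproduction` shape, the `matchesClaim` and `verificationScore` functions, and the 2% relative tolerance are hypothetical, not SOTAVerified's actual schema or scoring rule.

```typescript
// Hypothetical shape of one submitted reproduction (not the real schema).
interface Reproduction {
  paperId: string;
  hardware: string;         // hardware spec reported by the submitter
  runLogUrl: string;        // link to the run logs
  claimedMetric: number;    // value reported in the paper
  reproducedMetric: number; // value obtained in the reproduction
}

// Assumed rule: a reproduction matches when it lands within a
// relative tolerance of the claimed value.
function matchesClaim(r: Reproduction, relTol: number = 0.02): boolean {
  const scale = Math.max(Math.abs(r.claimedMetric), Number.EPSILON);
  return Math.abs(r.reproducedMetric - r.claimedMetric) / scale <= relTol;
}

// Assumed score: the fraction of reproductions that match the claim.
// Returns null when nothing has been submitted, so "unverified" is
// distinguishable from "failed to reproduce".
function verificationScore(
  repros: Reproduction[],
  relTol: number = 0.02
): number | null {
  if (repros.length === 0) return null;
  const matched = repros.filter((r) => matchesClaim(r, relTol)).length;
  return matched / repros.length;
}
```

Recomputing the score from the full list on each submission is one simple way to get the "updates immediately" behavior; an agent could then skip any result whose score is null or below a threshold of its choosing.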

I've been doing reproductions myself on my RTX 3090: Fort et al. 2019 deep ensembles and Havasi et al. 2021 MIMO so far, with wandb logs linked. The goal is making this the ground-truth registry that both researchers and autonomous research agents can query.

Stack: Next.js, PostgreSQL, Vercel, Railway. Open source: https://github.com/sotarepro/sotaverified

Built for:
- Authors who want to claim their papers and submit official metrics
- Researchers who want to understand the SOTA techniques for a task
- Autonomous research agents that need to check whether a result reproduces before investing GPU hours

Would love feedback from the HN community. What features would make this useful for your workflow?

prabhavsanga•1h ago
Cool stuff. The next step will be a place for AI agents to publish research.
uberdavid•1h ago
Thank you! The progress on research agents is exciting, but knowing which papers are reproducible on different datasets and architectures is often the bottleneck.

Italy delays coal phase-out by over a decade

https://beyondfossilfuels.org/2026/03/30/italy-delays-coal-phase-out-to-2038/
1•thm•2m ago•0 comments

NocoBase sandbox escape to root RCE via console object prototype chain

https://anonhaven.com/en/news/ocobase-sandbox-escape-rce-cve-2026-34156/
1•anonhaven•3m ago•0 comments

Raspberry Pi profit surges as AI boom lifts demand

https://www.ft.com/content/5c167591-80bb-4290-ae66-7d04112cbd1c
3•constantinum•4m ago•0 comments

Building More Resilient Local-First Software with ATProto

https://jakelazaroff.com/words/building-more-resilient-local-first-software-with-atproto/
1•kurinikku•5m ago•0 comments

Estimating ISS speed from images using OpenCV (~2–3% error)

https://github.com/BabbaWaagen/AstroPi
1•BabbaWaagen•5m ago•0 comments

Krazam – Offsite Karaoke [video]

https://www.youtube.com/watch?v=s-x_bXO7nzA
1•tart-lemonade•6m ago•0 comments

Strengthening GitLab.com security: Mandatory multi-factor authentication

https://about.gitlab.com/blog/strengthening-gitlab-com-security-mandatory-multi-factor-authentica...
1•tcfhgj•7m ago•0 comments

Nesion – a KV-Cache eviction engine

https://nesion.net
1•CarlosCosta_•7m ago•0 comments

Conductor raises $22M Series A

https://www.conductor.build/blog/series-a
1•Charlieholtz•7m ago•0 comments

Spacecraft Heat Shields Could Violently "Burst" in Alien Atmospheres

https://www.universetoday.com/articles/spacecraft-heat-shields-could-violently-burst-when-plungin...
1•gostsamo•8m ago•0 comments

Show HN: SkillForge – Turn code and docs into instructions AI agents can follow

https://github.com/armelhbobdad/bmad-module-skill-forge
1•tigoo•9m ago•0 comments

Fine Tuning Services Benchmark

https://vintagedata.org/blog/posts/fine-tuning-as-service
2•ydetrois•11m ago•1 comments

AI native Slack/teams: Turning conversations to context

https://venturebeat.com/data/imagine-if-your-teams-or-slack-messages-automatically-turned-into-se...
1•tango12•11m ago•0 comments

Show HN: I Made an Android App to Bridge Google Health Connect to Webhook

https://github.com/mcnaveen/health-connect-webhook/
1•mcnx097•11m ago•1 comments

A Couple Million Lines of Haskell: Production Engineering at Mercury

https://blog.haskell.org/a-couple-million-lines-of-haskell/
1•constantinum•11m ago•0 comments

Using LLMs to amplify human labeling and improve Dash search relevance

https://dropbox.tech/machine-learning/llm-human-labeling-improving-search-relevance-dropbox-dash
1•softwaredoug•12m ago•0 comments

Claude Code Internals: An AI-Assisted Analysis of the Leaked Source

https://victorantos.com/posts/i-pointed-claude-at-its-own-leaked-source-heres-what-it-found/
1•victorbuilds•12m ago•0 comments

Show HN: Margo – Find the font your brain reads fastest

https://margo.fyi/
1•theseidel•14m ago•0 comments

Global Ban on Digital Duties Expires After Stalled Talks at WTO Meeting

https://www.nytimes.com/2026/03/31/business/economy/digital-tax-world-trade-organization.html
1•donohoe•14m ago•0 comments

Show HN: TraceHouse – ClickHouse Monitoring

https://dmkskd.github.io/tracehouse/
1•xxdd2ea•15m ago•0 comments

Some of the most popular graduate degrees don't pay off financially, study finds

https://www.washingtonpost.com/education/2026/03/31/graduate-degree-earnings-study/
2•dberhane•15m ago•1 comments

Show HN: Agent Wellbeing Kit – boundary protection for humans running AI agents

https://github.com/joozio/agent-wellbeing-kit
1•joozio•15m ago•0 comments

The Download: AI health tools and The Pentagon's Anthropic culture war

https://www.technologyreview.com/2026/03/31/1134934/the-download-testing-ai-health-tools-pentagon...
1•joozio•15m ago•0 comments

Ask HN: How do you handle strict rate-limiting on stateless edge workers?

1•rwasimsk•16m ago•1 comments

Explore data across 903 variables about sex, kink, personality and relationships

https://bigkinksurvey.com
1•embedding-shape•16m ago•0 comments

Devil in the grooves: The case against forensic firearms analysis (2023)

https://radleybalko.substack.com/p/devil-in-the-grooves-the-case-against
1•hn_acker•16m ago•0 comments

'Euro-Office': OnlyOffice accuses of license violations

https://www.heise.de/en/news/Euro-Office-OnlyOffice-accuses-of-license-violations-11241334.html
2•76rp•16m ago•0 comments

Autodebug: Telemetry-Driven Inference Optimization Loop

https://graphsignal.com/blog/autodebug-telemetry-driven-inference-optimization-loop/
1•npgraph•17m ago•0 comments

The Spectacle of War and the Struggle to Protest

https://www.newyorker.com/news/fault-lines/the-spectacle-of-war-and-the-struggle-to-protest
1•petethomas•17m ago•0 comments

Claude Code Spinner Verbs

https://github.com/chatgptprojects/claude-code/blob/642c7f944bbe5f7e57c05d756ab7fa7c9c5035cc/src/...
1•oelmgren•19m ago•0 comments