Show HN: I built a game where domain experts try to break frontier AI

https://www.rusmarterthananllm.com/
2•camillemolas•1h ago

Comments

camillemolas•1h ago
I built this: rusmarterthananllm.com

Domain experts (doctors, lawyers, engineers) submit questions from their field that probe where frontier AI actually fails. Claude, GPT, and Gemini all attempt each question simultaneously. Experts flag errors with professional reasoning, and other credentialed professionals in the same domain verify those flags.
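
A minimal sketch of the flow described above, in Python. This is an illustration under stated assumptions, not the site's actual implementation: the `Flag` class, the `ask_all` helper, the `stub_model` placeholder, and the two-reviewer threshold are all hypothetical names and values that do not come from the post.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Flag:
    """An expert's claim that a model answer is wrong, plus peer verifications."""
    model: str
    flagged_by: str
    reasoning: str
    verified_by: set = field(default_factory=set)

    def verify(self, reviewer: str) -> None:
        # The expert who raised a flag cannot verify it themselves.
        if reviewer != self.flagged_by:
            self.verified_by.add(reviewer)

    def is_verified(self, required: int = 2) -> bool:
        # `required` is an assumed threshold, not a number from the post.
        return len(self.verified_by) >= required


async def ask_all(question: str, models: dict) -> dict:
    """Send one question to every model concurrently and collect the answers."""
    names = list(models)
    answers = await asyncio.gather(*(models[name](question) for name in names))
    return dict(zip(names, answers))


async def stub_model(question: str) -> str:
    # Placeholder standing in for a real model API call.
    await asyncio.sleep(0)
    return f"answer to: {question}"


if __name__ == "__main__":
    models = {"claude": stub_model, "gpt": stub_model, "gemini": stub_model}
    answers = asyncio.run(ask_all("example question", models))

    flag = Flag(model="gpt", flagged_by="expert_a", reasoning="example reasoning")
    flag.verify("expert_a")  # ignored: self-verification
    flag.verify("expert_b")
    flag.verify("expert_c")
    print(sorted(answers), flag.is_verified())
```

In this sketch a failure only counts once enough distinct peers have signed off on the flag, which is one simple way to model the "flag, then verify" step.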

AI benchmark performance has decoupled from real-world professional capability. Models score at or near ceiling on standard evaluations while still failing in ways that domain professionals catch immediately. The benchmarks that exist are either saturated, constructed by the labs themselves, or simply don't capture the judgment that comes from years of field experience.

What's missing is a benchmark built by the people whose expertise is actually at stake: professionals motivated to find failures, not validate models. Every verified failure becomes a permanent data point. The benchmark compounds continuously and can't be reverse-engineered because the questions come from human judgment, not datasets.

This extends to multimodal inputs. A radiologist can submit an X-ray. A cardiologist can upload a heart sound. A structural engineer can attach a blueprint. The same adversarial evaluation applies across text, image, audio, and documents, in the domains where multimodal model failures matter most.

The downstream goal is a verified record of where frontier AI breaks across professional domains. That record should be useful for labs evaluating models, researchers studying capability gaps, and professionals who need to know where to trust AI and where not to.

Early domains: medicine, law, finance, engineering, coding, trades

Would love domain experts to throw their hardest questions at it. What breaks in your field?

diegovergara47•1h ago
This is interesting. I work in private equity secondaries, and I wonder if I can beat the LLM. How is the data I generate helpful, and is the plan to eventually pay users like me?
camillemolas•1h ago
Yes, private equity secondaries is a great domain for this. Valuation edge cases and LP agreement interpretation are exactly where frontier models fail confidently. The data becomes part of a verified record of AI capability gaps, which is valuable to labs and enterprises building finance AI.

Payment is coming. Right now we're building the expert network. Verified failures will be compensated monetarily. Would love to have you as an early finance expert, so throw your hardest question at it.

caillahmolas•1h ago
Nice
camillemolas•1h ago
Thanks! Hoping you join as well.
jasonkim-io•47m ago
Interesting stuff! Will check it out.
camillemolas•46m ago
Thanks! Hopefully you get to beat it and earn a payout, plus bragging rights!
vrajshroff•38m ago
Oh wow! Super interesting. Let me try asking about antioxidants and oxidative stress. I feel like it's niche enough that it might just work haha
camillemolas•37m ago
If it fails, let me know! That's exactly what we are looking for.
camillemolas•35m ago
We're also very interested in multimodal inputs. Do you work with pictures, recordings, videos, or anything along those lines in your domain? We want to find out whether models fail on those as well!
