
Show HN: LLMadness – March Madness Model Evals

https://llmadness.com/2026/
2•rjkeck2•1h ago
I wanted to play around with the non-coding agentic capabilities of the top LLMs so I built a model eval predicting the March Madness bracket.

After playing around a bit with the format, I went with the following setup:

- 63 single-game predictions rather than one full one-shot bracket

- Capped at 10 tool calls per game

- Upset-specific instruction in the system prompt

- Exponential scoring by round (1, 2, 4, 8, 16, 32)
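
For anyone curious how the exponential scoring adds up, here is a minimal sketch in TypeScript. It is not the site's actual code: the `GamePick` shape, the 0-based `round` index, and the function names are assumptions made up for illustration. Only the point values (1, 2, 4, 8, 16, 32) come from the setup above.

```typescript
// Points for a correct pick in each round, from the round of 64 through
// the championship, as listed in the post: 1, 2, 4, 8, 16, 32.
const ROUND_POINTS: readonly number[] = [1, 2, 4, 8, 16, 32];

// Hypothetical shape (an assumption, not the site's schema): one entry per game.
interface GamePick {
  round: number;   // 0-based round index: 0 = round of 64, 5 = championship
  picked: string;  // team the model picked to win
  winner?: string; // actual winner; undefined until the game has been played
}

// Sum the points for every played game where the pick matched the winner.
function scoreBracket(picks: readonly GamePick[]): number {
  let total = 0;
  for (const pick of picks) {
    if (pick.winner !== undefined && pick.picked === pick.winner) {
      // Unknown round indexes score nothing rather than producing NaN.
      total += ROUND_POINTS[pick.round] ?? 0;
    }
  }
  return total;
}

// Best possible score: each round has half as many games worth twice as much.
function maxScore(): number {
  let total = 0;
  let games = 32; // the round of 64 has 32 games
  for (const points of ROUND_POINTS) {
    total += games * points;
    games /= 2;
  }
  return total;
}
```

Under this scheme every round is worth the same 32 points in total, so a perfect bracket scores 192 and a single correct champion pick is worth as much as the entire first round.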

There were some interesting learnings:

- Unsurprisingly, most brackets are close to chalk. Very few significant upsets were predicted.

- There was a HUGE cost and token disparity with the exact same setup and constraints. Both Claude models spent over $40 to fill in the bracket while MiMo-V2-Flash spent $0.39. I spent a total of $138.69 on all 15 model runs.

- There was also a big disparity in speed. Claude Opus 4.6 took almost 2 full days to finish the 2 play-ins and 63 bracket games. Qwen 3.5 Flash took under 10 minutes.

- Even when given the tournament year (2026), multiple models pulled in information from previous years. Claude seemed to be the biggest offender, really wanting Cooper Flagg to be on this year's Duke team.

This was a really fun way to combine two of my interests and I'm excited to see how the models perform over the coming weeks. You can click into each bracket node to see the full model trace and rationale behind the picks.

The stack is TypeScript, Next.js, React, and raw CSS. No DB; everything is stored in static JSON files. After each game, I update the actual results and re-deploy via GitHub Pages.

I wanted to work as fast as possible since the brackets lock today, so almost all of the code was AI-generated (shocker).

Hope you enjoy checking it out!