We're running out of benchmarks to upper bound AI capabilities

https://www.lesswrong.com/posts/gfkJp8Mr9sBm83Rcz/we-re-actually-running-out-of-benchmarks-to-upper-bound-ai

14•gmays•2h ago

Comments

WarmWash•1h ago

Start front-loading the models with 5k, 10k, 50k, or 100k tokens of messy, quasi-related context, and then run the benchmarks.

These models are ridiculously powerful with a blank slate. It's when they get loaded down with all the necessary (and inevitably unnecessary) context to complete the task that they really start to crumble and fold.
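For illustration only (not from the commenter), here is a minimal Python sketch of that setup: prefix each benchmark question with shuffled distractor passages up to a rough token budget, then score the model at each budget. The names `front_load` and `approx_token_count` are hypothetical, and the whitespace word count is an assumed stand-in for whatever tokenizer the model under test actually uses.

```python
import random
from typing import Callable, Sequence


def approx_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def front_load(
    question: str,
    distractors: Sequence[str],
    budget_tokens: int,
    count_tokens: Callable[[str], int] = approx_token_count,
    seed: int = 0,
) -> str:
    """Prefix `question` with shuffled distractor passages, staying within `budget_tokens`."""
    rng = random.Random(seed)  # fixed seed so every run sees the same padding
    pool = list(distractors)
    rng.shuffle(pool)

    chosen = []
    used = 0
    for passage in pool:
        cost = count_tokens(passage)
        if used + cost > budget_tokens:
            continue  # skip any passage that would overshoot the budget
        chosen.append(passage)
        used += cost

    # The question always comes last, after all of the distracting context.
    return "\n\n".join(chosen + [question])
```

Calling `front_load(question, passages, budget)` for each budget in `(5_000, 10_000, 50_000, 100_000)` and scoring the same question set at every level would show how accuracy changes as the padding grows.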

jballanc•35m ago

We need benchmarks that can distinguish between continuous learning and long-context extrapolation.

nikisweeting•55m ago

We can definitely make harder evals. The problem is that a good eval set is indistinguishable from good training data or a market edge, so no one is incentivized to share their best eval sets publicly.

UltraSane•1m ago

This is the least true thing ever. All LLMs are terrible at ARC-AGI-3.

Beware, fellow plutocrats, the pitchforks are coming [video]

https://www.youtube.com/watch?v=q2gO4DKVpa8

1•jyounker•20s ago•1 comments

Under the hood of MDN's new front end

https://developer.mozilla.org/en-US/blog/mdn-front-end-deep-dive/

1•0xedb•52s ago•0 comments

Honda's EV Reversal Just Killed Sony's Electric Car: TDS

https://www.thedrive.com/news/hondas-ev-reversal-just-killed-sonys-electric-car-tds

1•PaulHoule•2m ago•0 comments

Poll: Majority of voters say risks of AI outweigh the benefits

https://www.nbcnews.com/politics/politics-news/poll-majority-voters-say-risks-ai-outweigh-benefit...

1•cdrnsf•2m ago•0 comments

A public Agent Sandbox with Hermes inside

https://sandbox-sba1ad15f841c32f2f.treadstone-ai.dev/

1•earayu•5m ago•0 comments

A New Case Exposed the Clever Workaround the FBI Uses to Read Secure Messages

https://www.inc.com/chloe-aiello/a-new-case-exposed-the-clever-workaround-the-fbi-uses-to-read-se...

1•daft_pink•6m ago•0 comments

Blockchain.com bug causing wrong data to be displayed

https://www.blockchain.com/explorer/addresses/btc/1PWo3JeB9jrGwfHDNpdGK54CRas7fsVzXU

1•867-5309•7m ago•2 comments

A Communist Apple II and Fourteen Years of Not Knowing What You're Testing

https://llama.gs/blog/index.php/2026/04/10/friday-archaeology-a-communist-apple-ii-and-fourteen-y...

1•major4x•8m ago•0 comments

Password Manager Angst

https://www.tbray.org/ongoing/When/202x/2026/04/09/Password-Manager-Angst

1•timbray•8m ago•0 comments

Sigbovik 2026

https://sigbovik.org/2026/

1•blmayer•9m ago•1 comments

NASA's Artemis II Crew Comes Home [video]

https://www.youtube.com/watch?v=nfhDuOHMp0A

1•meetpateltech•11m ago•0 comments

AI and Cybersecurity: A Glass Half-Empty/Half-Full of Nitroglycerin

https://www.techdirt.com/2026/04/10/ai-and-cybersecurity-a-glass-half-empty-half-full-proposition...

1•hn_acker•13m ago•1 comments

Show HN: A 1KB zero-dependency relative time formatter for UI systems

https://appents.com/tech/human-time

1•hedayet•13m ago•0 comments

In-Place Test-Time Training

https://arxiv.org/abs/2604.06169

1•dgfl•13m ago•1 comments

How Many Lives Do Amber Alerts Save?

https://www.mcgill.ca/oss/article/critical-thinking-technology-history/how-many-lives-do-amber-al...

2•jprs•14m ago•0 comments

The Ancient Coding Language That 95% of ATMs Use [video]

https://www.youtube.com/watch?v=P8oc_UXgD2A

3•kerim-ca•16m ago•0 comments

Linux/Mac Alternative to Nvidia Broadcast?

https://github.com/Hkshoonya/nvidia-broadcast-linux

1•burgerguyg•16m ago•1 comments

The Great Majority: Body Snatching and Burial Reform in 19th-Century Britain

https://publicdomainreview.org/essay/the-great-majority/

1•apollinaire•21m ago•0 comments

Association of high-pillow sleeping posture w intraocular pressure in glaucoma

https://bjo.bmj.com/content/early/2026/01/22/bjo-2025-328037

2•bookofjoe•22m ago•0 comments

Building Pi in a World of Slop – Mario Zechner

https://www.youtube.com/watch?v=_zdroS0Hc74

1•swyx•24m ago•0 comments

I built an AI that teaches your slides and answers questions

https://lectura.biz

1•keniz•24m ago•0 comments

I Tried Giving an AI Agent Gmail Access. It Took 19 Steps and Still Failed

https://www.anson.im/blog/why-i-built-agentsmail/

1•beacharsiu•26m ago•0 comments

Filing the Corners Off MacBooks

https://kentwalters.com/posts/corners/

16•normanvalentine•27m ago•2 comments

Chess-Bench

https://www.chess-bench.com/

2•hardikvora•28m ago•1 comments

What's Going on in AI

https://elicit.com/blog/situational-awareness-april-2026

3•sethbannon•29m ago•0 comments

Suez Crisis

https://en.wikipedia.org/wiki/Suez_Crisis

3•softwaredoug•29m ago•0 comments

"It's rare that developers love their database that much"

https://talkingpostgres.com/episodes/how-i-went-from-oracle-to-postgres-with-a-big-nosql-detour-w...

3•clairegiordano•29m ago•0 comments

Show HN: YouTube Live Spam Blocker

https://github.com/pnhoang/youtube-spam-blocker

2•pnhoang•32m ago•0 comments

Revisiting and Optimising go-iso8601-duration

https://xnacly.me/posts/2026/revisiting-and-optimising-goiso8601duration/

2•ingve•33m ago•0 comments

Framework [Next Gen] Event is live on April 21

https://frame.work/gb/en/blog/framework-next-gen-event-is-live-on-april-21

2•ezequiel-garzon•35m ago•0 comments