Benchmark accuracy retention is the wrong metric

https://www.joshuahedtke.com/writing/benchmark-retention-is-not-utility-retention

1•fmaccomber•1h ago

Comments

fmaccomber•1h ago

Whether model routing works is an empirical question. Existing empirical efforts rely on benchmark accuracy retention, i.e., how a model routing system scores compared with a sophisticated model like Opus 4.7 on a complex task benchmark like Terminal-Bench 2.0.

However, that metric is divorced from what we actually care about. A better metric is utility retention, which accounts for task importance.
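To make the difference concrete, here is a rough sketch in Python. It assumes utility retention can be modeled as an importance-weighted success ratio; that formula, the `TaskResult` type, and the weights are illustrative assumptions on my part, not the definition used in the linked article.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    importance: float  # hypothetical per-task weight (assumption, not from the article)
    baseline_ok: bool  # did the strong baseline model solve the task?
    routed_ok: bool    # did the routing system solve the task?


def accuracy_retention(results):
    """Routed successes divided by baseline successes, every task counted equally."""
    baseline = sum(r.baseline_ok for r in results)
    if baseline == 0:
        raise ValueError("baseline solved no tasks; retention is undefined")
    return sum(r.routed_ok for r in results) / baseline


def utility_retention(results):
    """Same ratio, but each task is weighted by its assumed importance."""
    baseline = sum(r.importance * r.baseline_ok for r in results)
    if baseline == 0:
        raise ValueError("baseline captured no utility; retention is undefined")
    return sum(r.importance * r.routed_ok for r in results) / baseline


# Made-up data: the router handles three low-stakes tasks and fails the one
# task that matters most.
results = [
    TaskResult(importance=1.0, baseline_ok=True, routed_ok=True),
    TaskResult(importance=1.0, baseline_ok=True, routed_ok=True),
    TaskResult(importance=1.0, baseline_ok=True, routed_ok=True),
    TaskResult(importance=10.0, baseline_ok=True, routed_ok=False),
]

print(f"accuracy retention: {accuracy_retention(results):.2f}")
print(f"utility retention:  {utility_retention(results):.2f}")
```

With these made-up numbers the router retains 75% of the baseline's accuracy but only about 23% of its weighted utility, which is the kind of gap the accuracy metric hides.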

Compass Finance

https://compassfinance.pro/

1•alibundally•13s ago•0 comments

We need to learn how to argue with AI

https://www.ft.com/content/d2d8f531-2833-4edc-9107-7bb73d9f0c4b

1•uxhacker•1m ago•0 comments

Massachusetts bans sale of precise location data in new privacy rights bill

https://techcrunch.com/2026/06/08/massachusetts-votes-to-pass-new-privacy-rights-bill-that-bans-s...

1•01-_-•1m ago•0 comments

Passenger used suspected fake boarding pass to sneak onto United flight

https://www.cnn.com/2026/06/07/us/unauthorized-man-on-united-flight

1•reconnecting•2m ago•0 comments

Rewilding the Web: my workshop report from Edinburgh

https://anil.recoil.org/notes/rewilding-the-web-report

1•hn_acker•2m ago•0 comments

New Referendum Would Flip Brexit Result 10 Years On, Poll Finds

https://www.bloomberg.com/news/articles/2026-06-08/new-referendum-would-flip-brexit-result-10-yea...

2•MilnerRoute•2m ago•0 comments

Grassware

https://heatherburns.tech/2026/06/07/grassware/

1•hn_acker•2m ago•0 comments

Ask HN: How to escalate a rejected Google extension?

1•modzu•4m ago•0 comments

GLP-1 Drugs: Things We've Learned About Their Effects

https://www.nytimes.com/2026/06/08/well/glp1-drugs-weight-loss.html

1•adamgordonbell•4m ago•0 comments

Co-Creator of Haskell: Functional Prog., Thinking in Types, Useless Languages [video]

https://www.youtube.com/watch?v=xcB_LF3cdqw

1•matt_d•6m ago•0 comments

Seeking a Remote Collaborator

1•ReadVasonez•6m ago•0 comments

CEO sued my friend over an honest product review, but forgot to cover his tracks [video]

https://www.youtube.com/watch?v=OPCwZaxOBu4

1•bbrookshier•6m ago•0 comments

Stop the Apple Music app from launching

https://lowtechguys.com/musicdecoy/

2•bobbiechen•7m ago•0 comments

Do your best research with NotebookLM

https://blog.google/innovation-and-ai/products/notebooklm/better-research-notebooklm/

1•xnx•7m ago•0 comments

The Download: how the World Cup ball will fly and OpenAI's "super app"

https://www.technologyreview.com/2026/06/08/1138485/the-download-world-cup-ball-openai-super-app/

1•joozio•7m ago•0 comments

WWDC 2026 LIVE NOW [video]

https://www.youtube.com/watch?v=hF8swzNR1-o

1•SpyCoder77•8m ago•0 comments

Show HN: We're open sourcing Superlog (YC P26), an autonomous monitoring tool

https://github.com/superloglabs/superlog

1•signalbright•9m ago•0 comments

Penn Wharton Budget Model

https://budgetmodel.wharton.upenn.edu/model/

1•fzliu•12m ago•0 comments

ALE

https://github.com/arcfide/ALE

1•tosh•15m ago•0 comments

Show HN: Gitdot – a better GitHub. Open-source, anti-AI, and written in Rust

https://gitdot.io/

3•pybae•16m ago•0 comments

The Process Was the Point

https://darshanmakwana412.github.io/2026/06/the-process-was-the-point/

1•martianvoid•21m ago•0 comments

Reflecting on a Year of Claude Code

https://www.youtube.com/watch?v=Hth_tLaC2j8

1•doppp•21m ago•0 comments

Using AI Centaur Systems to Strengthen Professional Judgment

https://download.ssrn.com/2026/5/22/6814343.pdf?response-content-disposition=inline&X-Amz-Securit...

1•droidjj•22m ago•0 comments

A Simple System for TODOs

https://graybearding.bearblog.dev/a-simple-system-for-todos/

1•rglover•24m ago•0 comments

You Don't Need a GitHub Copilot Subscription to Use VS Code AI Features

https://medium.com/@jeffreyflynt02/you-dont-need-a-github-copilot-subscription-to-use-vs-code-ai-...

2•jflynt76•24m ago•0 comments

Hackers likely hijacked over 20k Instagram accounts with Meta's AI chatbot

https://www.theverge.com/tech/945658/meta-ai-support-chatbot-exploit-instagram-accounts

5•LordAtlas•27m ago•0 comments

The dangerous unknowns at the heart of LLMs

https://yalereview.org/article/melanie-mitchell-jagged-intelligence

2•jadelcastillo•28m ago•0 comments

Beyond Ralph Loops: Orchestrate-Map-Reduce and Higher Order Skills

https://twitter.com/djgrant_/status/2063960111173808335

1•djgrant•28m ago•0 comments

Battery-free textile turns clothing into a real-time blood pressure monitor

https://techxplore.com/news/2026-04-battery-free-textile-real-blood.html

1•PaulHoule•28m ago•0 comments

Kimi Work: The AI Desktop for Knowledge Work

https://www.kimi.com/products/kimi-work

1•pretext•30m ago•0 comments