Agent Reading Test

https://agentreadingtest.com
28•kaycebasques•2h ago
https://dacharycarey.com/2026/04/06/designing-agent-reading-...

Comments

kaycebasques•2h ago
See also https://dacharycarey.com/2026/04/06/designing-agent-reading-...
dang•1h ago
Thanks! We'll put this in the toptext as well.
dostick•1h ago
The tests should have negative weights based on how often each issue is encountered and how much impact it has. The 2. SPI test should carry something like 8 negative points out of 10, as the most common blocker. And the whole test should use an inverse score.
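
One possible reading of the weighting idea above, as a minimal sketch: each test carries a penalty from 0 to 10, failed tests add their penalty, and a lower total is better. The test names and every weight other than the 8 suggested in the comment are hypothetical placeholders, not values from the actual test suite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestCase:
    name: str
    penalty: int  # hypothetical weight, 0-10: higher = more common or more harmful


def penalty_score(tests, failed):
    """Sum the penalties of the failed tests; 0 means a perfect run."""
    return sum(t.penalty for t in tests if t.name in failed)


# Placeholder suite: only the 8-point weight comes from the comment above.
tests = [
    TestCase("test_1", 3),
    TestCase("test_2", 8),
    TestCase("test_3", 2),
]

score = penalty_score(tests, failed={"test_2"})
```

Under this scheme an agent that fails only the heavily weighted test scores worse than one that fails both lightly weighted tests, which is the effect the comment seems to be asking for.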
massimoto•50m ago
Would love to see some results for different providers. The tests look super logically thought out, but could use a TL;DR (too lazy; didn't run) output.

Claude Web Opus 4.6 Extended: 14 / 20 points

x:CANARY-SPA-JSONLY-prism x:CANARY-CONNEG-MD-sigma

theyCallMeSwift•38m ago
I love this idea, but have a hypothesis that 90% of agents that people actually use today would fail this test inadvertently (false negative).

Industry best practice + standard implementation for most agents right now is to do web browsing / fetching via subagents. Their output is summarized using a cheaper model and then passed back to the parent. Unless the actual content the subagents see is preserved, it's very unlikely that the `CANARY-` strings would be found in the output.

Any thoughts on how you'd change the test structure with this in mind?
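
The failure mode described above can be illustrated with a small sketch. Everything here is an assumption for illustration: `toy_summarize` is a crude stand-in for the cheaper summarizing model, the page text and the `CANARY-EXAMPLE-DEMO-alpha` marker are made up, and the regex is only a guess at the general shape of the `CANARY-` strings.

```python
import re

# Assumed shape of a canary token: "CANARY" followed by hyphen-separated parts.
CANARY_RE = re.compile(r"CANARY(?:-[A-Za-z0-9]+)+")


def find_canaries(text: str) -> set[str]:
    """Return every canary-like token present in the text."""
    return set(CANARY_RE.findall(text))


def toy_summarize(text: str, max_sentences: int = 1) -> str:
    """Stand-in for a summarizing model: keep only the first sentences."""
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    return ". ".join(sentences[:max_sentences])


# Made-up page content; the canary sits in the third sentence.
page = (
    "This page documents the widget API. "
    "Installation takes one command. "
    "Marker for readers: CANARY-EXAMPLE-DEMO-alpha. "
    "See the changelog for details."
)

seen_by_subagent = find_canaries(page)
returned_to_parent = find_canaries(toy_summarize(page))
lost = seen_by_subagent - returned_to_parent
```

Here the subagent did fetch the marker, but the parent never receives it, so the run would be scored as a miss even though the page was read.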

dacharyc•4m ago
Hey there - I'm the test author, and you've hit on one of the main points. For the summarization/relevance-based content return, this is a consideration for some of the agent platforms (although I've found others actually do better here than I expected!) - which is part of the point I'm trying to drive home to folks who aren't as familiar with these systems.

I chose to structure it this way intentionally because this is the finding. Most people are surprised that agents aren't 'seeing' everything that's there, and get frustrated when an agent says something isn't there when it clearly is. Raising awareness of this is one of the main points of the exercise, to me.

Show HN: Ghost Pepper – 100% local hold-to-talk speech-to-text for macOS

https://github.com/matthartman/ghost-pepper
72•MattHart88•1h ago•32 comments

Launch HN: Freestyle – Sandboxes for Coding Agents

https://www.freestyle.sh/
144•benswerd•4h ago•79 comments

A cryptography engineer's perspective on quantum computing timelines

https://words.filippo.io/crqc-timeline/
217•thadt•5h ago•96 comments

Show HN: GovAuctions lets you browse government auctions at once

https://www.govauctions.app/
126•player_piano•5h ago•48 comments

Root Persistence via macOS Recovery Mode Safari

https://yaseenghanem.com/recovery-unrestricted-write-access/
10•yaseeng•44m ago•4 comments

German police name alleged leaders of GandCrab and REvil ransomware groups

https://krebsonsecurity.com/2026/04/germany-doxes-unkn-head-of-ru-ransomware-gangs-revil-gandcrab/
229•Bender•7h ago•119 comments

HackerRank (YC S11) Is Hiring

1•rvivek•26m ago

Battle for Wesnoth: open-source, turn-based strategy game

https://www.wesnoth.org
320•akyuu•3h ago•78 comments

What being ripped off taught me

https://belief.horse/notes/what-being-ripped-off-taught-me/
270•doctorhandshake•8h ago•157 comments

Book review: There Is No Antimemetics Division

https://www.stephendiehl.com/posts/no_antimimetics/
170•ibobev•7h ago•113 comments

Issue: Claude Code is unusable for complex engineering tasks with Feb updates

https://github.com/anthropics/claude-code/issues/42796
586•StanAngeloff•7h ago•381 comments

Sky – an Elm-inspired language that compiles to Go

https://github.com/anzellai/sky
104•whalesalad•6h ago•31 comments

A macOS bug that causes TCP networking to stop working after 49.7 days

https://photon.codes/blog/we-found-a-ticking-time-bomb-in-macos-tcp-networking
78•RyanZhuuuu•1h ago•37 comments

Show HN: Docking – extensible Linux dock in Python

https://docking.cc
9•edumucelli•2d ago•1 comment

Eighteen Years of Greytrapping – Is the Weirdness Finally Paying Off?

https://nxdomain.no/~peter/eighteen_years_of_greytrapping.html
37•jruohonen•2d ago•3 comments

The Last Quiet Thing

https://www.terrygodier.com/the-last-quiet-thing
108•coinfused•2d ago•72 comments

Sam Altman may control our future – can he be trusted?

https://www.newyorker.com/magazine/2026/04/13/sam-altman-may-control-our-future-can-he-be-trusted
205•adrianhon•10h ago•69 comments

Show HN: I built a tiny LLM to demystify how language models work

https://github.com/arman-bd/guppylm
824•armanified•21h ago•124 comments

Show HN: Tusk for macOS and Gnome

https://shapemachine.xyz/tusk/
5•factorialboy•2d ago•0 comments

Adobe modifies hosts file to detect whether Creative Cloud is installed

https://www.osnews.com/story/144737/adobe-secretly-modifies-your-hosts-file-for-the-stupidest-rea...
165•rglullis•3h ago•73 comments

Zooming UIs in 2026: Prezi, impress.js, and why I built something different

52•tinchox6•2h ago•22 comments

SOM: A minimal Smalltalk for teaching of and research on Virtual Machines

http://som-st.github.io/
9•tosh•2h ago•0 comments

The team behind a pro-Iran, Lego-themed viral-video campaign

https://www.newyorker.com/culture/infinite-scroll/the-team-behind-a-pro-iran-lego-themed-viral-vi...
79•tantalor•7h ago•95 comments

Intelligent people are better judges of the intelligence of others

https://www.psypost.org/intelligent-people-are-better-judges-of-the-intelligence-of-others/
68•01-_-•3h ago•65 comments

The cult of vibe coding is dogfooding run amok

https://bramcohen.com/p/the-cult-of-vibe-coding-is-insane
388•drob518•2h ago•300 comments

Wikipedia's AI agent row likely just the beginning of the bot-ocalypse

https://www.malwarebytes.com/blog/ai/2026/04/wikipedias-ai-agent-row-likely-just-the-beginning-of...
23•hackernj•1h ago•15 comments

Reducto releases Deep Extract

https://reducto.ai/blog/reducto-deep-extract-agent
38•raunakchowdhuri•5h ago•5 comments

I won't download your app. The web version is a-ok

https://www.0xsid.com/blog/wont-download-your-app
760•ssiddharth•6h ago•450 comments

France pulls last gold held in US

https://www.mining.com/france-pulls-last-gold-held-in-us-for-15b-gain/
537•teleforce•13h ago•291 comments