frontpage.

A 95%-accurate AI agent fails 64% of the time on 20-step tasks

https://kenoticlabs.com/insights/ai-agent-failure
3•SamuelTanguturi•1h ago

Comments

wizeyone•1h ago
Two things. The headline math, 0.95^20 = 0.358, assumes independent errors. The body argues the opposite: "every subsequent action operates on flawed foundations."

Real long-chain failure is worse than the math predicts, not equal to it. The headline undersells the problem the article actually describes.

Also, DTCM's eval is narrative QA across 250 stories: reading comprehension over accumulated context, not agent tool use.

The production failure modes it discusses (wrong tool selection, brittle API contracts, etc.) don't obviously map to that benchmark. The 96% number is encouraging but not directly translatable.
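
For anyone who wants to check the headline arithmetic from the first point, here is a minimal sketch. It assumes every step succeeds independently with the same probability, which is exactly the assumption the comment questions. The function names `chain_success` and `required_step_accuracy` are made up for illustration and do not come from the article.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps succeed, ASSUMING each step
    succeeds independently with probability p_step."""
    return p_step ** n_steps


def required_step_accuracy(target: float, n_steps: int) -> float:
    """Per-step accuracy needed to reach `target` end-to-end success
    under the same independence assumption (inverse of chain_success)."""
    return target ** (1.0 / n_steps)


if __name__ == "__main__":
    p = chain_success(0.95, 20)
    # prints: success: 0.358, failure: 0.642
    print(f"success: {p:.3f}, failure: {1 - p:.3f}")

    # How accurate must each step be for a 20-step chain to succeed 90% of the time?
    print(f"per-step accuracy needed: {required_step_accuracy(0.90, 20):.4f}")
```

This only reproduces the independent-errors figure. It says nothing about how correlated or compounding errors would move the number.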

If America's So Rich, How'd It Get So Sad?

https://www.derekthompson.org/p/if-americas-so-rich-howd-it-get-so
1•momentmaker•9s ago•0 comments

Why prediction markets are a sure sign that our civilisation is in decay

https://www.joanwestenberg.com/why-prediction-markets-are-a-sure-sign-that-our-civilisation-is-in...
1•alcazar•2m ago•0 comments

Zork-bench: An LLM reasoning eval based on text adventure games

https://www.lowimpactfruit.com/p/zork-bench-an-llm-reasoning-eval
1•mnky9800n•2m ago•0 comments

Unkey raised $4.5M to ship APIs, not infrastructure

https://www.unkey.com/blog/unkey-raises-seed
2•jamesperkins•4m ago•0 comments

France confirms data breach at government agency that manages citizens' IDs

https://techcrunch.com/2026/04/22/france-confirms-data-breach-at-government-agency-that-manages-c...
2•robtherobber•5m ago•0 comments

#008: Design Is a Generous Gift

https://metedata.substack.com/p/008-design-is-a-generous-gift
1•young_mete•6m ago•0 comments

Context Engineering and the Limits of Agentic Coding

https://stephenfritz.dev/blog/context-engineering/
1•conner_bw•6m ago•0 comments

Why Onboarding Flow Is the New Signup Form

https://uxmovement.substack.com/p/why-onboarding-flow-is-the-new-signup
1•antux•7m ago•0 comments

Johny Srouji Named Apple's Chief Hardware Officer

https://www.apple.com/newsroom/2026/04/johny-srouji-named-apples-chief-hardware-officer/
1•wslh•8m ago•0 comments

NY sues Coinbase and Gemini to halt unlicensed prediction market businesses

https://apnews.com/article/prediction-markets-coinbase-gemini-lawsuit-new-york-25fa0db90266f4ecf9...
3•1vuio0pswjnm7•8m ago•0 comments

Texas A&M's H-1B Spending Sparks Debate over Jobs and Transparency

https://dallasexpress.com/education/texas-ams-h-1b-spending-sparks-debate-over-jobs-and-transpare...
2•rawgabbit•8m ago•0 comments

Finra Adopts New Standards to Replace the Day Trading Margin Requirements

https://www.finra.org/rules-guidance/notices/26-10
1•hentrep•8m ago•0 comments

Desktop Powered by Hashing

https://starlight-ai.freemyip.com/sandbox/4c91530a5083a463798865d9f357d473d5318fe683784a390a48a2d...
2•macroadster•9m ago•0 comments

Developer Builds Script That Calls Back Spam Callers in Endless Loop [video]

https://www.youtube.com/shorts/3zyng3lqNAs
1•thunderbong•10m ago•0 comments

Show HN: Interactive knowledge graph for the AAuth (Agent Auth) protocol

https://mcp-shark.github.io/aauth-explorer/
1•0xchamin•11m ago•0 comments

I spent 6 years building my Kanban as I hated how managers run the boards

https://www.npmjs.com/package/ooko
2•okovooo•12m ago•1 comment

The unflattering secrets revealed so far in Elon Musk's latest legal feud

https://web.archive.org/web/20260423124533/https://www.washingtonpost.com/technology/2026/04/23/m...
1•1vuio0pswjnm7•14m ago•0 comments

AWS/Azure IAM Audit Automation – Lessons from the ShinyHunters Breach

https://cyberalert.com.pl/articles/iam-audit-multicloud-shinyhunters-2026-en.html
1•D__S•17m ago•0 comments

Train separately, merge together: Modular post-training with mixture-of-experts

https://allenai.org/blog/bar
1•gmays•18m ago•0 comments

Breathing in nanoparticles could enable a 10-minute pneumonia check

https://phys.org/news/2026-03-nanoparticles-enable-minute-pneumonia.html
1•PaulHoule•18m ago•0 comments

Atlassian Expands Partnership with Google Cloud to Power Agentic AI

https://www.googlecloudpresscorner.com/2026-04-22-Atlassian-Expands-Partnership-with-Google-Cloud...
2•marcosscriven•18m ago•0 comments

Microsoft plans first voluntary employee buyout in company's 51-year history

https://www.cnbc.com/2026/04/23/microsoft-plans-first-voluntary-retirement-program-for-us-employe...
2•1vuio0pswjnm7•19m ago•2 comments

How to Grep Video

https://blog.cloudglue.dev/how-to-grep-video/
5•mrmarket•19m ago•0 comments

Ask HN: How to reduce human bottleneck in solo game dev

1•pennystudio-li•19m ago•0 comments

We rebuilt our Electron recording engine in Swift

https://circleback.ai/blog/how-we-rebuilt-our-electron-recording-engine-in-swift
1•arguiot•20m ago•0 comments

AI Model and 'MAGA' Influencer Emily Hart Unmasked as Indian Man

https://www.mandatory.com/news/1761666-maga-influencer-ai-model-emily-hart-unmasked-indian-man
3•CharlesW•21m ago•0 comments

What you can do in a decade

https://www.swyx.io/decade
1•AnhTho_FR•22m ago•0 comments

Google's opt-out cookies still ignored, 15 years later

https://jackyan.com/blog/2026/04/googles-opt-out-cookies-still-ignored-15-years-later/
4•speckx•22m ago•0 comments

LLM users mistake AI output for their own real skill

https://arxiv.org/abs/2604.14807
1•linkregister•23m ago•0 comments

Rgfeawvgewwvga

https://selfba.se/t5g4egtvfergvretgvr
1•aegvegv•23m ago•0 comments