The study looks at a wide range of tests spanning many different areas of expertise and output types. Some of the tests, like the web vis tasks, used Sonnet rather than Opus (which was not out at the time). It is like testing a car on many different things when only one of the tests is actually driving somewhere and many of the others are based on the fabric used in the interior. This produces a very broad "96% failure" figure while missing the successes. Of course AI can't do everything, and nor can I.
deterministic•1h ago
It matches my experience using AI to develop software. It is a super useful tool, but also really crap at doing anything outside of its training data. There is zero real understanding or thinking going on behind the curtain.
sponaugle•1h ago
One of the most interesting observations about AI is the timescale on which the favorite model and favorite task change. Before November I found Sonnet to be interesting, but not moving the needle that much. Once Opus came out it was clear the needle was not only moving, but moving fast.