frontpage.

Made with ♥ by @iamnishanth

How Replacing Developers with AI Is Going Horribly Wrong [video]

https://www.youtube.com/watch?v=ts0nH_pSAdM
1•carlos-menezes•3m ago•0 comments

They lied to you. Building software is hard

https://blog.nordcraft.com/they-lied-to-you-building-software-is-really-hard
1•xiaohanyu•5m ago•1 comment

DeepMind: Linear representations in LMs can change dramatically

https://arxiv.org/abs/2601.20834
1•simonpure•5m ago•0 comments

David Foster Wallace's Infinite Jest at 30

https://theconversation.com/friday-essay-weirdly-old-fashioned-and-wildly-uneven-david-foster-wal...
1•bryanrasmussen•6m ago•1 comment

The $2M/month lesson every SaaS founder should steal from Gumroad

https://marketingcrafted.com/case-studies/gumroad
1•mellisacodes•7m ago•1 comment

Rabbit Project Cyberdeck

https://www.rabbit.tech/earlyaccess
1•MaximilianEmel•9m ago•1 comment

AI health care is taking off in China, led by Jack Ma's Ant Group

https://restofworld.org/2026/ai-health-care-is-taking-off-in-china-led-by-jack-mas-ant-group/
1•donohoe•10m ago•0 comments

Django scales. Stop blaming the framework (part 1 of 3)

https://medium.com/@tk512/django-scales-stop-blaming-the-framework-part-1-of-3-a2b5b0ff811f
1•sgt•12m ago•0 comments

Locust Cloud is shutting down

https://www.locust.cloud/
1•Topgamer7•12m ago•1 comment

Cuthbert – a fast library for state-space-models (Kalman/particle filters etc)

https://github.com/state-space-models/cuthbert
1•stripled•12m ago•1 comment

Forget about coding. Learn UX and Product Design

https://jdjuan.substack.com/p/how-im-coding-these-days
1•jdjuan1234•14m ago•1 comment

DDR5-4800 vs. DDR5-6000 Performance With The Ryzen 7 9850X3D In 300 Benchmarks

https://www.phoronix.com/review/amd-ryzen-7-9850x3d-ddr5
1•rbanffy•15m ago•0 comments

We created more tech debt in 6 months than in a 10-year-old system

https://superkacper4.github.io/portfolio-2023/blog/technical-debt-everyday
1•mooreds•15m ago•0 comments

Matching Your Mode to the Moment

https://peoplestorming.substack.com/p/the-conversation-spectrum
1•mooreds•16m ago•0 comments

What's the Skinny on Laws That Make Salaries Public? [audio]

https://optimisteconomy.com/episodes/what-s-the-skinny-on-laws-that-make-salaries-public
1•mooreds•16m ago•0 comments

Big Blue Poised to Peddle Lots of on Premises GenAI

https://www.nextplatform.com/2026/01/29/big-blue-poised-to-peddle-lots-of-on-premises-genai/
1•rbanffy•17m ago•0 comments

Tesla kills Models S and X to build humanoid robots instead

https://arstechnica.com/cars/2026/01/tesla-kills-models-s-and-x-to-build-humanoid-robots-instead/
1•rbanffy•17m ago•0 comments

Trump says he's decertifying Canada-made aircraft and threatens 50% tariffs

https://www.cnn.com/2026/01/29/business/trump-canada-aircraft-tariff
9•throw0101c•22m ago•2 comments

ModRetro Chromatic and Koss Porta Pro Bundle

https://www.andurilgear.com:443/
1•keepamovin•22m ago•1 comment

Memory Shortage Haunts Apple's Blowout iPhone Sales

https://www.wsj.com/tech/memory-shortage-haunts-apples-blowout-iphone-sales-0410e375
1•pretext•28m ago•0 comments

How Businesses Are Manipulating ChatGPT Results

https://www.wsj.com/tech/ai/ai-what-is-geo-aeo-5c452500
2•pretext•29m ago•0 comments

The End of the Model S Is the Start of a New Tesla

https://www.wsj.com/business/autos/the-end-of-the-model-s-is-the-start-of-a-new-tesla-379cb74e
1•pretext•29m ago•0 comments

Show HN: Cycle - Integrated business banking, accounting and payroll

https://joincycle.co
1•aswinmohanme•31m ago•1 comment

Show HN: Cicada – a scripting language that integrates with C

https://github.com/heltilda/cicada
3•briancr•31m ago•0 comments

My Website Was Down for a Week, and I Was the Last to Notice

https://igorstechnoclub.com/my-website-was-down-for-a-week-and-i-was-the-last-to-notice/
2•Igor_Wiwi•33m ago•0 comments

Code is cheap. Show me the talk

https://nadh.in/blog/code-is-cheap/
3•ghostfoxgod•33m ago•0 comments

VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vuln Detection

https://arxiv.org/abs/2512.07533
1•gquere•33m ago•1 comment

Coding Is When We're Least Productive

https://codemanship.wordpress.com/2026/01/30/coding-is-when-were-least-productive/
1•todsacerdoti•34m ago•0 comments

Solar panels on land used for biofuels could power all cars and trucks electric

https://ourworldindata.org/biofuel-land-solar-electric-vehicles
4•alphabetatango•35m ago•0 comments

Show HN: Convert OBJ to STL Free Online Tool (Pngtostl.xyz)

https://pngtostl.xyz/convert/obj-to-stl
1•niliu123•36m ago•0 comments

Is there a good agent leaderboard for real-life tasks other than coding?

1•tototozip•1h ago
I feel like the benchmark space is quite crowded when it comes to coding agents. We have some remarkable projects like TerminalBench, SWE-bench, RepoBench, etc., and I actually think we are close to a gold standard there. I also know we have general web/computer-use benchmarks like GAIA, WebArena, and OSWorld, but these feel like "general purpose" tests.

People want AI agents to help them with many different tasks, yet I find almost no interesting benchmarks outside the web vertical. Are there any projects addressing "real-world" business challenges, or is everyone focusing on coding and general web browsing right now?