Comment on the Illusion of Thinking

https://arxiv.org/abs/2506.09250
4•esafak•14h ago

Comments

esafak•14h ago
Shojaee et al. (2025) report that Large Reasoning Models (LRMs) exhibit "accuracy collapse" on planning puzzles beyond certain complexity thresholds. We demonstrate that their findings primarily reflect experimental design limitations rather than fundamental reasoning failures. Our analysis reveals three critical issues: (1) Tower of Hanoi experiments systematically exceed model output token limits at the reported failure points, with models explicitly acknowledging these constraints in their outputs; (2) the authors' automated evaluation framework fails to distinguish between reasoning failures and practical constraints, leading to misclassification of model capabilities; (3) most concerningly, their River Crossing benchmarks include mathematically impossible instances for N > 5 due to insufficient boat capacity, yet models are scored as failures for not solving these unsolvable problems. When we control for these experimental artifacts by requesting generating functions instead of exhaustive move lists, preliminary experiments across multiple models indicate high accuracy on Tower of Hanoi instances previously reported as complete failures. These findings highlight the importance of careful experimental design when evaluating AI reasoning capabilities.
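Point (1) rests on how fast the required output grows: an optimal Tower of Hanoi solution for N disks has exactly 2^N - 1 moves, so an exhaustive move list roughly doubles with every added disk, while a program that produces the list stays the same size. The sketch below illustrates that contrast, assuming that a "generating function" means a short recursive procedure that emits the moves on demand. It is not the comment authors' prompt or code; the peg names `A`, `B`, `C` and the helper `is_valid_solution` are hypothetical.

```python
from typing import Iterator, List, Tuple

# (disk number, source peg, target peg); peg names are hypothetical.
Move = Tuple[int, str, str]


def hanoi(n: int, src: str = "A", dst: str = "C", via: str = "B") -> Iterator[Move]:
    """Yield the optimal 2**n - 1 move sequence for n disks, one move at a time."""
    if n == 0:
        return
    yield from hanoi(n - 1, src, via, dst)  # park the n-1 smaller disks on the spare peg
    yield (n, src, dst)                     # move the largest disk to the target
    yield from hanoi(n - 1, via, dst, src)  # restack the smaller disks on top of it


def is_valid_solution(n: int, moves: List[Move]) -> bool:
    """Replay moves, rejecting any that lift the wrong disk or cover a smaller one."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for disk, src, dst in moves:
        if not pegs[src] or pegs[src][-1] != disk:
            return False
        if pegs[dst] and pegs[dst][-1] < disk:
            return False
        pegs[dst].append(pegs[src].pop())
    # Solved only if every disk ends on the target peg in order.
    return pegs["C"] == list(range(n, 0, -1))


if __name__ == "__main__":
    for n in (5, 10, 15):
        count = sum(1 for _ in hanoi(n))
        print(f"N={n:2d}: {count:>7,} moves")
```

Run as a script, it reports 31, 1,023 and 32,767 moves for N = 5, 10 and 15, while the function that generates them never changes; `is_valid_solution` replays a move list to confirm it is legal.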

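Point (3) is a claim about solvability, which a brute-force search can check for small N. The comment does not spell out the puzzle rules, so this sketch assumes the classic "jealous husbands" formulation: N actor/agent pairs, a boat that holds at most k people and never crosses empty, and the rule that no actor may be in the presence of another agent, on either bank or in the boat, unless their own agent is also present. The capacity k = 3 is likewise an assumption, since the comment only says the capacity is insufficient. Breadth-first search over (left-bank occupants, boat side) states is exhaustive, so if the queue empties without reaching the goal, the instance has no solution under these rules.

```python
from collections import deque
from itertools import combinations
from typing import FrozenSet, Tuple

# Hypothetical encoding for this sketch: ("agent", i) and ("actor", i) form pair i.
Person = Tuple[str, int]


def is_safe(group: FrozenSet[Person]) -> bool:
    """Assumed rule: an actor may not be with another agent unless their own agent is present."""
    agents = {i for role, i in group if role == "agent"}
    return all(
        role != "actor" or i in agents or not agents
        for role, i in group
    )


def solvable(n: int, boat_capacity: int) -> bool:
    """Exhaustive BFS over (people on the left bank, side the boat is on)."""
    everyone = frozenset(
        [("agent", i) for i in range(n)] + [("actor", i) for i in range(n)]
    )
    start = (everyone, "left")
    seen = {start}
    queue = deque([start])
    while queue:
        left, side = queue.popleft()
        if not left:  # everyone has reached the right bank
            return True
        on_boat_side = left if side == "left" else everyone - left
        for size in range(1, boat_capacity + 1):  # the boat never crosses empty
            for riders in combinations(sorted(on_boat_side), size):
                boat = frozenset(riders)
                if not is_safe(boat):
                    continue
                new_left = left - boat if side == "left" else left | boat
                # Both banks must satisfy the rule once the riders have crossed.
                if not (is_safe(new_left) and is_safe(everyone - new_left)):
                    continue
                state = (new_left, "right" if side == "left" else "left")
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return False  # every reachable state explored without success


if __name__ == "__main__":
    for n, k in [(3, 2), (5, 3), (6, 3)]:
        print(f"N={n}, k={k}: solvable={solvable(n, k)}")
```

Under these assumptions the search finds crossings for N = 3 with k = 2 and for N = 5 with k = 3, but exhausts every reachable state for N = 6 with k = 3, consistent with the comment's N > 5 claim and with the classic jealous-husbands result that a three-seat boat suffices for at most five pairs.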
The State of Data and AI Engineering 2025

https://lakefs.io/blog/the-state-of-data-ai-engineering-2025/
1•ajhenaor•49s ago•0 comments

Show HN: Mechanical linear acceleration with racks and gears

https://imgur.com/a/KEOrefi
1•grondilu•1m ago•0 comments

Would AI have prevented the Air India flight 171 crash?

1•3apo•4m ago•0 comments

Big Stay paying off: Quitting to job-switch is worse for wage growth

https://www.businessinsider.com/big-stay-quitting-job-wage-growth-white-collar-2025-6
1•pseudolus•4m ago•1 comments

Baby Boomers' Luck Is Running Out

https://www.theatlantic.com/health/archive/2025/06/baby-boomers-aging-trump/683150/
2•arkj•5m ago•0 comments

Tobacco excise has passed a 'tipping point' and is fuelling black market

https://www.theguardian.com/society/2025/jun/11/tobacco-excise-isnt-making-australians-smoke-less-and-should-be-frozen-to-curb-black-market-economists-say
1•paulpauper•6m ago•0 comments

The Return of the American Model

https://marginalrevolution.com/marginalrevolution/2025/06/the-return-of-the-american-model.html
1•paulpauper•6m ago•0 comments

ESA studying impacts of proposed NASA budget cuts – SpaceNews

https://spacenews.com/esa-studying-impacts-of-proposed-nasa-budget-cuts/
1•rbanffy•7m ago•0 comments

Marimo: Notebook That Compiles Python for Reproducibility and Reusability [video]

https://www.youtube.com/watch?v=3-3zy5W2SOw
2•teleforce•11m ago•0 comments

Hide and unhide music, movies, TV shows, audiobooks, and books

https://support.apple.com/en-us/118400
2•walterbell•12m ago•0 comments

AMD Vulcano 800G NIC Coming as AMD Outlines Its UALink and UEC Scale Plans

https://www.servethehome.com/amd-vulcano-800g-nic-coming-as-amd-outlines-its-ualink-and-uec-scale-plans/
1•rbanffy•15m ago•0 comments

Linux Kernel API Specification

https://lore.kernel.org/lkml/20250614134858.790460-1-sashal@kernel.org/
2•amaccuish•17m ago•0 comments

Tropics of Cancer and Capricorn Interactive Map

https://tropics-map.netlify.app/
3•vishnuharidas•25m ago•1 comments

Earthquake simulator to test 10-story steel-framed building

https://techxplore.com/news/2025-05-earthquake-simulator-story-steel.html
2•PaulHoule•28m ago•0 comments

New EV charging feature could make apps and cards obsolete

https://electrek.co/2025/06/13/new-ev-charging-feature-could-make-apps-and-cards-obsolete/
1•ciconia•29m ago•6 comments

What is the competitive advantage of authors in the age of LLMs?

https://lethain.com/competitive-advantage-author-llms/
1•norrsson•29m ago•0 comments

What is Israel’s Iron Dome? Here’s how the missile defense system works

https://www.cnbc.com/2025/06/13/israel-iron-dome.html
3•rntn•30m ago•0 comments

Recurrent fluorescence helps organic molecules survive extreme conditions

https://phys.org/news/2025-06-recurrent-fluorescence-molecules-survive-extreme.html
1•rbanffy•30m ago•0 comments

SEO Expert (8 Yrs) – Going Through a Rough Time, Looking for Work

2•deepmistry•30m ago•0 comments

How the Final Cartridge III Freezer Works

https://www.pagetable.com/?p=1810
2•ingve•32m ago•0 comments

Apple fixes zero-click exploit underpinning Paragon spyware attacks

https://www.theregister.com/2025/06/13/apple_fixes_zeroclick_exploit_underpinning/
2•Bender•34m ago•0 comments

Do you trust Xi with your 'private' browsing data? [title shortened]

https://www.theregister.com/2025/06/13/apple_google_chinabased_vpns/
2•Bender•35m ago•1 comments

Show HN: FlagShark – Automatically removes stale feature flags via PRs

https://flagshark.com/
1•joebi•35m ago•0 comments

Dangers of Competent Governance

https://www.overcomingbias.com/p/dangers-of-competent-governance
2•paulpauper•35m ago•0 comments

Ask HN: Does working with JavaScript affect your mental health?

1•jerawaj740•36m ago•1 comments

Children need the freedom to play on streets again

https://theconversation.com/children-need-the-freedom-to-play-on-driveways-and-streets-again-heres-how-to-make-it-happen-254543
2•dotcoma•37m ago•1 comments

Think twice before abandoning Xorg. Wayland breaks everything

https://gist.github.com/probonopd/9feb7c20257af5dd915e3a9f2d1f2277
2•gala8y•37m ago•1 comments

SQLite's Architectural Evolution and Performance Optimisation

https://lord.technology/2025/06/14/sqlites-architectural-evolution-and-performance-optimisation.html
2•emschwartz•38m ago•0 comments

A website anyone can update by calling in and describing their changes to an LLM

https://7159997483.com/
2•joek1301•39m ago•1 comments

A Letter to Europe: You're stronger than you think. Act like it

https://paulkrugman.substack.com/p/a-letter-to-europe
3•dotcoma•41m ago•0 comments