frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Comment on the Illusion of Thinking

https://arxiv.org/abs/2506.09250
4•esafak•7mo ago

Comments

esafak•7mo ago
Shojaee et al. (2025) report that Large Reasoning Models (LRMs) exhibit "accuracy collapse" on planning puzzles beyond certain complexity thresholds. We demonstrate that their findings primarily reflect experimental design limitations rather than fundamental reasoning failures. Our analysis reveals three critical issues: (1) Tower of Hanoi experiments systematically exceed model output token limits at reported failure points, with models explicitly acknowledging these constraints in their outputs; (2) The authors' automated evaluation framework fails to distinguish between reasoning failures and practical constraints, leading to misclassification of model capabilities; (3) Most concerningly, their River Crossing benchmarks include mathematically impossible instances for N > 5 due to insufficient boat capacity, yet models are scored as failures for not solving these unsolvable problems. When we control for these experimental artifacts, by requesting generating functions instead of exhaustive move lists, preliminary experiments across multiple models indicate high accuracy on Tower of Hanoi instances previously reported as complete failures. These findings highlight the importance of careful experimental design when evaluating AI reasoning capabilities.

EVs Are a Failed Experiment

https://spectator.org/evs-are-a-failed-experiment/
1•ArtemZ•5m ago•1 comments

MemAlign: Building Better LLM Judges from Human Feedback with Scalable Memory

https://www.databricks.com/blog/memalign-building-better-llm-judges-human-feedback-scalable-memory
1•superchink•6m ago•0 comments

CCC (Claude's C Compiler) on Compiler Explorer

https://godbolt.org/z/asjc13sa6
1•LiamPowell•7m ago•0 comments

Homeland Security Spying on Reddit Users

https://www.kenklippenstein.com/p/homeland-security-spies-on-reddit
2•duxup•10m ago•0 comments

Actors with Tokio (2021)

https://ryhl.io/blog/actors-with-tokio/
1•vinhnx•12m ago•0 comments

Can graph neural networks for biology realistically run on edge devices?

https://doi.org/10.21203/rs.3.rs-8645211/v1
1•swapinvidya•24m ago•1 comments

Deeper into the shareing of one air conditioner for 2 rooms

1•ozzysnaps•26m ago•0 comments

Weatherman introduces fruit-based authentication system to combat deep fakes

https://www.youtube.com/watch?v=5HVbZwJ9gPE
2•savrajsingh•26m ago•0 comments

Why Embedded Models Must Hallucinate: A Boundary Theory (RCC)

http://www.effacermonexistence.com/rcc-hn-1-1
1•formerOpenAI•28m ago•2 comments

A Curated List of ML System Design Case Studies

https://github.com/Engineer1999/A-Curated-List-of-ML-System-Design-Case-Studies
3•tejonutella•32m ago•0 comments

Pony Alpha: New free 200K context model for coding, reasoning and roleplay

https://ponyalpha.pro
1•qzcanoe•36m ago•1 comments

Show HN: Tunbot – Discord bot for temporary Cloudflare tunnels behind CGNAT

https://github.com/Goofygiraffe06/tunbot
1•g1raffe•39m ago•0 comments

Open Problems in Mechanistic Interpretability

https://arxiv.org/abs/2501.16496
2•vinhnx•45m ago•0 comments

Bye Bye Humanity: The Potential AMOC Collapse

https://thatjoescott.com/2026/02/03/bye-bye-humanity-the-potential-amoc-collapse/
2•rolph•49m ago•0 comments

Dexter: Claude-Code-Style Agent for Financial Statements and Valuation

https://github.com/virattt/dexter
1•Lwrless•51m ago•0 comments

Digital Iris [video]

https://www.youtube.com/watch?v=Kg_2MAgS_pE
1•vermilingua•56m ago•0 comments

Essential CDN: The CDN that lets you do more than JavaScript

https://essentialcdn.fluidity.workers.dev/
1•telui•57m ago•1 comments

They Hijacked Our Tech [video]

https://www.youtube.com/watch?v=-nJM5HvnT5k
1•cedel2k1•1h ago•0 comments

Vouch

https://twitter.com/mitchellh/status/2020252149117313349
34•chwtutha•1h ago•5 comments

HRL Labs in Malibu laying off 1/3 of their workforce

https://www.dailynews.com/2026/02/06/hrl-labs-cuts-376-jobs-in-malibu-after-losing-government-work/
4•osnium123•1h ago•1 comments

Show HN: High-performance bidirectional list for React, React Native, and Vue

https://suhaotian.github.io/broad-infinite-list/
2•jeremy_su•1h ago•0 comments

Show HN: I built a Mac screen recorder Recap.Studio

https://recap.studio/
1•fx31xo•1h ago•1 comments

Ask HN: Codex 5.3 broke toolcalls? Opus 4.6 ignores instructions?

1•kachapopopow•1h ago•0 comments

Vectors and HNSW for Dummies

https://anvitra.ai/blog/vectors-and-hnsw/
1•melvinodsa•1h ago•0 comments

Sanskrit AI beats CleanRL SOTA by 125%

https://huggingface.co/ParamTatva/sanskrit-ppo-hopper-v5/blob/main/docs/blog.md
1•prabhatkr•1h ago•1 comments

'Washington Post' CEO resigns after going AWOL during job cuts

https://www.npr.org/2026/02/07/nx-s1-5705413/washington-post-ceo-resigns-will-lewis
4•thread_id•1h ago•1 comments

Claude Opus 4.6 Fast Mode: 2.5× faster, ~6× more expensive

https://twitter.com/claudeai/status/2020207322124132504
1•geeknews•1h ago•0 comments

TSMC to produce 3-nanometer chips in Japan

https://www3.nhk.or.jp/nhkworld/en/news/20260205_B4/
3•cwwc•1h ago•0 comments

Quantization-Aware Distillation

http://ternarysearch.blogspot.com/2026/02/quantization-aware-distillation.html
2•paladin314159•1h ago•0 comments

List of Musical Genres

https://en.wikipedia.org/wiki/List_of_music_genres_and_styles
1•omosubi•1h ago•0 comments