The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•7mo ago

Comments

tocs3•7mo ago
I asked ChatGPT to restate this in more layman's terms (posted below), and I am not too surprised at the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about the same, and sometimes even better, than those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
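
For anyone unfamiliar with the setup the summary describes, here is a minimal sketch of what "a reasoning path based on A* search" could look like: a solver that returns both the final answer (the path) and a log of the intermediate steps it took. The grid world, the function name `astar_with_trace`, and the `("create" | "close", cell, g, h)` trace format are all assumptions made for illustration; they are not taken from the paper and may not match its actual data format.

```python
import heapq


def astar_with_trace(grid, start, goal):
    """A* on a 4-connected grid (0 = free, 1 = wall).

    Returns (path, trace). The trace is a hypothetical log of search
    steps, standing in for the "intermediate tokens" a model could be
    trained to emit before its final answer.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    trace = [("create", start, 0, h(start))]
    open_heap = [(h(start), 0, start)]  # entries are (f, g, cell)
    best_g = {start: 0}
    parent = {start: None}
    closed = set()

    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell in closed:
            continue  # stale heap entry for an already expanded cell
        closed.add(cell)
        trace.append(("close", cell, g, h(cell)))

        if cell == goal:
            # Walk parent pointers back to the start to recover the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], trace

        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                parent[nxt] = cell
                heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
                trace.append(("create", nxt, ng, h(nxt)))

    return None, trace  # goal unreachable
```

Calling `astar_with_trace(grid, start, goal)` yields a (trace, answer) pair for one problem. In these terms, the result the summary describes is that the trace attached to a training example can be wrong for that problem without hurting how often the final answer is right.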

How Did TVs Get So Cheap?

https://www.construction-physics.com/p/how-did-tvs-get-so-cheap
1•thelastgallon•18s ago•0 comments

NASA considers evacuating ailing crew member from International Space Station

https://arstechnica.com/space/2026/01/nasa-postpones-space-station-spacewalk-due-to-crew-members-...
1•falcor84•2m ago•0 comments

Will memory fail the AI boom?

https://www.sdxcentral.com/analysis/will-memory-fail-the-ai-boom/
1•ironyman•3m ago•0 comments

Linus T: "The AI Slop Issue Is *Not* Going to Be Solved with Documentation"

https://www.phoronix.com/news/Torvalds-Linux-Kernel-AI-Slop
1•signa11•3m ago•0 comments

Life Happens at 1x Speed

https://terriblesoftware.org/2026/01/08/life-happens-at-1x-speed/
2•matheusml•7m ago•0 comments

Interview: David Haz, Creator of React Bits

https://motion.dev/magazine/interview-david-haz-creator-of-react-bits
1•SirHound•7m ago•0 comments

The Jeff Dean Facts

https://github.com/LRitzdorf/TheJeffDeanFacts
1•ravenical•9m ago•0 comments

Staging is a wasteful lie: the case for the mono-environment

https://www.tomwphillips.co.uk/2026/01/staging-is-a-wasteful-lie-the-case-for-the-mono-environment/
3•tomwphillips•10m ago•0 comments

How do language models solve Bayesian network inference?

https://ferjorosa.github.io/blog/2026/01/02/llms-probailistic-reasoning.html
1•sebg•10m ago•0 comments

Ghent University rector Petra De Sutter uses AI-fabricated quotes in speech

https://www.vrt.be/vrtnws/en/2026/01/08/ghent-university-rector-petra-de-sutter-uses-fabricated-q...
2•lode•13m ago•2 comments

Nano Banana Pro

https://nano-bananapro.org/
2•letsmkvideo•14m ago•1 comments

Show HN: Do you think this is the best diffchecker and would you switch?

https://diffchecker.dev/
1•subhash_k•15m ago•2 comments

Show HN: Spark – Zero-config IoT deployment tool written in Rust

https://github.com/Velooroo/Spark
1•Kazilsky•18m ago•1 comments

How Much Does Education Improve Intelligence? A Meta-Analysis [pdf]

https://labs.la.utexas.edu/tucker-drob/files/2019/08/Ritchie-Tucker-Drob-2018-Psych-Science-How-M...
1•sebg•19m ago•1 comments

ByteDance local agent is something I might feel safe running

https://github.com/bytedance/UI-TARS-desktop
1•mark_l_watson•19m ago•1 comments

How cybercriminals plot to rob a target within a week

https://www.reuters.com/graphics/SOUTHEASTASIA-SCAMS/MANUALS/klpyjlqelvg/
1•barishnamazov•20m ago•0 comments

Architecture Governance: Capturing What and How

https://tomasjurasek.substack.com/p/architecture-governance-capturing
1•silent715•21m ago•0 comments

Verification-Driven Development (VDD) via Iterative Adversarial Refinement

https://gist.github.com/dollspace-gay/45c95ebfb5a3a3bae84d8bebd662cc25
1•sebg•23m ago•0 comments

Shared State Context for AI Agents [Ask/Show][Looking for Beta]

1•aperi•24m ago•0 comments

The Zcash core dev team has resigned

https://twitter.com/tedpillows/status/2009206637962383809
3•simonebrunozzi•26m ago•2 comments

Testmon – Speed up your test suite in CI

https://testmon.net
1•drcongo•27m ago•0 comments

Execline: A Small Scripting Language

http://skarnet.org/software/execline/
1•fanf2•29m ago•0 comments

I Drilled Holes in My $200 Waterproof Panniers

https://cycletouring.substack.com/p/i-drilled-holes-in-my-200-waterproof
2•djrivard•30m ago•0 comments

Wigner Cat Phases: Transition to Quantum Chaos

https://arxiv.org/abs/2512.22169
1•northlondoner•30m ago•1 comments

Show HN: Analytics for SaaS Founders Connecting Stripe, Google Analytics and GSC

https://busel.ai/
1•stasman•30m ago•0 comments

Is Claude Ret***ed? Website where you vote on Claude's daily stupidity

https://www.isclauderetarded.today/
1•skrabe•31m ago•3 comments

Why Deepfake Technology Forces Courts to Rethink the Reliability of Evidence

https://www.technologylaw.ai/p/deepfake-technology-evidentiary-reliability-courts
1•pcaharrier•32m ago•0 comments

Beyond Training: Enabling Self-Evolution of Agents with Mobimem

https://arxiv.org/abs/2512.15784
1•PaulHoule•35m ago•0 comments

Trend Hacking 2025: The Niche Protocol for Founders

https://blog.vect.pro/trend-hacking-guide
1•WoWSaaS•36m ago•1 comments

One Regulation E, Two Different Regimes

https://www.bitsaboutmoney.com/archive/regulation-e/
1•gmcharlt•36m ago•0 comments