The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•6mo ago

Comments

tocs3•6mo ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised by the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps, and surprisingly, they still perform about the same as, and sometimes even better than, those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
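
For anyone wondering what "check both the final answers and the reasoning steps" could look like in practice, here is a rough Python sketch. It is not the paper's code: the grid encoding, the `("create" | "close", cell, g, h)` trace format, and the function names `astar_with_trace`, `plan_is_valid` and `trace_is_valid` are all assumptions made up for illustration. The idea is only that A* can log its own steps, and that the final plan and the logged steps can then be validated separately.

```python
import heapq


def astar_with_trace(grid, start, goal):
    """A* on a 4-connected grid with unit step cost. Returns (path, trace).

    grid is a list of strings where '#' marks a wall. The trace format
    (op, cell, g, h) is a hypothetical one chosen for this sketch.
    """
    def h(cell):
        # Manhattan distance: admissible and consistent on this kind of grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    trace = [("create", start, 0, h(start))]
    frontier = [(h(start), 0, start)]
    parent = {start: None}
    best_g = {start: 0}
    closed = set()

    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale queue entry, already expanded via a cheaper route
        closed.add(cell)
        trace.append(("close", cell, g, h(cell)))
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], trace
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                parent[nxt] = cell
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
                trace.append(("create", nxt, ng, h(nxt)))
    return None, trace


def plan_is_valid(grid, start, goal, path):
    """Check the final answer only: a wall-free, step-by-step route."""
    if not path or path[0] != start or path[-1] != goal:
        return False
    rows, cols = len(grid), len(grid[0])
    for r, c in path:
        if not (0 <= r < rows and 0 <= c < cols) or grid[r][c] == "#":
            return False
    return all(abs(r1 - r2) + abs(c1 - c2) == 1
               for (r1, c1), (r2, c2) in zip(path, path[1:]))


def trace_is_valid(trace):
    """Check the intermediate steps only: a node must be created before it
    is closed, and may be closed at most once."""
    created, closed = set(), set()
    for op, cell, _g, _h in trace:
        if op == "create":
            created.add(cell)
        elif op == "close" and cell in created and cell not in closed:
            closed.add(cell)
        else:
            return False
    return True
```

With two separate checks like these, a model's output can pass `plan_is_valid` while its emitted steps fail `trace_is_valid`, which is the kind of mismatch the summary above describes.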
