The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•1y ago

Comments

tocs3•1y ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised at the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about the same, and sometimes even better, than those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
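
For anyone who wants to see roughly what "training on A* traces" versus "training on unrelated traces" could mean, here is a minimal sketch. It is not the paper's code: the grid-maze setup, the names `astar_with_trace` and `build_dataset`, and the trick of rotating traces so each problem gets another problem's trace are all my own assumptions for illustration.

```python
import heapq


def astar_with_trace(grid, start, goal):
    """Run A* on a 4-connected grid and return (path, trace).

    grid  : list of equal-length strings, '#' marks a wall (assumed format).
    trace : cells in the order A* expanded them, standing in for the
            "intermediate tokens" a model would be trained to emit.
    """
    def h(cell):
        # Manhattan distance: admissible and consistent on a unit-cost grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (f, g, cell)
    came_from = {start: None}
    cost = {start: 0}
    trace = []
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if g > cost[cell]:
            continue  # stale heap entry, a cheaper route was already found
        trace.append(cell)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1], trace
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] != "#":
                ng = g + 1
                if ng < cost.get(nxt, float("inf")):
                    cost[nxt] = ng
                    came_from[nxt] = cell
                    heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None, trace  # goal unreachable


def build_dataset(problems, swap_traces=False):
    """Pair each problem with its correct answer and a trace.

    With swap_traces=True the traces are rotated by one position, so every
    problem is paired with a trace produced for a different problem
    (hypothetical stand-in for "irrelevant" reasoning steps).
    """
    solved = [astar_with_trace(*p) for p in problems]
    traces = [t for _, t in solved]
    if swap_traces:
        traces = traces[1:] + traces[:1]
    return [
        {"problem": p, "trace": t, "answer": path}
        for p, (path, _), t in zip(problems, solved, traces)
    ]


problems = [
    (["...", ".#.", "..."], (0, 0), (2, 2)),
    (["..", ".."], (0, 0), (1, 1)),
]
clean = build_dataset(problems)
swapped = build_dataset(problems, swap_traces=True)
```

In both datasets the `answer` field stays correct; only the `trace` differs. As I read the summary, the comparison is between models trained on something like `clean` and something like `swapped`.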
