The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•8mo ago

Comments

tocs3•8mo ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised by the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about the same, and sometimes even better, than those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
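
For anyone who wants to see what "reasoning steps plus final answer from A* search" could look like in practice, here is a minimal sketch. It is not the paper's code: the grid encoding, the ("create"/"close", cell, g, h) trace format, and the names `astar_with_trace`, `plan_is_valid` and `trace_is_valid` are all assumptions made for illustration. The point is only that the final plan and the intermediate trace can be checked separately, so one can be right while the other is wrong.

```python
import heapq


def astar_with_trace(grid, start, goal):
    """A* on a 4-connected grid of strings ('#' is a wall).

    Returns (trace, plan). The trace logs every node pushed ("create")
    and every node expanded ("close"); this format is an assumption.
    """
    def h(cell):
        # Manhattan distance, admissible for unit-cost 4-connected moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    trace = [("create", start, 0, h(start))]
    frontier = [(h(start), 0, start)]
    parent = {start: None}
    best_g = {start: 0}
    closed = set()
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale queue entry
        closed.add(cell)
        trace.append(("close", cell, g, h(cell)))
        if cell == goal:
            plan = []
            while cell is not None:  # walk parents back to the start
                plan.append(cell)
                cell = parent[cell]
            return trace, plan[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == "#":
                continue
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                parent[nxt] = cell
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
                trace.append(("create", nxt, ng, h(nxt)))
    return trace, None  # goal unreachable


def plan_is_valid(grid, start, goal, plan):
    """Check the final answer without looking at the trace."""
    if not plan or plan[0] != start or plan[-1] != goal:
        return False
    for (r1, c1), (r2, c2) in zip(plan, plan[1:]):
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            return False  # not a single grid step
        if not (0 <= r2 < len(grid) and 0 <= c2 < len(grid[0])):
            return False
        if grid[r2][c2] == "#":
            return False
    return True


def trace_is_valid(trace):
    """Weak trace check: a node may only be closed after it was created."""
    created = set()
    for op, cell, _g, _h in trace:
        if op == "create":
            created.add(cell)
        elif op == "close" and cell not in created:
            return False
    return True


grid = ["..#",
        "..#",
        "..."]
trace, plan = astar_with_trace(grid, (0, 0), (2, 2))
scrambled = list(reversed(trace))  # same tokens, wrong order
print(plan_is_valid(grid, (0, 0), (2, 2), plan), trace_is_valid(trace))
print(trace_is_valid(scrambled))
```

Pairing `plan` with `scrambled` gives a training example where the answer is still correct but the "reasoning" no longer passes the trace check, which is roughly the kind of mismatch the summary above describes.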
