The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•1y ago

Comments

tocs3•1y ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised by the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about the same, and sometimes even better, than those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
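To make the setup in the summary concrete, here is a minimal sketch of what "a trace from A* search" and "checking the reasoning steps separately from the final answer" could look like. It is not the paper's code: the grid encoding, the `create`/`close` event names, and the functions `astar_with_trace`, `trace_is_valid`, and `path_is_valid` are all assumptions made up for illustration.

```python
import heapq


def astar_with_trace(grid, start, goal):
    """Run A* on a 4-connected grid and return (path, trace).

    grid is a list of strings where '#' marks a wall. The trace is a list of
    ("create", node, g, h) and ("close", node, g, h) events; this event format
    is a hypothetical stand-in for the intermediate tokens discussed above.
    """
    def h(n):
        # Manhattan distance: admissible and consistent for unit-cost moves.
        return abs(n[0] - goal[0]) + abs(n[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    best_g = {start: 0}
    parent = {start: None}
    closed = set()
    trace = [("create", start, 0, h(start))]

    while open_heap:
        _f, g, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale heap entry for an already expanded node
        closed.add(node)
        trace.append(("close", node, g, h(node)))
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], trace
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == "#":
                continue
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                parent[nxt] = node
                heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
                trace.append(("create", nxt, ng, h(nxt)))
    return None, trace


def trace_is_valid(trace):
    """A deliberately simple check: every close must follow a matching create."""
    created = set()
    for op, node, g, _h in trace:
        if op == "create":
            created.add((node, g))
        elif (node, g) not in created:
            return False
    return True


def path_is_valid(grid, path, start, goal):
    """Check the final answer only: endpoints, adjacency, and no walls entered."""
    if not path or path[0] != start or path[-1] != goal:
        return False
    for (r1, c1), (r2, c2) in zip(path, path[1:]):
        if abs(r1 - r2) + abs(c1 - c2) != 1 or grid[r2][c2] == "#":
            return False
    return True
```

With two independent checkers like these, a correct path can be paired with a scrambled trace (for example the same trace reversed): `path_is_valid` still passes while `trace_is_valid` fails. That is the kind of mismatch between answer and reasoning steps the summary describes.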
