The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•1y ago

Comments

tocs3•1y ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised at the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about the same, and sometimes even better, than those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."

The end of the engineer. The rise of the operator

https://getoperators.ai/manifesto
1•pro_methe5•1m ago•0 comments

Memory in the Age of AI Agents (Survey Paper)

https://arxiv.org/abs/2512.13564
1•thoughtpeddler•3m ago•0 comments

"It's Hard to Eval" Is a Product Smell

https://hamel.dev/blog/posts/eval-smell/
2•call-me-al•8m ago•0 comments

Snap to AI – One-Keystroke Screenshots to Claude, ChatGPT, etc. (macOS)

https://snaptoai.app
1•threeten•12m ago•0 comments

Free Google Docs Resume Templates to Copy and Edit – ResumeDocs

https://googledocsresumetemplate.com/google-docs-resume-template/
1•Hardd•15m ago•0 comments

AI Story Generator for Game Masters – Free D&D and RPG Tools

https://aistorygenerator.work
1•Hardd•15m ago•0 comments

Show HN: Drifty – AI Focus agent shuts down distractions tabs while you work

https://drifty.so/
1•Ari_Shin•17m ago•0 comments

Zuckerberg Urges Meta to Explore Working with Polymarket and Kalshi

https://www.nytimes.com/2026/06/26/technology/zuckerberg-meta-polymarket-kalshi.html
3•CaptainZapp•19m ago•1 comments

Cost calculators for common home renovation projects

https://costto.build/
1•way007•28m ago•0 comments

NASA's X-59 "frankenjet" tests supersonic flight without the sonic boom

https://arstechnica.com/gadgets/2026/06/nasas-x-59-frankenjet-tests-supersonic-flight-without-the...
2•joak•44m ago•0 comments

Warren Buffett skips donation to Gates Foundation amid Epstein review

https://www.aol.com/articles/warren-buffett-skips-donation-gates-012320000.html
2•1vuio0pswjnm7•46m ago•0 comments

Revisiting: Stack pivot, W^X break – in the context of PixelSmash

https://www.mail-archive.com/misc@openbsd.org/msg198341.html
1•gibletz•47m ago•0 comments

Snapcompact: SoTA Compaction – Instant, Local, Free

https://blog.can.ac/2026/06/10/snapcompact/
1•handfuloflight•50m ago•0 comments

Positive and Negative Time Flows in the Toronto Experiment on the 4/3πC Formula

https://medium.com/@f9121212/topological-derivation-of-geometric-boundaries-for-positive-and-nega...
2•ortrich•52m ago•0 comments

Exiled Chinese Tycoon Gets 30 Years in Prison for Billion-Dollar Fraud

https://www.bloomberg.com/news/articles/2026-06-29/exiled-chinese-tycoon-guo-gets-30-years-in-us-...
1•1vuio0pswjnm7•54m ago•0 comments

Silicon Valley Is Obsessed with 'Trust Stacking,' and the IRS Doesn't Like It

https://www.wsj.com/personal-finance/taxes/silicon-valley-is-obsessed-with-trust-stacking-and-the...
2•apparent•54m ago•3 comments

Loko Scheme 0.13.0

https://weinholt.se/articles/loko-scheme-0-13-0/
1•azhenley•55m ago•0 comments

Instatic is a modern self-hosted visual CMS

https://github.com/CoreBunch/Instatic
1•danboarder•56m ago•0 comments

Students are doing worse than you think

https://www.economist.com/international/2026/06/25/students-are-doing-worse-than-you-think
1•andsoitis•59m ago•0 comments

Is there a Mario Wii Web port?

1•Itzsplicez•1h ago•1 comments

T-Mobile Just Ripped 8M Customers Off Their Grandfathered Plans

https://www.gadgetreview.com/t-mobile-just-ripped-8-million-customers-off-their-grandfathered-pla...
1•momentmaker•1h ago•0 comments

GitHub profiles turned into FIFA Ultimate Team cards, rated out of 99

https://gitfut.com
3•beatthatflight•1h ago•0 comments

Magicbookshelf.org – A Spoiler Free Companion – The Brothers Karamazov

https://magicbookshelf.org/read/the-brothers-karamazov/
2•pfwitt•1h ago•0 comments

Why AI is like a (Clever Hans) Horse [video]

https://www.youtube.com/watch?v=0GQ2RP-25gM
2•tartoran•1h ago•0 comments

Should every baby's DNA be sequenced?

https://www.economist.com/science-and-technology/2026/06/29/should-every-babys-dna-be-sequenced
1•andsoitis•1h ago•0 comments

Prism: An Impure Functional Language with Typed Effects

https://www.stephendiehl.com/posts/prism/
1•ghc•1h ago•0 comments

SRAM as Processing

https://prawns.dev/til/processing-using-sram
1•random__duck•1h ago•0 comments

Open Domesday

https://opendomesday.org/
1•mellosouls•1h ago•0 comments

Moonshot AI (kimi) launches a credit card

https://www.kimi.com/aicard
2•danieltanfh95•1h ago•0 comments

Chinese tycoon sentenced to 30 years in US jail

https://www.bbc.com/news/articles/cjeg15vw3z9o
2•tartoran•1h ago•3 comments