The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•1y ago

Comments

tocs3•1y ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised by the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps, and surprisingly, they still perform about the same as, and sometimes even better than, those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
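For anyone who wants a concrete picture of the setup described in the summary, here is a rough Python sketch of what "a solver trace plus a final answer" could look like, and how each can be checked separately. It is only an illustration under my own assumptions: the grid, the `create`/`close` trace format, and the helper names (`astar_with_trace`, `plan_is_valid`, `trace_is_consistent`) are hypothetical and not taken from the paper.

```python
import heapq

def astar_with_trace(grid, start, goal):
    """A* on a 4-connected grid. Returns (plan, trace).

    The trace is a list of (op, node, g, h) tuples, where op is
    "create" (node pushed on the frontier) or "close" (node expanded).
    This trace format is an assumption made for illustration.
    """
    def h(p):
        # Manhattan distance: admissible and consistent for unit-cost moves.
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    trace = [("create", start, 0, h(start))]
    frontier = [(h(start), 0, start)]
    parent = {start: None}
    best_g = {start: 0}
    closed = set()
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node in closed:
            continue  # stale frontier entry
        closed.add(node)
        trace.append(("close", node, g, h(node)))
        if node == goal:
            plan = []
            while node is not None:
                plan.append(node)
                node = parent[node]
            return plan[::-1], trace
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == "#":
                continue
            ng = g + 1
            if ng < best_g.get((nr, nc), float("inf")):
                best_g[(nr, nc)] = ng
                parent[(nr, nc)] = node
                heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
                trace.append(("create", (nr, nc), ng, h((nr, nc))))
    return None, trace

def plan_is_valid(grid, start, goal, plan):
    """Check the final answer only: start, goal, unit steps, no walls.

    Assumes every cell in the plan lies inside the grid.
    """
    if not plan or plan[0] != start or plan[-1] != goal:
        return False
    for (r1, c1), (r2, c2) in zip(plan, plan[1:]):
        if abs(r1 - r2) + abs(c1 - c2) != 1 or grid[r2][c2] == "#":
            return False
    return True

def trace_is_consistent(trace):
    """Check the trace only: a node must be created before it is closed."""
    created = set()
    for op, node, _g, _h in trace:
        if op == "create":
            created.add(node)
        elif node not in created:
            return False
    return True

GRID = [
    "..#.",
    "..#.",
    "....",
]
START, GOAL = (0, 0), (0, 3)

plan, trace = astar_with_trace(GRID, START, GOAL)
scrambled = trace[::-1]  # stand-in for a "wrong" trace; the plan is unchanged

print(plan_is_valid(GRID, START, GOAL, plan))
print(trace_is_consistent(trace), trace_is_consistent(scrambled))
```

The point of separating the two checks is that the plan can pass `plan_is_valid` while the trace paired with it fails `trace_is_consistent`. As I read the summary, that is the kind of mismatch being measured: answer correctness and trace correctness are evaluated independently, so one can hold without the other.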
