The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•9mo ago

Comments

tocs3•9mo ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised at the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about the same, and sometimes even better, than those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
