The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•10mo ago

Comments

tocs3•10mo ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised by the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about the same, and sometimes even better, than those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
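For readers who want to see what "reasoning steps from A* search" could look like in practice, here is a minimal sketch. It assumes a small grid maze and a made-up trace format of `("create", ...)` and `("close", ...)` tuples; neither the maze encoding nor the function names `astar_with_trace`, `path_is_valid`, and `trace_is_valid` come from the paper or the summary above. The point is only that the final path and the intermediate trace can be checked separately.

```python
import heapq

def astar_with_trace(grid, start, goal):
    """A* on a 4-connected grid ('#' = wall). Returns (path, trace).

    The trace format is an assumption for illustration: one tuple
    (op, row, col, g, h) per node created or closed.
    """
    def h(p):  # Manhattan distance, admissible on a unit-cost grid
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    trace = [("create", start[0], start[1], 0, h(start))]
    open_heap = [(h(start), 0, start)]
    best_g = {start: 0}
    parent = {start: None}
    closed = set()

    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale heap entry
        closed.add(node)
        trace.append(("close", node[0], node[1], g, h(node)))
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], trace
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                parent[nxt] = node
                heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
                trace.append(("create", nr, nc, ng, h(nxt)))
    return None, trace

def path_is_valid(grid, path, start, goal):
    """Check the final answer only: endpoints, adjacency, no walls."""
    if not path or path[0] != start or path[-1] != goal:
        return False
    for (r1, c1), (r2, c2) in zip(path, path[1:]):
        if abs(r1 - r2) + abs(c1 - c2) != 1 or grid[r2][c2] == "#":
            return False
    return True

def trace_is_valid(trace):
    """Check the intermediate steps only: a node must be created
    before it is closed, and may be closed at most once."""
    created, closed = set(), set()
    for op, r, c, _g, _h in trace:
        if op == "create":
            created.add((r, c))
        elif op == "close":
            if (r, c) not in created or (r, c) in closed:
                return False
            closed.add((r, c))
        else:
            return False
    return True
```

With these two checkers, a correct path can be paired with a scrambled trace: `path_is_valid` still passes while `trace_is_valid` fails. That is the kind of mismatch between answer and "reasoning" the summary describes.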
