The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•7mo ago

Comments

tocs3•7mo ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised by the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about the same, and sometimes even better, than those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
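To make the setup in the summary concrete, here is a minimal sketch of what "a reasoning trace from A* search" and "checking the trace separately from the final answer" could look like. It is not the paper's code. The grid encoding, the `create`/`close` operation names, and the helper names `astar_with_trace`, `plan_is_valid`, and `trace_is_valid` are all assumptions made for illustration, and the trace check is deliberately much weaker than a full A* validator.

```python
import heapq


def astar_with_trace(grid, start, goal):
    """A* on a 4-connected grid. Returns (plan, trace).

    grid is a list of strings where '#' marks a wall. The trace is a list of
    (op, cell, g, h) tuples in execution order; op is "create" when a node is
    pushed onto the frontier and "close" when it is expanded (hypothetical names).
    """
    def h(cell):
        # Manhattan distance, admissible and consistent on this grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    trace = [("create", start, 0, h(start))]
    frontier = [(h(start), 0, start)]
    best_g = {start: 0}
    parent = {start: None}
    closed = set()
    while frontier:
        _f, g, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale frontier entry
        closed.add(cell)
        trace.append(("close", cell, g, h(cell)))
        if cell == goal:
            plan = []
            while cell is not None:
                plan.append(cell)
                cell = parent[cell]
            return plan[::-1], trace
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == "#" or nxt in closed:
                continue
            if g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                parent[nxt] = cell
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
                trace.append(("create", nxt, g + 1, h(nxt)))
    return None, trace


def plan_is_valid(grid, start, goal, plan):
    """Check the final answer only: endpoints, walls, and unit steps."""
    if not plan or plan[0] != start or plan[-1] != goal:
        return False
    for r, c in plan:
        if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
            return False
        if grid[r][c] == "#":
            return False
    return all(abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
               for a, b in zip(plan, plan[1:]))


def trace_is_valid(trace):
    """Check the trace only: every "close" must follow a matching "create"."""
    created = set()
    for op, cell, g, _h in trace:
        if op == "create":
            created.add((cell, g))
        elif (cell, g) not in created:
            return False
    return True


grid = ["...",
        ".#.",
        "..."]
start, goal = (0, 0), (2, 2)
plan, trace = astar_with_trace(grid, start, goal)
scrambled = trace[::-1]  # same tokens, meaningless order

print(plan_is_valid(grid, start, goal, plan))  # answer check
print(trace_is_valid(trace))                   # trace check, real trace
print(trace_is_valid(scrambled))               # trace check, scrambled trace
```

The point of the sketch is only that the two checks are independent: the plan can be verified without looking at the trace, so an answer can be correct while the accompanying trace fails its own check. Whether a trained model behaves that way is what the paper tests. Nothing here reproduces its experiments.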
