
The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•9mo ago

Comments

tocs3•9mo ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised at the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps, and surprisingly, they still perform about as well as, and sometimes even better than, those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."

From Logistic Regression to AI

https://www.johndcook.com/blog/2026/03/04/from-logistic-regression-to-ai/
1•ibobev•40s ago•0 comments

An AI Odyssey, Part 2: Prompting Peril

https://www.johndcook.com/blog/2026/03/04/an-ai-odyssey-part-2-prompting-peril/
1•ibobev•47s ago•0 comments

The Existence, Impact, and Origin of Hallucination

https://www.youtube.com/redirect?event=video_description&redir_token=QUFFLUhqbUY1U3VrYVNVMDF1V1Fp...
1•neilellis•3m ago•0 comments

Moving towards purpose driven AI

https://blog.ynzi.com/2026/03/moving-towards-purpose-driven-ai.html
1•yusufnb•4m ago•0 comments

Godot maintainers say they're drowning in AI-generated PRs

https://www.pcgamer.com/software/platforms/open-source-game-engine-godot-is-drowning-in-ai-slop-c...
1•tahazsh•4m ago•0 comments

Show HN: Resume Matcher – Tailor your resumes with job descriptions

https://github.com/srbhr/Resume-Matcher
1•srbhr•4m ago•0 comments

Iranian strikes on Amazon data centers highlight industry's vulnerability

https://www.sfgate.com/news/world/article/iranian-strikes-on-amazon-data-centers-highlights-21952...
1•boringg•5m ago•0 comments

Lessons for Japan's scientific community from Trump's America

https://www.japantimes.co.jp/commentary/2026/02/18/japan/trump-lessons-for-japans-scientific-comm...
1•PaulHoule•5m ago•0 comments

MCP Tool Design Is Not API Design

https://mage-bench.com/blog/2026-03-04-mcp-tool-design-is-not-api-design/
1•GregorStocks•6m ago•0 comments

Security with Vibe Coding Platforms

1•Reva25•6m ago•0 comments

Codex app now available on Windows

https://developers.openai.com/codex/app/windows/
1•meetpateltech•6m ago•0 comments

PageIndex (19k stars) scored 44% on legal docs. Same as vector RAG

https://medium.com/@TheWake/three-rag-architectures-one-legal-document-25-needles-none-found-more...
1•metawake•7m ago•0 comments

Turning web runs into scripts with Codex

https://www.nibzard.com/cashout
1•nkko•7m ago•0 comments

The Space Race's Forgotten Theme Park

https://daily.jstor.org/the-space-races-forgotten-theme-park/
1•anarbadalov•7m ago•0 comments

Brewdog founder admits 'many mistakes' as hundreds lose jobs in sale

https://www.bbc.co.uk/news/articles/cze00ddyw27o
1•mellosouls•8m ago•0 comments

Agentic commerce won't kill cards, but it will open a gap

https://a16zcrypto.substack.com/p/agentic-commerce-wont-kill-cards
1•7777777phil•9m ago•0 comments

History of Scientific Glass

https://www.asimov.press/p/glass
1•mailyk•10m ago•0 comments

Codex for Windows

https://apps.microsoft.com/detail/9plm9xgg6vks?hl=en-US&gl=US
1•crorella•10m ago•1 comments

Show HN: FileShot – zero-knowledge file sharing, 50GB/file free, no paywalls

https://fileshot.io/
1•GraysoftDev•14m ago•0 comments

NanoGPT Slowrun: Language Modeling with Limited Data, Infinite Compute

https://qlabs.sh/slowrun
3•sdpmas•14m ago•0 comments

Downdetector and Speedtest sold to Accenture for $1.2B

https://www.theverge.com/tech/889234/downdetector-ookla-speedtest-sold-accenture
1•awkwardpotato•14m ago•0 comments

Show HN: Sanctuary for the most beautiful sentences, curated by people

https://www.letterquote.io/
1•wanderinglight•15m ago•0 comments

Father sues Google, claiming Gemini chatbot drove son into fatal delusion

https://techcrunch.com/2026/03/04/father-sues-google-claiming-gemini-chatbot-drove-son-into-fatal...
4•speckx•15m ago•0 comments

Show HN: ChessWoodie – structured chess tactics training

https://www.chesswoodie.com/
1•ghmaster•16m ago•0 comments

The Iran War's Most Precious Commodity Isn't Oil, It's Desalinated Water

https://www.bloomberg.com/opinion/articles/2026-03-04/iran-war-the-most-precious-commodity-is-wat...
3•ck2•18m ago•0 comments

Pure Independence

https://collabfund.com/blog/pure-independence/
1•herbertl•20m ago•0 comments

Stop Rebuilding Front End Apps for Environment Variables (REP RFC)

1•olamide226•20m ago•1 comments

Console Inbox

https://www.console.com/blog/inbox-ai-service-desk/
1•gk1•20m ago•0 comments

Distributed Systems Simulator

https://paperdraw.dev/
1•eminemence•21m ago•1 comments

Show HN: I improved my handwritten math OCR (now preserves derivations)

https://www.useaxiomnotes.com/app
1•mrajatnath•21m ago•1 comments