I asked ChatGPT to restate this in more layman's terms (posted below) and I am not too surprised at the answer.
"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.
We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.
Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about as well as, and sometimes even better than, those trained on correct steps.
This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
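To make the setup the summary describes a bit more concrete, here is a rough sketch of what that controlled comparison could look like. This is my own illustration, not the paper's code: I'm assuming a small grid-pathfinding task (so A* has an obvious expansion trace) and inventing a simple "expand... | answer: ..." target format.

    # Build two kinds of supervision targets for the same problem:
    # one with the true A* expansion trace, one with a random "trace"
    # of the same length, both ending in the same correct answer.
    import heapq
    import random

    def a_star(grid, start, goal):
        """Return (expansion_trace, shortest_path) on a 4-connected grid; 1 = wall."""
        rows, cols = len(grid), len(grid[0])

        def h(p):  # Manhattan-distance heuristic
            return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

        frontier = [(h(start), 0, start, [start])]
        seen, trace = set(), []
        while frontier:
            _, g, node, path = heapq.heappop(frontier)
            if node in seen:
                continue
            seen.add(node)
            trace.append(node)  # one "reasoning step": which node A* expanded
            if node == goal:
                return trace, path
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                r, c = node[0] + dr, node[1] + dc
                if 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0 and (r, c) not in seen:
                    heapq.heappush(frontier, (g + 1 + h((r, c)), g + 1, (r, c), path + [(r, c)]))
        return trace, None

    grid = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    trace, path = a_star(grid, (0, 0), (2, 2))

    # Target 1: "clean" CoT supervision -- the real expansion order, then the answer.
    correct_target = " ".join(f"expand{n}" for n in trace) + f" | answer: {path}"

    # Target 2: scrambled CoT supervision -- random cells stand in for the trace,
    # but the final answer stays correct (the control condition the summary mentions).
    random.seed(0)
    cells = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    random_target = " ".join(f"expand{random.choice(cells)}" for _ in trace) + f" | answer: {path}"

    print(correct_target)
    print(random_target)

Training one model on strings like correct_target and another on strings like random_target (at scale, on real search problems) is the kind of comparison that makes the "do the intermediate tokens actually matter?" question testable.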