The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•6mo ago

Comments

tocs3•6mo ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised at the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps, and surprisingly, they still perform about the same as, and sometimes even better than, those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."

Insecure Data Storage in IoT Smart Lock App

https://blog.ptidej.net/understanding-insecure-data-storage-in-iot-smart-lock-companion-app/
1•yann-gael•1m ago•1 comments

Chainguard: 1,800 trusted container images to eliminate your vulnerabilities

https://www.chainguard.dev
1•doener•2m ago•0 comments

Big tech is creating its own media bubble to 'win the narrative battle online'

https://www.theguardian.com/technology/2025/nov/29/big-tech-silicon-valley-ceo-media
2•1659447091•4m ago•0 comments

Dilution vs. Risk taking: Capital gains taxes and entrepreneurs

https://www.nber.org/papers/w34512
3•hhs•6m ago•0 comments

Harmonic's automated theorem prover Aristotle solves open Erdős problem in Lean

https://www.erdosproblems.com/forum/thread/124#post-1892
5•mathfan•6m ago•0 comments

White House launches website to excoriate media for 'biased' stories

https://www.theguardian.com/us-news/2025/nov/29/white-house-media-website-trump
1•1659447091•8m ago•0 comments

The long wait is over, Ganymede has arrived

https://endeavouros.com/news/the-long-wait-is-over-ganymede-has-arrived/
1•doener•8m ago•0 comments

Nlmixr2, an R-based OSS challenger to NONMEM/Monolix/Phoenix, joins R Consortium

https://r-consortium.org/posts/nlmixr2-is-becoming-an-r-consortium-working-group/
1•ionychal•12m ago•0 comments

Scala

https://www.huygens-fokker.org/scala/
2•onestay42•14m ago•0 comments

Leonardo shows Michelangelo, an AI missile shield for Europe

https://ukdefencejournal.org.uk/leonardo-shows-michelangelo-an-ai-missile-shield-for-europe/
1•jshprentz•14m ago•0 comments

Why do most new languages fail? (2012)

https://pointersgonewild.com/2012/06/07/why-do-most-new-languages-fail/
1•azhenley•15m ago•0 comments

Indonesia resists US trade deal 'poison pill'

https://www.ft.com/content/64d27052-a434-4e81-9321-87216eecf99c
3•hhs•15m ago•0 comments

Goodbye, Price Tags. Hello, Dynamic Pricing

https://www.nytimes.com/2025/11/28/opinion/dynamic-pricing-algorithms.html
4•apparent•17m ago•1 comments

Show HN: I Wrote a Field Manual on Self-Hosting(Immich,ZFS,Docker)Free on Kindle

https://www.amazon.com/dp/B0FY3XXPNV
1•devmicrosystems•30m ago•0 comments

Make It Easy for Humans

https://tombedor.dev/make-it-easy-for-humans/
1•jjfoooo4•31m ago•0 comments

Gemini Apps limits and upgrades for Google AI subscribers

https://support.google.com/gemini/answer/16275805?hl=en
1•doener•31m ago•0 comments

Compiler Explorer now supports Racket

https://godbolt.org/z/z3WffbzaY
1•azhenley•32m ago•0 comments

It's mathematically highly likely that there is life elsewhere in the universe

https://www.sciencedirect.com/science/article/pii/S0094576525006599?via%3Dihub
4•Rogach•34m ago•4 comments

Token Visualizer

https://github.com/PeterHdd/token-visualization
1•peterhddcoding•34m ago•1 comments

Zenroom – No-code cryptographic virtual machine

https://zenroom.org/
1•smartmic•44m ago•1 comments

94% zero-shot in a shifting gridworld, no retraining

1•heavymemory•53m ago•0 comments

Mint Is Not TeX

https://mint.ubavic.rs/
3•ubavic•54m ago•2 comments

The Fastest Image Diffing Engine You've Never Heard Of

https://vizzly.dev/blog/honeydiff-vs-odiff-pixelmatch-benchmarks/
2•Robdel12•56m ago•0 comments

Eraser: A Dynamic Data Race Detector for Multithreaded Programs (1997) [pdf]

https://web.stanford.edu/class/archive/cs/cs240/cs240.1054/readings/Tocs97.pdf
1•todsacerdoti•59m ago•0 comments

He Wants a New Start. So He Is Taking the Hardest Driving Test in the World

https://www.nytimes.com/2025/11/24/world/europe/london-black-cab-taxi-driving-test.html
1•bookofjoe•1h ago•1 comments

Get Your Kid a Watch

https://www.theatlantic.com/technology/2025/11/smartwatch-kids-screen-time/684975/
5•fortran77•1h ago•1 comments

Pinball Shopify

https://bfcm.shopify.com/
3•SnaKeZ•1h ago•0 comments

Americans no longer see four-year college degrees as worth the cost

https://www.nbcnews.com/politics/politics-news/poll-dramatic-shift-americans-no-longer-see-four-y...
38•jnord•1h ago•29 comments

Memory-Graph – Knowledge Graph Memory for Claude Code with SQLite/Neo4j/Memgraph

https://github.com/gregorydickson/memory-graph
2•gregorydickson•1h ago•1 comments

Nobara Project: Fedora Linux with user-friendly fixes added to it

https://nobaraproject.org/
2•doener•1h ago•0 comments