The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•6mo ago

Comments

tocs3•6mo ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised at the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps, and surprisingly, they still perform about the same as, and sometimes even better than, those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
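
To make the setup in that summary a bit more concrete, here is a minimal sketch of what "checking both the final answer and the reasoning steps" could look like with A* search. Everything specific here is an assumption for illustration, not the paper's actual format: the grid world, the `("create", ...)` / `("close", ...)` trace events, and the helper names `astar_with_trace`, `trace_is_valid`, and `plan_is_valid` are all hypothetical. A reversed trace stands in for the "random and incorrect" steps.

```python
import heapq


def astar_with_trace(grid, start, goal):
    """A* on a 4-connected grid (0 = free, 1 = wall).

    Returns (path, trace). The trace is a list of
    (op, node, g, h) events; this event format is an assumption.
    """
    def h(node):
        # Manhattan distance: admissible and consistent on a unit-cost grid.
        return abs(node[0] - goal[0]) + abs(node[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start), 0, start)]
    best_g = {start: 0}
    parent = {start: None}
    closed = set()
    trace = [("create", start, 0, h(start))]

    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale heap entry
        closed.add(node)
        trace.append(("close", node, g, h(node)))
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], trace
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    parent[nxt] = node
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
                    trace.append(("create", nxt, ng, h(nxt)))
    return None, trace


def trace_is_valid(trace):
    """A node may only be closed after it was created with the same cost."""
    created = set()
    for op, node, g, _h in trace:
        if op == "create":
            created.add((node, g))
        elif (node, g) not in created:
            return False
    return True


def plan_is_valid(grid, path, start, goal):
    """The final answer is valid if it walks free cells from start to goal."""
    if not path or path[0] != start or path[-1] != goal:
        return False
    if grid[start[0]][start[1]] != 0:
        return False
    for (r1, c1), (r2, c2) in zip(path, path[1:]):
        if abs(r1 - r2) + abs(c1 - c2) != 1 or grid[r2][c2] != 0:
            return False
    return True


# Toy maze with a wall across the middle row (hypothetical example data).
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path, trace = astar_with_trace(grid, (0, 0), (2, 0))
scrambled = trace[::-1]  # stand-in for an incorrect reasoning trace

plan_ok = plan_is_valid(grid, path, (0, 0), (2, 0))
trace_ok = trace_is_valid(trace)
scrambled_ok = trace_is_valid(scrambled)
```

The point of the sketch is only that the two checks are independent: the same correct plan can be paired with a trace that passes `trace_is_valid` or with one that fails it, which is the kind of mismatch the summary describes.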

Comparing AI Agents to Cybersecurity Professionals in Real-World Pen Testing

https://arxiv.org/abs/2512.09882
1•littlexsparkee•58s ago•0 comments

Marco Rubio bans Calibri font at State Department for being too DEI

https://techcrunch.com/2025/12/10/marco-rubio-bans-calibri-font-at-state-department-for-being-too...
1•rbanffy•2m ago•0 comments

Hyper-Scalers Are Using CXL to Lower the Impact of DDR5 Supply Constraints

https://www.servethehome.com/hyper-scalers-are-using-cxl-to-lower-the-impact-of-ddr5-supply-const...
1•rbanffy•4m ago•0 comments

Over 10k Docker Hub images found leaking credentials, auth keys

https://www.bleepingcomputer.com/news/security/over-10-000-docker-hub-images-found-leaking-creden...
3•todsacerdoti•5m ago•0 comments

Maybe AI is a regular platform shift

https://frontierai.substack.com/p/maybe-ai-is-a-regular-platform-shift
1•cgwu•6m ago•0 comments

GovSignals is solving government procurement using Trigger.dev

https://trigger.dev/customers/govsignals-customer-story
1•semicognitive•7m ago•0 comments

Huge undersea wall dating from 5000 BC found in France

https://www.bbc.com/news/articles/crk7lg1j146o
1•neversaydie•9m ago•0 comments

Rivian Unveils Custom Silicon, R2 Lidar Roadmap, and Universal Hands Free

https://riviantrackr.com/news/rivian-unveils-custom-silicon-r2-lidar-roadmap-universal-hands-free...
1•doctoboggan•9m ago•0 comments

GPT-5.2 System Card [pdf]

https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944f8d/oai_5_2_system-card.pdf
4•synthwave•11m ago•0 comments

What was typography like in the Soviet Union?

https://bsky.app/profile/spavel.bsky.social/post/3m7nzfi2opk2t
1•zdw•11m ago•0 comments

Grid Congestion Bottlenecks U.K.'s Wind Power

https://spectrum.ieee.org/grid-congestion-uk
1•rbanffy•11m ago•0 comments

Explore over 400 national and state parks, monuments, and historic sites

https://parklookup.com
1•cranberryturkey•12m ago•0 comments

Vercel for Back End

https://px.app
1•sak84•12m ago•0 comments

Show HN: Brisk – Buy vs. Rent/Sell vs. Keep calculator tracking opportunity cost

https://manishrjain.com/brisk
1•mrjn•13m ago•0 comments

Rivian Gen 3 Autonomy

https://twitter.com/Rivian/status/1999170250328154389
3•kappi•13m ago•1 comments

GPT-5.2

https://openai.com/index/introducing-gpt-5-2/
32•meetpateltech•14m ago•4 comments

I implemented relative imports with Pyodide

https://evanhahn.com/pyodide-relative-imports/
1•speckx•14m ago•0 comments

OpenAI Launches GPT-5.2 as It Navigates 'Code Red'

https://www.wired.com/story/openai-gpt-launch-gemini-code-red/
3•thm•14m ago•0 comments

OpenAI calls GPT-5.2 the best model yet for professionals

https://www.theverge.com/ai-artificial-intelligence/842529/openai-gpt-5-2-new-model-chatgpt
2•WalterSobchak•14m ago•0 comments

Show HN: Built an attribution tool that uses Bayesian inference to track ROI

1•Raj7k•14m ago•0 comments

GPT-5.2

https://platform.openai.com/docs/models/gpt-5.2
3•amrrs•16m ago•1 comments

Show HN: Ship anything your coding agent can build

https://nexlayer.com/
1•amadosalsta•16m ago•0 comments

Regression Is All You Need

https://blog.tilderesearch.com/vignettes/regression
1•antipaul•18m ago•1 comments

A Disappearing Service Processor

https://oxide.computer/blog/cosmo-sp
1•bcantrill•19m ago•0 comments

Programmers and software developers lost the plot on naming their tools

https://larr.net/p/namings.html
5•todsacerdoti•20m ago•1 comments

Developers used 11.5B GitHub Actions minutes in open source projects

https://github.blog/news-insights/product-news/lets-talk-about-github-actions/
3•Link-•20m ago•0 comments

Content may violate our usage policies

1•Pocomon•21m ago•0 comments

GPT-5.2

https://platform.openai.com/docs/guides/latest-model
37•atgctg•22m ago•9 comments

The Capitalism Debate [video]

https://www.youtube.com/watch?v=xMNil06mODE
1•avonmach•22m ago•0 comments

Rampant U.S. Piracy Is a Multibillion-Dollar Concern for Manga Publishers

https://torrentfreak.com/rampant-u-s-piracy-is-a-multibillion-dollar-concern-for-japanese-manga-p...
3•t-3•23m ago•0 comments