The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•7mo ago

Comments

tocs3•7mo ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised at the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about the same, and sometimes even better, than those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
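
For readers who want a concrete picture of the setup the summary describes, here is a minimal sketch of how an A* run could yield both a final plan and a step-by-step trace, and how a "random" trace could be paired with the same correct answer. The grid, the function name `astar_with_trace`, and the `(op, cell, g, h)` trace format are all hypothetical illustrations, not the paper's actual data format or code.

```python
import heapq
import random


def astar_with_trace(grid, start, goal):
    """Run A* on a 4-connected grid and return (plan, trace).

    grid is a list of strings where '#' marks a wall. The trace records
    every node creation and expansion as (op, cell, g, h). This format is
    an assumption made for illustration only.
    """
    def h(cell):
        # Manhattan distance: admissible on a unit-cost 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    trace = [("create", start, 0, h(start))]
    frontier = [(h(start), 0, start)]  # entries are (f, g, cell)
    parent = {start: None}
    best_g = {start: 0}
    closed = set()

    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale queue entry for an already expanded cell
        closed.add(cell)
        trace.append(("close", cell, g, h(cell)))
        if cell == goal:
            # Walk parent pointers back to the start to recover the plan.
            plan = []
            while cell is not None:
                plan.append(cell)
                cell = parent[cell]
            return plan[::-1], trace
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == "#":
                continue
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                parent[nxt] = cell
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
                trace.append(("create", nxt, ng, h(nxt)))
    return None, trace


# A tiny hypothetical maze: '.' is free, '#' is a wall.
grid = ["..#",
        "...",
        "#.."]
plan, trace = astar_with_trace(grid, (0, 0), (2, 2))

# A "corrupted" training example: the same correct plan paired with a
# shuffled trace that no longer reflects the order A* worked in.
corrupted = trace[:]
random.Random(0).shuffle(corrupted)
```

In this framing, one training example would pair `trace` with `plan`, and the other would pair `corrupted` with the same `plan`. The finding summarized above is that models trained on the second kind of pairing performed about as well as those trained on the first.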
