
The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•7mo ago

Comments

tocs3•7mo ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised by the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps, and surprisingly, they still perform about the same as, and sometimes even better than, those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
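
To make the "known solving method" part concrete, here is a minimal sketch of what a checkable A* trace could look like. Everything specific in it is an assumption for illustration: the grid world, the `create`/`close` trace tuples, and the helper names `astar_with_trace` and `trace_is_valid` are hypothetical and are not taken from the paper, which may encode its traces differently.

```python
import heapq


def astar_with_trace(grid, start, goal):
    """A* on a 4-connected grid; returns (path, trace).

    grid is a list of strings where '#' marks a wall.
    The trace format (op, cell, g, h) is a hypothetical illustration.
    """
    def h(cell):
        # Manhattan distance: admissible and consistent on this grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    trace = [("create", start, 0, h(start))]
    g_cost = {start: 0}
    parent = {start: None}
    frontier = [(h(start), 0, start)]
    closed = set()

    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale queue entry for an already expanded cell
        closed.add(cell)
        trace.append(("close", cell, g, h(cell)))
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], trace
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == "#":
                continue
            nxt, ng = (nr, nc), g + 1
            if ng < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = ng
                parent[nxt] = cell
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
                trace.append(("create", nxt, ng, h(nxt)))
    return None, trace


def trace_is_valid(trace):
    """Check that every 'close' step refers to a node created earlier."""
    created = set()
    for op, cell, g, _h in trace:
        if op == "create":
            created.add((cell, g))
        elif (cell, g) not in created:
            return False
    return True


grid = ["..#",
        "...",
        "#.."]
path, trace = astar_with_trace(grid, (0, 0), (2, 2))
print(path)
print(trace_is_valid(trace))
```

Because the trace comes from a real algorithm, a checker like `trace_is_valid` can judge the intermediate steps separately from the final path. That separation is what lets one ask whether a model's answer can be right while its trace is invalid.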

Patela v2: From Certificates to Hardware

https://osservatorionessuno.org/blog/2025/12/patela-v2-from-certificates-to-hardware/
1•todsacerdoti•24s ago•0 comments

LLVM: The Bad Parts

https://www.npopov.com/2026/01/11/LLVM-The-bad-parts.html
1•zdw•1m ago•0 comments

Subformer: Multilingual video dubbing with speaker diarization and voice cloning

https://subformer.com/en-US
1•mashreghi•1m ago•1 comments

Show HN: `tc` like `wc` but for LLM tokens

https://github.com/jamierpond/tokencount
1•jamiepond•2m ago•0 comments

Training an LLM to Play Diplomacy with RL

https://www.benglickenhaus.com/blog/diplomacy_rl_part_1
1•pawalt•3m ago•0 comments

Show HN: An LLM-optimized programming language

https://github.com/ImJasonH/ImJasonH/blob/main/articles/llm-programming-language.md
2•ImJasonH•6m ago•0 comments

Data Trust (or lack of it) is many paper cuts, not one BIG error

https://www.rudderstack.com/blog/data-trust-clickstream-discrepancy/
1•soumyadeb•6m ago•0 comments

Ask HN: How to automate aesthetic photo cropping? (CV/AI)

1•icons•7m ago•0 comments

Debian goes retro with a spatial desktop that time forgot

https://www.theregister.com/2026/01/09/desktop_classic_system/
2•mmphosis•15m ago•0 comments

Official TypeScript Cheat Sheets

https://www.typescriptlang.org/cheatsheets/
1•doodlesdev•18m ago•0 comments

Himalayas bare and rocky after reduced winter snowfall, scientists warn

https://www.bbc.com/news/articles/clyndv7zd20o
3•koolhead17•20m ago•0 comments

Rethinking Helix

https://asta.boserup.eu/forest/rethinking-helix/
1•todsacerdoti•21m ago•0 comments

What the actual science says about "brain rot" [video]

https://www.youtube.com/watch?v=tdIUMkXxtHg
1•mgh2•22m ago•0 comments

Hepatic adaptation to chronic metabolic stress primes tumorigenesis

https://www.cell.com/cell/fulltext/S0092-8674(25)01366-2
1•PaulHoule•26m ago•0 comments

Code is cheap now, but software isn't

https://www.chrisgregori.dev/opinion/code-is-cheap-now-software-isnt
4•fs_software•27m ago•0 comments

Malaysia and Indonesia block Musk's Grok over sexually explicit deepfakes

https://www.bbc.com/news/articles/cg7y10xm4x2o
3•breve•27m ago•0 comments

Show HN: The Thiele Machine – Coq-Verified Computational Model Beyond Turing

https://github.com/sethirus/The-Thiele-Machine
3•nwthiele•31m ago•0 comments

Barista: Serving up fresh stats for your Claude Code sessions

https://github.com/pstuart/pstuart/tree/main/barista
1•handfuloflight•32m ago•0 comments

Show HN: LifeOps – Relationship intelligence for developers (local-first)

https://github.com/senguttuvang/LifeOps-CLI
1•seng•32m ago•0 comments

CES 2026: "Worst in Show" – Calling Out Gadgets That Make Things Worse

https://www.ifixit.com/News/115344/worst-in-show-returns-at-ces-2026-calling-out-gadgets-that-mak...
2•gnabgib•33m ago•0 comments

Can Shawn Levy Resuscitate "Star Wars?"

https://www.nytimes.com/2026/01/08/movies/shawn-levy-star-wars-stranger-things.html
4•bookofjoe•40m ago•1 comments

Silent Rebuilds: Keeping Container CVE Counts Near-Zero

https://www.bretfisher.com/silent-rebuilds/
1•ropable•59m ago•1 comments

Anthropic bans xAI from using Claude in Cursor

https://xcancel.com/kyliebytes/status/2009686466746822731
6•Palmik•1h ago•1 comments

Federal prosecutors open criminal investigation into the Fed and Jerome Powell

https://www.cnn.com/2026/01/11/business/federal-prosecutors-criminal-investigation-federal-reserv...
11•washedup•1h ago•1 comments

Critical Analysis of Air Up's Scientific Marketing Claims

https://zenodo.org/records/18197315
2•OrthoBottle•1h ago•1 comments

Uncrossy

https://uncrossy.com/
9•dgacmu•1h ago•3 comments

Show HN: Constela – Build web pages using JSON instead of JavaScript

https://github.com/yuuichieguchi/constela
1•yuu1ch13•1h ago•0 comments

Which programming languages are most token-efficient?

https://martinalderson.com/posts/which-programming-languages-are-most-token-efficient/
23•tehnub•1h ago•10 comments

Fed Chair Powell says he's under criminal investigation

https://www.cnbc.com/2026/01/12/fed-jerome-powell-criminal-probe-nyt.html
9•victor106•1h ago•1 comments

Men Who Are Super Competitive About Sleep

https://www.wsj.com/style/fashion/competitive-about-sleep-gear-sleepwear-76761041
3•lxm•1h ago•1 comments