
The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•8mo ago

Comments

tocs3•8mo ago
I asked ChatGPT to restate this in more layman's terms (posted below), and I am not too surprised at the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about the same, and sometimes even better, than those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."

What's New in Pandas 3.0: Expressions, Copy-on-Write, and Faster Strings

https://codecut.ai/pandas-3-whats-new/
1•rbanffy•31s ago•0 comments

Include-Base-44 European Languages Benchmark Leaderboard

https://huggingface.co/spaces/speakleash/include-base-european-leaderboard
1•taubek•1m ago•0 comments

LLMs and Your Career

https://notes.eatonphil.com/2026-01-19-llms-and-your-career.html
1•SouravInsights•1m ago•0 comments

Building Your Own Efficient uint128 in C++

https://solidean.com/blog/2026/building-your-own-u128/
1•todsacerdoti•1m ago•0 comments

AI-Generated Faces Fool Most People, but Photo Training Improves Detection

https://petapixel.com/2025/12/29/ai-generated-faces-fool-most-people-but-photo-training-improves-...
1•PaulHoule•3m ago•0 comments

Show HN: Driftcheck – Pre-push hook that catches doc/code drift with LLMs

https://github.com/deichrenner/driftcheck
1•deichrenner•3m ago•0 comments

Sandvault: Run AI agents isolated in a sandboxed macOS user account

https://github.com/webcoyote/sandvault
1•Luc•4m ago•0 comments

Show HN: Automating Type Safety for Mission-Critical Industrial Systems

https://www.stackbuilders.com/case-studies/plow-technologies-automating-type-safety-at-scale-for-...
1•StackBuilders•5m ago•0 comments

Vision: A Computational Investigation [pdf]

https://people.ciirc.cvut.cz/~hlavac/pub/MiscTextForStudents/1982MarrDavidVisionBook.pdf
1•foster_nyman•9m ago•0 comments

Anti-Coercion Instrument

https://en.wikipedia.org/wiki/Anti-Coercion_Instrument
2•kelseyfrog•9m ago•0 comments

Operational AI Governance and the Runtime Decision Ownership Gap

1•vivekanandsingh•10m ago•0 comments

The challenges of soft delete

https://atlas9.dev/blog/soft-delete.html
2•buchanae•11m ago•0 comments

Are 'tech dense' farms the future of farming?

https://www.bbc.com/news/articles/c78e4l3rm22o
1•rmason•12m ago•0 comments

Inside the secret world of Japanese snack bars

https://www.bbc.com/travel/article/20260116-inside-the-secret-world-of-japanese-snack-bars
2•rmason•14m ago•0 comments

Ralph, too, needs a test train split

https://softwaredoug.com/blog/2026/01/17/ai-coding-needs-test-train-splits
1•JnBrymn•16m ago•0 comments

Building a fast highly-configurable Rust-based backtesting system

https://nexustrade.io/blog/building-a-lightning-fast-highly-configurable-rust-based-backtesting-s...
1•austin-starks•19m ago•1 comment

Show HN: WP-CLI and Abilities API for Wordfence

https://github.com/trueqap/wpcli-for-wordfence
1•justinde•20m ago•0 comments

Davos Live: Canadian PM Mark Carney Speaks at World Economic Forum

https://www.youtube.com/watch?v=dE981Z_TaVo
3•consumer451•20m ago•2 comments

Renault to team up with French defence group to make drones for Ukraine

https://www.defensenews.com/global/europe/2026/01/20/french-carmaker-renault-to-produce-long-rang...
4•megalomanu•22m ago•0 comments

Ask HN: How many of you are using Spotify APIs for their applications?

https://community.spotify.com/t5/Spotify-for-Developers/bd-p/Spotify_Developer
1•cipz•22m ago•1 comment

Show HN: ElkDesk – I rage-quit Zendesk and built my own

https://elkdesk.com
2•julianpeters•24m ago•0 comments

Show HN: I'd love feedback on my Markdown-to-LinkedIn Carousel webapp (desktop)

https://inslide.malvik.de
1•svenmalvik•25m ago•0 comments

Systemd and AI

https://devpoga.org/systemd-ai/
1•kianN•25m ago•0 comments

Show HN: Agent Skills Leaderboard

https://skills.sh
1•andrewqu•26m ago•0 comments

A Metabolic Workspace

https://www.joanwestenberg.com/a-metabolic-workspace/
1•bookofjoe•26m ago•0 comments

FastMCP 3.0

https://www.jlowin.dev/blog/fastmcp-3
11•jlowin•27m ago•0 comments

Show HN: AI Vibe Coding Hackathon $500k+ in prizes

https://vibe.devpost.com
1•abdibrokhim•27m ago•0 comments

Typography on Pencils (2023)

https://www.presentandcorrect.com/blogs/blog/typography-on-pencils-1-5
2•NaOH•30m ago•0 comments

Ask HN: I need feedback for AI driven dashboard for embedded analytics

https://querypanel.io/auth/login
1•civancza•30m ago•2 comments

Calculus by L.V. Tarasov: A Socratic Dialogue (1982) [pdf]

https://archive.org/details/TarasovCalculus
2•vitaelabitur•31m ago•1 comment