The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•9mo ago

Comments

tocs3•9mo ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised by the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps, and surprisingly, they still perform about the same as, and sometimes even better than, those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
