The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•8mo ago

Comments

tocs3•8mo ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised at the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about the same, and sometimes even better, than those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
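
To make the setup in the summary a bit more concrete, here is a rough sketch of what "checking both the final answers and the reasoning steps" could look like when the traces come from A* search. Everything in it is an assumption for illustration: the grid maze, the `create`/`close` trace format, and the function names `astar_with_trace`, `plan_is_valid`, and `trace_is_valid` are hypothetical and are not taken from the paper's code. The point is only that a plan and a trace can be validated separately, so a correct plan can sit next to a broken trace.

```python
import heapq


def astar_with_trace(grid, start, goal):
    """Run A* on a 4-connected grid and return (plan, trace).

    grid is a list of strings where '#' marks a wall; cells are (row, col).
    trace is a list of (op, cell, g, h) tuples, op being "create" or "close".
    This trace format is an illustrative assumption.
    """
    def h(cell):
        # Manhattan distance to the goal (admissible on a unit-cost grid).
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    trace = [("create", start, 0, h(start))]
    frontier = [(h(start), 0, start)]
    best_g = {start: 0}
    parent = {start: None}
    closed = set()
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale queue entry for an already expanded cell
        closed.add(cell)
        trace.append(("close", cell, g, h(cell)))
        if cell == goal:
            plan = []
            while cell is not None:
                plan.append(cell)
                cell = parent[cell]
            return plan[::-1], trace
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == "#" or nxt in closed:
                continue
            if g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                parent[nxt] = cell
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
                trace.append(("create", nxt, g + 1, h(nxt)))
    return None, trace


def plan_is_valid(grid, start, goal, plan):
    """Check the final answer: a wall-free path of adjacent cells."""
    if not plan or plan[0] != start or plan[-1] != goal:
        return False
    for r, c in plan:
        if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
            return False
        if grid[r][c] == "#":
            return False
    return all(abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
               for a, b in zip(plan, plan[1:]))


def trace_is_valid(trace):
    """Check the intermediate steps: a cell must be created before it is closed."""
    created = set()
    for op, cell, _g, _h in trace:
        if op == "create":
            created.add(cell)
        elif cell not in created:
            return False
    return True


grid = ["..#",
        "...",
        "#.."]
start, goal = (0, 0), (2, 2)
plan, trace = astar_with_trace(grid, start, goal)
scrambled = trace[::-1]  # same tokens, but no longer a coherent search
```

Here `plan_is_valid(grid, start, goal, plan)` and `trace_is_valid(trace)` both hold, while `trace_is_valid(scrambled)` fails even though the plan it accompanies is unchanged. That is the kind of mismatch between answer correctness and trace correctness the summary describes.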

Jeffrey Epstein exchanged 447 emails with Elon Musk

https://twitter.com/JoshWalkos/status/2022031664399225089
1•doener•52s ago•0 comments

Trial of Glioblastoma Immunotherapy Advancement with Nivolumab and Relatlimab

https://clinicaltrials.gov/study/NCT06816927
1•femto•2m ago•1 comments

I vibed demo graphics creator for SoundCloud music

https://beatcanvas.net/
1•fsrc•6m ago•0 comments

Why Stripe paid $1B for Metronome instead of fixing Billing

https://getlago.com/blog/why-stripe-paid-1b-for-metronome-instead-of-fixing-billing
1•AnhTho_FR•8m ago•0 comments

Show HN: Upload App Store Screenshots Directly from Figma to App Store Connect

https://www.prioritycheck.in/
1•shtrsg•9m ago•0 comments

Harness engineering: leveraging Codex in an agent-first world

https://openai.com/index/harness-engineering/
1•martythemaniak•10m ago•0 comments

Show HN: Modeling "Dragon King" wildfire events with 5-mile frontier effects

https://gethazardsafe.com/dragon-king-problem
1•riscii68•14m ago•1 comments

Homeland Security Wants Social Media Sites to Expose Anti-ICE Accounts

https://www.nytimes.com/2026/02/13/technology/dhs-anti-ice-social-media.html
14•jjwiseman•14m ago•1 comments

Ask HN: Best GenAI image app UX with own API keys?

1•transitivebs•15m ago•0 comments

Show HN: Shot2 – Screenshots that don't waste your tokens (Free, OSS, MIT)

https://github.com/devadutta/shot2
1•vadepaysa•17m ago•0 comments

ArXiv preprint server clamps down on AI slop

https://www.science.org/content/article/arxiv-preprint-server-clamps-down-ai-slop
2•mindcrime•19m ago•1 comments

Plain Markdown winning over fancy forms is the biggest plot twist in dev tools

3•nuwansam_87•20m ago•1 comments

Quadlet as a First-Class Platform Primitive

http://ebourgess.dev/posts/podman-quadlet-production/
2•ebourgess•21m ago•0 comments

Show HN: ROX – a minimal language with explicit errors and no magic

https://roxlang.com/playground.html
1•hedayet•22m ago•0 comments

YouTube Launches on Apple Vision Pro

https://www.macrumors.com/2026/02/12/youtube-app-apple-vision-pro/
2•surprisetalk•23m ago•0 comments

Supercazzola – Generate spam for web scrapers

https://dacav.org/projects/supercazzola/
2•todsacerdoti•23m ago•0 comments

React Carousel component + source code

https://playzafiro.com/ui/components/carousel/
1•bartoszu_•25m ago•0 comments

Magic Work Cycle

https://tildeslash.com/magicworkcycle/
2•sovande•29m ago•1 comments

Show HN: Clovr – Generate structured Next.js front ends from a prompt

https://www.clovr.dev/
2•alby_churven•30m ago•0 comments

'Hidden' bugs in our gut appear key to good health, finds global study

https://www.cam.ac.uk/research/news/hidden-bugs-in-our-gut-appear-key-to-good-health-finds-global...
1•hhs•31m ago•0 comments

The evolution of OpenAI's mission statement

https://simonwillison.net/2026/Feb/13/openai-mission-statement/
30•coloneltcb•31m ago•7 comments

Former GitHub CEO raises record $60M dev tool seed round at $300M valuation

https://techcrunch.com/2026/02/10/former-github-ceo-raises-record-60m-dev-tool-seed-round-at-300m...
1•AnhTho_FR•34m ago•0 comments

The WhatsApp moment for money is here

https://www.ft.com/content/7b604dc2-5e9a-45bc-9711-0b1d3d7342fd
1•hhs•36m ago•0 comments

Crates.io's Freaky Friday

https://nesbitt.io/2026/02/06/cratesio-freaky-friday.html
2•todsacerdoti•37m ago•0 comments

Lunacy Web, online version of the desktop Figma alternative

https://www.lunacyapp.com/
1•denysonique•37m ago•0 comments

Open source USearch library jumpstarts ScyllaDB vector search

https://thenewstack.io/open-source-usearch-library-jumpstarts-scylladb-vector-search/
2•ashvardanian•39m ago•0 comments

Rod Dreher Thinks the Enlightenment Was a Mistake

https://www.theatlantic.com/magazine/2026/03/rod-dreher-religious-conservativism-jd-vance/685732/
2•breve•40m ago•0 comments

IBM Triples Entry Level Job Openings. Finds Limits to AI

https://fortune.com/2026/02/13/tech-giant-ibm-tripling-gen-z-entry-level-hiring-according-to-chro...
3•WhatsTheBigIdea•41m ago•0 comments

How to design fatigue resistance, make metal alloys more durable, sustainable

https://matse.illinois.edu/news/80920
1•hhs•42m ago•0 comments

Scaling Social Science Research

https://openai.com/index/scaling-social-science-research/
1•rorylawless•45m ago•0 comments