The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•1y ago

Comments

tocs3•1y ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised by the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about the same, and sometimes even better, than those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
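For anyone unfamiliar with the A* search mentioned in the summary, here is a minimal sketch of what a "solver trace plus final answer" could look like. It is an illustration only: the grid world, the `astar_with_trace` name, and the trace format are my own assumptions, not the paper's actual setup or data format.

```python
import heapq


def astar_with_trace(grid, start, goal):
    """Run A* on a 4-connected grid and return (path, trace).

    grid  : list of equal-length strings, '#' marks a wall (assumed format).
    trace : list of ("expand" | "push", node, g, h) tuples, a hypothetical
            stand-in for the intermediate steps a model could be trained on.
    """
    def h(p):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start)]  # heap entries are (f, g, node)
    parent = {start: None}
    best_g = {start: 0}
    trace = []

    while frontier:
        f, g, node = heapq.heappop(frontier)
        if g > best_g[node]:
            continue  # stale entry: a cheaper route to this node was found
        trace.append(("expand", node, g, f - g))
        if node == goal:
            # Walk parent links back to the start to recover the plan.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], trace
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            in_bounds = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if not in_bounds or grid[nr][nc] == "#":
                continue
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                parent[nxt] = node
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
                trace.append(("push", nxt, ng, h(nxt)))

    return None, trace  # goal unreachable


if __name__ == "__main__":
    demo_grid = ["..#", "...", "#.."]
    path, trace = astar_with_trace(demo_grid, (0, 0), (2, 2))
    print("plan:", path)
    print("trace steps:", len(trace))
```

In this toy framing, `trace` plays the role of the step-by-step "thoughts" and `path` is the final answer. The experiment described above amounts to asking whether a model trained on pairs like these still finds a valid `path` when the `trace` it was trained on is replaced with noise.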
