The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•1y ago

Comments

tocs3•1y ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised by the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about the same, and sometimes even better, than those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
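
For anyone who wants to see what "checking both the final answers and the reasoning steps" could look like in practice, here is a minimal sketch. It is not the paper's code: the grid maze, the `create`/`close` trace format, and the names `astar_with_trace`, `plan_is_valid`, and `trace_is_valid` are all my own assumptions for illustration. It runs A* on a tiny grid, records each search step as a trace, and then validates the plan and the trace separately, so a correct answer can sit next to a broken trace.

```python
import heapq


def astar_with_trace(grid, start, goal):
    """Run A* on a 4-connected grid of strings ('#' = wall).

    Returns (plan, trace). The trace format is hypothetical: a list of
    ("create" | "close", cell, g, h) tuples in the order A* performed them.
    """
    def h(cell):
        # Manhattan distance: admissible and consistent for unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    trace = [("create", start, 0, h(start))]
    frontier = [(h(start), 0, start)]
    parent = {start: None}
    best_g = {start: 0}
    closed = set()

    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale queue entry
        closed.add(cell)
        trace.append(("close", cell, g, h(cell)))
        if cell == goal:
            plan = []
            while cell is not None:
                plan.append(cell)
                cell = parent[cell]
            return plan[::-1], trace
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == "#":
                continue
            ng = g + 1
            if ng < best_g.get((nr, nc), float("inf")):
                best_g[(nr, nc)] = ng
                parent[(nr, nc)] = cell
                heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
                trace.append(("create", (nr, nc), ng, h((nr, nc))))
    return None, trace


def plan_is_valid(grid, start, goal, plan):
    """Check the final answer only: a wall-free path of unit steps."""
    if not plan or plan[0] != start or plan[-1] != goal:
        return False
    for (r1, c1), (r2, c2) in zip(plan, plan[1:]):
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            return False
        if not (0 <= r2 < len(grid) and 0 <= c2 < len(grid[0])):
            return False
        if grid[r2][c2] == "#":
            return False
    return True


def trace_is_valid(trace):
    """Loose check on the steps: a cell must be created before it is closed."""
    created = set()
    for op, cell, _g, _h in trace:
        if op == "create":
            created.add(cell)
        elif cell not in created:
            return False
    return True


grid = ["..#",
        "..#",
        "..."]
start, goal = (0, 0), (2, 2)
plan, trace = astar_with_trace(grid, start, goal)

print(plan_is_valid(grid, start, goal, plan))  # True
print(trace_is_valid(trace))                   # True
# Same correct plan paired with a scrambled (here: reversed) trace:
print(trace_is_valid(trace[::-1]))             # False
```

The last line is the point: the answer check and the trace check are independent, so you can measure how often a right answer comes with a wrong trace. The trace check here is deliberately loose; a stricter one would also replay the g and h values.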
