The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•1y ago

Comments

tocs3•1y ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised by the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps — and surprisingly, they still perform about the same, and sometimes even better, than those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
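For anyone wondering what a "reasoning path based on A* search" might look like in practice, here is a minimal sketch. The grid, the trace format, and the names `astar_with_trace` and `is_valid_path` are illustrative assumptions on my part, not the paper's actual setup. The point is only that a trace produced by a known algorithm can be checked mechanically, separately from the final answer.

```python
import heapq

def astar_with_trace(grid, start, goal):
    """A* on a 4-connected grid of strings ('#' marks a wall).

    Returns (path, trace). The trace format is hypothetical: a list of
    ("expand", cell, g, h) tuples in the order cells are expanded.
    """
    def h(cell):
        # Manhattan distance, admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # heap entries are (f, g, cell)
    best_g = {start: 0}
    parent = {start: None}
    trace = []
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if g > best_g[cell]:
            continue  # stale heap entry, a cheaper route was found later
        trace.append(("expand", cell, g, h(cell)))
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], trace
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    parent[nxt] = cell
                    heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None, trace

def is_valid_path(grid, path, start, goal):
    """Check the final answer on its own: endpoints, walls, unit steps."""
    if not path or path[0] != start or path[-1] != goal:
        return False
    if any(grid[r][c] == "#" for r, c in path):
        return False
    return all(abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
               for a, b in zip(path, path[1:]))

GRID = ["..#",
        "..#",
        "..."]
path, trace = astar_with_trace(GRID, (0, 0), (2, 2))
print(is_valid_path(GRID, path, (0, 0), (2, 2)), len(trace))
```

Because the path and the trace are checked separately, a path can pass `is_valid_path` even when the accompanying trace is not a faithful A* run, which is the kind of mismatch the summary above describes.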

Evaluation order and nontermination in query languages

https://www.rntz.net/post/2026-06-11-datalog-nontermination.html
1•luu•1m ago•0 comments

Show HN: Noter – AI agent dashboard for monitoring coding harnesses locally

https://noterai.tech
1•carlobizzaro•2m ago•0 comments

Show HN: Deterministic Simulation Testing

https://workers.io/blog/deterministic-simulation-testing/
1•chaitanyya•4m ago•0 comments

The Download: Anthropic Launches Claude Science, and California's Carbon Manure

https://www.technologyreview.com/2026/07/01/1139996/the-download-anthropic-claude-science-califor...
1•joozio•7m ago•0 comments

Show HN: dies in 7 hours

https://jonno.nz/posts/your-show-hn-dies-in-7-hours/
2•jonnonz•8m ago•0 comments

A Machine-Verified Proof of a Quantum-Optimization Conjecture

https://arxiv.org/abs/2606.29687
1•ilaysat•11m ago•0 comments

AgentOS

https://agentos-sdk.dev/
1•handfuloflight•11m ago•1 comment

I think it's still important to understand the code that our agents write

https://twitter.com/geoffreylitt/status/2072522251300409556
3•tosh•13m ago•0 comments

Let's Go Kill the Internet

https://nymag.com/intelligencer/article/doublespeed-tech-founder-creating-an-army-of-ai-influence...
2•Michelangelo11•14m ago•0 comments

MarketFish – Simulate a market with 128 AI consumers before you launch

https://github.com/Key-wxh/market-fish
3•a280887763•16m ago•0 comments

Text AI watermarks will always be trivial to remove

https://www.seangoedecke.com/text-ai-watermarks/
2•ingve•21m ago•0 comments

Show HN: Scalable AI Management Platform

https://github.com/metadist/synaplan/
1•metaralf•25m ago•0 comments

The gauge broke: devs felt 20% faster with AI, measured 19% slower

https://intrepidkarthi.com/writing/the-gauge-broke/
16•intrepidkarthi•26m ago•3 comments

BioShocking AI: "Gaming" the AI Browser and Escaping Its Guardrails

https://layerxsecurity.com/blog/bioshocking-ai-gaming-the-ai-browser-and-escaping-its-guardrails/
1•croes•26m ago•0 comments

Horsewood (2 July 2026) We Tried It My Honest ReviewS

https://finance.yahoo.com/sectors/healthcare/articles/horsewood-urgent-report-2026-horse-19110038...
3•Gafyhanu•29m ago•0 comments

Seattle Just Had an Earthquake

3•tobinfekkes•33m ago•2 comments

Feds Might Flip the Script on Right to Repair Vehicle Emissions Systems

https://www.thedrive.com/news/feds-might-flip-the-script-on-right-to-repair-vehicle-emissions-sys...
2•josephcsible•41m ago•0 comments

Likelihood, and Maximum Likelihood, in Statistics

https://bactra.org/notebooks/likelihood.html
2•Tomte•43m ago•0 comments

Fable 5 is insanely good

4•vuphanse•43m ago•0 comments

Ask HN: Who's Hiring Remote Contractors? (July 2026)

2•akashwadhwani35•44m ago•1 comment

Typst: Designing for Incrementality (Laurenz Mädje at RustWeek) [video]

https://www.youtube.com/watch?v=yWWVhbyOWWE
3•felixhummel•44m ago•0 comments

Rasa Intelligence: AI diagnostic engine-gives one business verdict in 90 seconds

https://tech-rasa.com
2•Deepti251•47m ago•0 comments

My Story of 3D Realms / Apogee Part I (2020)

https://joesiegler.blog/2020/11/my-story-of-apogee-3dr/
1•Michelangelo11•48m ago•0 comments

NoUI()

https://www.swiftjectivec.com/noui/
3•ingve•51m ago•0 comments

Build Cost Analysis: Sydney App Developer vs. Melbourne vs. Offshore

https://www.wallstreetoasis.com/forum/venture-capital/build-cost-analysis-sydney-app-developer-vs...
1•ImperoITService•53m ago•0 comments

OpenAgents makes Sonnet 5, Fable 5 and other agents collaborate in one thread

https://openagents.org/workspace
2•gshg12•55m ago•1 comment

The Socialist Wave Reaches the Heartland

https://www.wsj.com/opinion/colorado-democrats-socialists-melat-kiros-michael-bennet-99ad5a66
2•doener•55m ago•2 comments

PyCanopy: A spatial query layer for Polars, competitive with DuckDB, SedonaDB

https://github.com/pranav-walimbe/PyCanopy
2•pranav1077•56m ago•1 comment

The Complete Homemade Juggling Beanbag Guide

https://www.joshuaclifton.com/juggle/
1•mrauha•56m ago•1 comment

Show HN: LinkedIn Focus Chrome Extension

https://yvetter438.github.io/LinkedInFeedBlockerWebsite/
1•ywv•58m ago•0 comments