The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•10mo ago

Comments

tocs3•10mo ago
I asked ChatGPT to restate this in layman's terms (posted below), and I am not too surprised by the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way) — often treated like human "thoughts" — and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps, and surprisingly, they still perform about as well as, and sometimes even better than, those trained on correct steps.

This suggests that the step-by-step "thoughts" the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is — and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
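
To make the setup in the summary above more concrete, here is a rough Python sketch of how A* search can emit a step-by-step trace alongside its final answer, and how the answer and the trace can be checked separately. The trace format (`create`/`close` events), the toy grid, and the helper names `astar_with_trace`, `path_is_valid`, and `trace_is_valid` are all assumptions made for illustration; the summary does not say what format the paper actually uses.

```python
import heapq


def astar_with_trace(grid, start, goal):
    """Run A* on a 4-connected grid (1 = wall). Returns (path, trace).

    trace is a list of (op, cell, g) events, where op is "create" when a
    cell is pushed to the frontier and "close" when it is expanded.
    This event format is a hypothetical stand-in, not the paper's.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance to the goal (admissible for unit-cost moves).
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    trace = [("create", start, 0)]
    frontier = [(h(start), 0, start)]  # entries are (f, g, cell)
    best_g = {start: 0}
    parent = {start: None}
    closed = set()

    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale heap entry for an already expanded cell
        closed.add(cell)
        trace.append(("close", cell, g))
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], trace
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue
            ng = g + 1
            if ng < best_g.get((nr, nc), float("inf")):
                best_g[(nr, nc)] = ng
                parent[(nr, nc)] = cell
                heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
                trace.append(("create", (nr, nc), ng))
    return None, trace


def path_is_valid(grid, path, start, goal):
    """Check the final answer only: endpoints match, steps are adjacent, no walls."""
    if not path or path[0] != start or path[-1] != goal:
        return False
    for (r1, c1), (r2, c2) in zip(path, path[1:]):
        if abs(r1 - r2) + abs(c1 - c2) != 1 or grid[r2][c2] == 1:
            return False
    return True


def trace_is_valid(trace):
    """Check the steps only: a cell must be created before it is closed."""
    created = set()
    for op, cell, _ in trace:
        if op == "create":
            created.add(cell)
        elif cell not in created:
            return False
    return True


grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
start, goal = (0, 0), (2, 0)
path, trace = astar_with_trace(grid, start, goal)

# Same correct answer, paired with a scrambled (here: reversed) trace.
scrambled_trace = trace[::-1]
answer_ok = path_is_valid(grid, path, start, goal)
clean_trace_ok = trace_is_valid(trace)
scrambled_trace_ok = trace_is_valid(scrambled_trace)
```

In this toy case the path passes its check while the scrambled trace fails its own, which is the kind of mismatch between answer and steps that the summary describes. It says nothing about how a trained model behaves; it only shows that the two checks are independent.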
