The Unreasonable Effectiveness of Reasonless Intermediate Tokens

https://arxiv.org/abs/2505.13775
4•YeGoblynQueenne•1y ago

Comments

tocs3•1y ago
I asked ChatGPT to restate this in more layman's terms (posted below), and I am not too surprised at the answer.

"Lately, some AI models have shown impressive abilities to solve complex problems, and many people credit this to a method called Chain of Thought (CoT), where the model is trained to think through steps like a human might. In this paper, we take a closer look at that idea to see if it's really what's driving better performance.

We focus on the model’s step-by-step thinking (the words it generates along the way), often treated like human 'thoughts', and examine whether these actually help the model solve problems more accurately. To test this, we train AI models using clean, correct step-by-step reasoning paths and final answers, all based on a known solving method (A* search). This lets us check both the final answers and the reasoning steps to see how they relate.

Interestingly, we find that even when a model gives the right answer, its reasoning steps can still be wrong or messy. To go further, we even train models using completely random and incorrect reasoning steps, and surprisingly, they still perform about as well as, and sometimes even better than, those trained on correct steps.

This suggests that the step-by-step 'thoughts' the model shows aren’t as meaningful or reliable as many assume. In short, just because a model looks like it’s reasoning through a problem doesn’t mean it actually is, and we should be careful not to treat its outputs as if it thinks like a human or follows strict logic."
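To make the setup in the restatement concrete: an A* solver can emit both a final answer (the path) and a "reasoning trace" (the nodes it expands, in order), and the two can be checked separately. Below is a minimal Python sketch assuming a 4-connected grid maze with unit step costs and a Manhattan-distance heuristic. The function names and the (cell, g, h) trace format are hypothetical, chosen for illustration, and are not the paper's actual token encoding.

```python
import heapq
from typing import Iterator, List, Optional, Tuple

Cell = Tuple[int, int]
# Hypothetical trace format (an assumption): one (cell, g, h) entry per expanded node.
TraceStep = Tuple[Cell, int, int]


def manhattan(a: Cell, b: Cell) -> int:
    """Admissible, consistent heuristic for a 4-connected unit-cost grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def neighbors(grid: List[str], cell: Cell) -> Iterator[Cell]:
    """Yield in-bounds, non-wall ('#') cells adjacent to `cell`."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
            yield (nr, nc)


def astar_with_trace(
    grid: List[str], start: Cell, goal: Cell
) -> Tuple[Optional[List[Cell]], List[TraceStep]]:
    """Run A* and return (plan, trace): the path found and the expansion order."""
    frontier = [(manhattan(start, goal), 0, start)]  # heap of (f, g, cell)
    parent = {start: None}
    best_g = {start: 0}
    closed = set()
    trace: List[TraceStep] = []
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell in closed:  # stale heap entry; a cheaper copy was already expanded
            continue
        closed.add(cell)
        trace.append((cell, g, manhattan(cell, goal)))
        if cell == goal:
            plan = []
            node: Optional[Cell] = cell
            while node is not None:  # walk parent pointers back to start
                plan.append(node)
                node = parent[node]
            return plan[::-1], trace
        for nxt in neighbors(grid, cell):
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                parent[nxt] = cell
                heapq.heappush(frontier, (ng + manhattan(nxt, goal), ng, nxt))
    return None, trace  # goal unreachable


def is_valid_plan(grid: List[str], start: Cell, goal: Cell, plan: Optional[List[Cell]]) -> bool:
    """Check the final answer alone: legal moves from start to goal."""
    if not plan or plan[0] != start or plan[-1] != goal:
        return False
    return all(b in set(neighbors(grid, a)) for a, b in zip(plan, plan[1:]))


def trace_f_nondecreasing(trace: List[TraceStep]) -> bool:
    """Partial check of the trace alone: with a consistent heuristic,
    A* expands nodes in nondecreasing f = g + h order."""
    fs = [g + h for _, g, h in trace]
    return all(a <= b for a, b in zip(fs, fs[1:]))


maze = [
    "....",
    ".##.",
    "....",
]
plan, trace = astar_with_trace(maze, (0, 0), (2, 3))
print(plan)
print(is_valid_plan(maze, (0, 0), (2, 3), plan), trace_f_nondecreasing(trace))  # → True True
```

The nondecreasing-f check is only a necessary condition for a faithful A* trace (it holds here because the Manhattan heuristic is consistent on a unit-cost grid); a full trace validator would also have to replay the open and closed lists step by step.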
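The "random and incorrect reasoning steps" condition in the restatement can be approximated by keeping each problem and its correct final answer but pairing it with a trace taken from a different problem. The Python sketch below illustrates that idea under stated assumptions: the `Example` record and `swap_traces` helper are hypothetical, the traces are placeholder strings, and the paper's exact corruption procedure may differ.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Example:
    """Hypothetical training record: a problem, its reasoning trace, its answer."""
    problem: str
    trace: Tuple[str, ...]
    plan: Tuple[str, ...]


def swap_traces(examples: List[Example], seed: int = 0) -> List[Example]:
    """Keep each problem and its correct plan, but attach the trace of a
    different, randomly chosen example (a derangement of the traces)."""
    n = len(examples)
    if n < 2:
        raise ValueError("need at least two examples to swap traces")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    donor = [0] * n
    for i in range(n):
        # Each example in the shuffled cycle takes the trace of the next one,
        # so no example keeps its own trace.
        donor[order[i]] = order[(i + 1) % n]
    return [replace(ex, trace=examples[donor[k]].trace) for k, ex in enumerate(examples)]


# Placeholder data (assumption): real traces would be token sequences from a solver.
data = [Example(f"maze-{i}", (f"trace-{i}",), (f"plan-{i}",)) for i in range(4)]
for ex in swap_traces(data, seed=42):
    print(ex.problem, ex.trace, ex.plan)
```

Shuffling and then rotating by one position guarantees a derangement, so no problem is ever paired with its own trace, while the final answers stay correct.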
