Claude Code's "extended thinking" is a summary, not authentic thinking

https://patrickmccanna.net/the-text-in-claude-codes-extended-thinking-output-is-not-authentic/

77•0o_MrPatrick_o0•1h ago

Comments

apothegm•1h ago

Slashdotted.

bpodgursky•1h ago

The full thinking logs are also a summary of a thinking process presumably consistent with one necessary to generate the provided answer. Nobody really understands how LLMs think. Thinking logs seem to be accurate, and summary thinking logs seem to be a good summary of the full thinking logs.

If it's useful, it's useful, enjoy. If you aren't comfortable with that, don't use LLMs. You aren't going to get a mathematical proof of your output, just learn to be comfortable with that, or opt out and be a goat farmer.

0o_MrPatrick_o0•59m ago

I want to measure performance drift over time.

Having access to the reasoning text and output would help with performance measurement.

solarkraft•54m ago

Yeah. The output is magic either way, with or without reasoning.

For daily use I actually like the reasoning summary to be brief/quick to scan.

That said, I understand the author’s desire for the real thing. It just feels better to have that access, especially when Anthropic will give it to you, but encrypted.

dragonwriter•15m ago

> The full thinking logs are also a summary of a thinking process presumably consistent with one necessary to generate the provided answer.

No, they aren't a summary. They are the actual decoding of the sequence of tokens emitted during the “thinking” stage of response generation.

Just as with, say, a human inner monologue in words vs actual speech, they are a product of the same output process as the non-thinking tokens. They aren’t a translation of the internal process that precedes the output mapped into language, either as a full result or a summary.

fieldcny•1h ago

duh.

Computers don’t think, they process; those are very different activities.

anuramat•1h ago

no way, the contents of "reasoning_summary" are summarized?

fyi openai does the same; not really surprising or particularly evil

knollimar•20m ago

Not evil but full of hubris

ur-whale•1h ago

When you have no moat, you have to try and find desperate ways to manufacture one.

anuramat•1h ago

wdym?

ur-whale•59m ago

> wdym?

https://en.wikipedia.org/wiki/Economic_moat

anuramat•51m ago

how is summarized CoT a moat, and how is having the top 2 LLMs not a moat?

Closi•43m ago

If you have the full outputs, it might make it easier for competitors to distil the model or reverse engineer the full process.

It may also be that misaligned responses can be in CoT which OpenAI does not want to show to users.

anuramat•29m ago

but "harder to reverse engineer" isn't manufacturing, that's protecting your moat

tsunamifury•1h ago

It’s not surprising that the SOTA model makers' core goal is to get users dependent while denying them increasing amounts of understanding of how it works, forming a deeply unhealthy dependency.

Tell me this. If you hired a junior engineer or designer who refused to explain their thinking on their code and how they solved for the spec what would you do?

(That being said, the reasoning output is still a summary of the KV cache.)

orangecat•9m ago

> If you hired a junior engineer or designer who refused to explain their thinking on their code

Any explanation that someone gives of their thinking process is necessarily lossy and likely partially confabulated.

simianwords•59m ago

Wait I think there are 2 levels of summary. Anthropic is definitely not showing its real thinking even with enterprise agreements. For example in Claude.ai the thinking traces are not real and are themselves summaries.

furyofantares•58m ago

> It isn’t the actual thinking that drove the model’s actions in a session- but a summary of the thinking logic. This is like using saving a jpeg as a .bmp and then editing the .bmp and presenting it as a .jpeg. The conversion produces data loss.

You've got that backwards, .bmp is a lossless format and .jpeg is the lossy one.

0o_MrPatrick_o0•54m ago

My bad! 10 points for House Slytherin!

altmanaltman•43m ago

also a typo in the last sentence: you're vs. your

0o_MrPatrick_o0•36m ago

I missed my coffee! Ty! Five points to Slytherin.

glaslong•35m ago

Weirdly pleasant, if minor, signal of human authorship

_fat_santa•58m ago

IMHO I've never found the entire reasoning chain that particularly useful for my work. For me having a summary is honestly better from a context management perspective. I understand why they would encrypt it though, because those reasoning chains are VERY useful if you're distilling the model.

stavros•21m ago

The summary doesn't go into the context, it's for human consumption. The CoT itself goes into the context.

StizzurpXDD•48m ago

This is not just Anthropic. Almost all big AI companies, including OpenAI and Google, hide their models' actual reasoning. This is because revealing the raw reasoning exposes exactly how the AI processes information. These companies spend huge amounts on R&D to develop a thinking process that is superior to their competition's. Exposing those thinking mechanics to competitors would completely defeat the purpose of their spending. They simply won't do it. It's like telling your exact location to someone who is trying to hunt you down.

duskwuff•46m ago

More to the point - if they expose their model's "thinking" inference, competitors can train on that to replicate the results.

StizzurpXDD•24m ago

Exactly. Google won't like it if they spend millions to make Gemini 3.5 Pro's thinking the best in the world, only for Anthropic or OpenAI to copy it by just seeing the thinking process.

_aavaa_•33m ago

Or like providing the world’s information in machine readable format that the AI companies can convert into model weights without getting permission or compensating the rights holders

Sharlin•29m ago

The cynic in me is wondering whether it's more that revealing how the sausage is made might bring bad publicity.

jerf•48m ago

AIUI it's fairly well established that the models can be saying one thing and "really" thinking another anyhow. The ones I recall seeing traced how simple one-digit arithmetic was done in the chat versus the actual activations under the hood. Tracing a real, non-trivial task through that way would be challenging, and I'd expect it is unlikely that the reasoning would say one thing while some utterly unrelated actual thought process is happening below, but I would expect that there might be a lot of places where the text of the reasoning diverges from what is "actually" being done. I'm not sure the full reasoning readout would produce much real insight anyhow.

I suspect that in some decades, as other architectures are found and used, the inability of LLMs to "think" without also emitting a token will be seen as one of their fundamental limitations.

adi_pradhan•47m ago

Not surprised at this. The questions for enterprises are:

+ Where can you depend on a black box as a service?

+ What evals and observability do you need to deploy a black box as a service confidently?

+ What's the ROI (considering a total footprint of people, token spend, infrastructure, service, ops, etc.)?

The LLM providers will clearly evolve to be more and more opaque as their services get more capable. The frontier models may even be provided as purely internal advisor or async only so they can monitor your CoT and final answers for cyber etc.

HarHarVeryFunny•46m ago

This is nothing new - these companies don't want their model's output to be useful for distillation/training, so they just give a "summary" of its thinking steps rather than the actual sequence.

RL (the basis of LLM "thinking") is a pretty crude way to achieve the appearance of reasoning given that it reinforces all the steps, including missteps, that got it to a reward. Providing a summary could be seen as a form of sane-washing, making the model look more purposeful and directed than it really is!

craigmart•45m ago

This is something we have known for a very long time, and companies are not trying to hide that either. They do it to avoid letting competitors train their models on the CoTs

stingraycharles•35m ago

Yes hasn’t this been around since Opus 4.6? I very much recall this change happening around January or February, and it was very explicitly to prevent distillation. Sonnet does not have this limitation.

Fun fact: if you go back to the old school from 2 years ago and provide explicit CoT prompts, you get the full thinking prompts back again!

So you disable thinking altogether, and instead make thinking part of the regular prompt by prompting it:

“Before providing your answer, think step by step. For example:

The user is asking me to… I need to think about the blah blah. First, I should foo the bar, and then blah blah.

Answer: <put your final answer here>”

And tada.wav we have CoT as it worked in the GPT3 era back again.
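A minimal sketch of the approach described above, assuming the built-in thinking mode is switched off and the reasoning is requested in the ordinary prompt instead. The template wording and the helper names `build_cot_prompt` and `split_reasoning_and_answer` are hypothetical, and no vendor API is called here; the reply string would come from whatever client you already use.

```python
import re
from typing import Optional, Tuple

# Hypothetical template modeled on the prompt quoted above; the wording is illustrative.
COT_TEMPLATE = (
    "{question}\n\n"
    "Before providing your answer, think step by step and write the reasoning out in full.\n"
    "Finish with a single line of the form:\n"
    "Answer: <put your final answer here>"
)


def build_cot_prompt(question: str) -> str:
    """Wrap a question in an explicit chain-of-thought instruction."""
    return COT_TEMPLATE.format(question=question.strip())


def split_reasoning_and_answer(reply: str) -> Tuple[str, Optional[str]]:
    """Split a model reply at its last 'Answer:' line.

    Returns (reasoning_text, answer). The answer is None when the model
    did not follow the requested format.
    """
    matches = list(re.finditer(r"^Answer:[ \t]*(.*)$", reply, flags=re.MULTILINE))
    if not matches:
        return reply.strip(), None
    last = matches[-1]
    return reply[: last.start()].strip(), last.group(1).strip()
```

Because the reasoning arrives as ordinary output text, it can be logged alongside the answer, which is the property the thread is asking for. It is not guaranteed to match what a dedicated thinking mode would have produced.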

0o_MrPatrick_o0•29m ago

Awesome share! Thank you!

KellyCriterion•13m ago

- tada.wav -

Still one of the most-played WAV files worldwide every day, I'd guess? :-D

dcrazy•8m ago

I thought this was considered best practice? I actually prefer it to an exposed thought channel, much like how I would prefer a human answer with supporting logic instead of an explanation of their problem-solving approach.

philipwhiuk•43m ago

To be honest I thought the 'thinking' was the model being asked 'how did you come up with that' and then it generating a plausible explanation. I know at one point this was correct.

Humans somewhat do the same - something that's been demonstrated in split-brain experiments.

devmor•37m ago

That's not really how LLMs work at all. I would really recommend checking out something like [1] to get a rough understanding and avoid attributing too much to them.

1. https://medium.com/@eshvargb/the-llm-journey-how-neural-netw...

stingraycharles•33m ago

No, not at all, you got it backwards. This was originally called “chain of thought prompting”, and it basically explained to a model how to reason through a problem before providing an answer.

Because of the nature of how LLMs work — text prediction engines - by putting the explicit reasoning steps first, it improves the likelihood of the final answer (which then is being predicted based on the entire reasoning chain as input) being correct.

InsideOutSanta•27m ago

If you ask an LLM afterward how it arrived at an answer, it might produce a plausible but incorrect explanation. But that's not what the thinking stream is; that's actually part of how it generates the answer.

reliablereason•37m ago

Is the thinking even done in real tokens? I thought it was done using the pure residual stream. That is, instead of collapsing the residual stream to a token, you treat the final layer's output as a vector of size d_model and use that as input for the next position in the transformer.

If that is the case thinking is not visible to us as users due to it not being done in text.
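A toy sketch of the distinction drawn above, not a description of how any particular production model works. The embedding table, the weights inside `stack`, and every function name are made up for illustration. In the token loop each step is collapsed to a token id that can be read back; in the latent loop the d_model-sized vector is fed straight to the next position and no token is ever produced.

```python
import math
from typing import List, Tuple

Vector = List[float]

# Toy embedding table: 3 tokens, d_model = 2. All numbers are invented.
EMBED: List[Vector] = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]


def stack(h: Vector) -> Vector:
    """Stand-in for the whole transformer stack: one d_model vector in, one out."""
    return [math.tanh(0.5 * h[0] + 0.8 * h[1]),
            math.tanh(-0.3 * h[0] + 0.9 * h[1])]


def unembed(h: Vector) -> List[float]:
    """Score every vocabulary token against the final hidden vector."""
    return [sum(e * x for e, x in zip(row, h)) for row in EMBED]


def token_step(h: Vector) -> Tuple[int, Vector]:
    """Token reasoning: collapse the output to one token id, then re-embed it."""
    scores = unembed(stack(h))
    token = max(range(len(scores)), key=scores.__getitem__)
    return token, EMBED[token]


def latent_step(h: Vector) -> Vector:
    """Latent reasoning: skip the collapse and pass the raw vector onward."""
    return stack(h)


def run_token_mode(start: Vector, steps: int) -> List[int]:
    h, tokens = start, []
    for _ in range(steps):
        token, h = token_step(h)
        tokens.append(token)
    return tokens  # a trace that can be decoded and read


def run_latent_mode(start: Vector, steps: int) -> Vector:
    h = start
    for _ in range(steps):
        h = latent_step(h)
    return h  # only a final vector; there are no intermediate tokens to show
```

In the token loop the next input is always one of the rows of `EMBED`, so whatever else the hidden vector carried is discarded at each step. That discarded information is what a latent loop would keep, at the cost of having nothing textual to display.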

giancarlostoro•34m ago

Claude does all its thinking in text; it's ChatGPT which does not do its reasoning in text. I believe it's sort of implied / understood (?) that this is part of Claude's secret sauce over OpenAI. OpenAI will use fewer tokens, but Claude will be more correct, more of the time.

wqaatwt•30m ago

All open models that have reasoning seem to be doing it in text tokens. Is there any indication that closed models are approaching this somehow fundamentally differently?

throwuxiytayq•25m ago

That would be a huge deal, meaning we've lost even our shitty, ineffective ways of monitoring the agent reasoning stream. Big setback when it comes to alignment and interpretability.

I don't know about Claude, but latest GPT versions still have a readable reasoning stream. It sometimes leaks out when the model gets confused, e.g., during a tool call. If you're curious, looks simplified; less words; extremely compact. They optimize tokens. But remain readable.

wqaatwt•32m ago

Is this some new revelation? That was well known when the first OpenAI/Anthropic “thinking” models came out.

InsideOutSanta•29m ago

It's not a new revelation, but clearly a lot of people aren't aware of it, so talking about it is still valuable.

irthomasthomas•24m ago

I won't use or recommend models with hidden reasoning (that's all American models). It's too much of a risk and makes prompt optimization harder. Risky because it makes it possible for an attacker to prompt inject the reasoning chain to carry out a secret objective, and to hide that from the summaries and output.

Interleaved reasoning and function calling makes this even more dangerous. A model can call functions during the hidden reasoning phase. An attacker could then exfiltrate data from you while the reasoning summary hides it from the user.

It also makes it impossible to know if the model is doomlooping during reasoning and burning tokens for no reason, as Gemini is wont to do, which we know about because its hidden reasoning often leaks out when it doomloops.

When the models are AGI and secure from prompt injection I may stop caring; until then I want to know exactly what the model responds to my prompts, or exactly what the agent is doing on my behalf.

Roritharr•2m ago

I've thought about the hijacking of reasoning chains as a potential vector, but never saw a proven implementation in American models since, from my understanding, all major vendors throw out the reasoning tokens between turns.

root_axis•22m ago

Research shows that even the raw trace tokens do not actually reflect underlying model "thoughts".

josefritzishere•16m ago

AI does not think. It is a word guessing machine. Anthropomorphizing technology does not add anything to our understanding.

runeblaze•4m ago

tbh the summarized thinking with encrypted raw thinking is there for many purposes; it is there to:

1. make distillation much harder

2. safety: prevent modifications to the thinking leading to injection attacks.

3. also honestly sometimes the model's raw thoughts can be deranged and are not a good user experience (consider the varied audience in the market, etc.)

also, often the masses underestimate / the model makers overestimate how much people love distilling models
