frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Ask HN: How are thinking efforts implemented?

18•simianwords•12h ago
Claude and ChatGPT have thinking efforts where you can tune the amount of thinking allowed.

Like low, medium, high, xhigh and so on.

But are they different models underneath? Or same model with different parameter?

The reason I ask is because, if I change the effort param mid conversation in Claude code, I get a warning suggesting I’m breaking the cache.

I don’t think this happens in Codex because when I change the effort, the responses are still quick.

Comments

__patchbit__•11h ago
At a guess. May be associated with token length context window. Down selecting is consistent with warning message, forcing cutoff to context window. The technical term cache being a synonym. Increasing the headroom for more "thinking" should allow the implementation to access more resources without warning about the cache breaking.
aabdi•10h ago
Different models do slight variants.

Usually it’s done in post training to enforce behavior based on prompt. Ie. System prompt with thinking:max or low or wtv.

Enforcement then goes via constrained decoding, checking for think token start and end with max lengths, or other variations

pyentropy•9h ago
Take a look at the harmony repo which specifies the internal OpenAI format - the effort level is specified in the context after the <|start|> tag - https://github.com/openai/harmony

Note that inference libs also have parsers that put hard limits on reasoning tokens with separate counters (similar to how you can put a limit on token generation per completion versus waiting for an <eos>). For that, take a look at vllm reasoning docs.

pyentropy•3h ago
Examples with inference of different reasoning effort levels is in the OpenAI docs as well - https://developers.openai.com/cookbook/articles/openai-harmo...

https://docs.vllm.ai/en/latest/features/reasoning_outputs/#a...

https://developers.openai.com/api/docs/guides/reasoning

simianwords•3h ago
I think you have the right answer but I'm struggling to understand: does changing the effort change the prompt at the start of the conversation? I wonder why come up with this way at all? Why not just add a parameter at the end or something? At least it won't break cache.

Maybe like: add a secret suffix to your chat in the conversation to think more like

   conversation....

   Hey please help
   [think more]
pyentropy•3h ago
I'm considering the possibility that it's good to break the prefix and cache because the LLM itself was rewarded (during post-training) with different prefixes/system prompts, each containing reasoning traces of the correct size.

I might be very very wrong though and LLMs disagree with me, insisting that cache is preserved and the system message doesn't have to change (even though it often contains effort level in context) if effort level changes across turns, and that all you have to do is tell the inference lib that parses think tags to early-close think tags that are too long.

sometimelurker•8h ago
they use multitoken prediction behind the scenes, that might interact with the CoT in a strange way. maybe for different thinking modes they have different MTP models? if so thats interesting
pyentropy•8h ago
The number of tokens you predict at time (multi or not) has nothing to do with whether the model wants to emit any, some or a lot of reasoning tokens in reasoning tag -- similar to how branch prediction will not really change the for loop iteration count.
bjourne•6h ago
LLMs work by generating the most likely continuation to a prompt. But they can also generate multiple likely continuations. This create multiple branches which in turn can generate even more branches. The LLM can then evaluate the branches, prune the unpromising ones, and merge the best ones. More branches means more tokens, means more effort.
simianwords•5h ago
this has nothing to do with the thinking effort however
bjourne•4h ago
Yes, it does. Breadth of search is exactly what the effort setting controls.
pyentropy•3h ago
LLM-judge/parallel branching ≠ multi-token prediction ≠ reasoning effort.

See https://developers.openai.com/cookbook/articles/openai-harmo... and src/openai/types/shared/reasoning_effort.py

Ask HN: What was your "oh shit" moment with GenAI?

684•andrehacker•3d ago•1067 comments

Ask HN: Why is the HN crowd so anti-AI?

440•Ekami•1d ago•734 comments

Ask HN: How are thinking efforts implemented?

18•simianwords•12h ago•12 comments

Ask HN: Job market for SDMs/Engineering Managers. Any reliable data?

4•ed_balls•4h ago•1 comments

Ask HN: I made an image watermarking tool. What are the issues open-sourcing it?

3•minimaxir•5h ago•2 comments

Tell HN: Stripe ToS update demands biometrics, freezes payments until given

11•cuz-reasons•6h ago•1 comments

Ask HN: So what happened to Facebook "localhost" tracking?

105•juliusceasar•3d ago•102 comments

Tell HN: Helium is the best browser I ever used

4•prmph•7h ago•5 comments

Ask HN: What is your (AI) dev tech stack / workflow?

158•dv35z•2d ago•130 comments

Ask HN: How do you find deep technical content?

37•f311a•3d ago•28 comments

Ask HN: How to get my contact info off US political party's list

9•kaycebasques•1d ago•3 comments

Ask HN: Where do you get the latest updates about AI?

5•d0able•23h ago•2 comments

Ask HN: Gin rummy strategies

24•bix6•3d ago•4 comments

Bad MCP design costs your agent 5x more tokens

15•JohnnyZhang483•2d ago•1 comments

Ask HN: Is the web for machines (/llm.txt) the one we wished we had as humans?

36•sunshine-o•2d ago•57 comments

New Biochemistry-Based Metabolic Protocol Seeking Alpha Concierge Members

3•joshwprinceton•22h ago•1 comments

Does anyone know since when we are close to building in space?

6•kingleopold•1d ago•2 comments

Ask HN: My competitors have flawed products but I can't get traction

11•saveitincork•2d ago•17 comments

Ask HN: Does robotics capabilities research accelerate AGI timelines?

8•themasterchief•1d ago•1 comments

I'm tired of LLM skill slop, so I built mine with regression tests

7•iliaov•3d ago•0 comments

Ask HN: Were CS profs right to look down on programming in light of modern AI?

4•amichail•1d ago•3 comments

Ask HN: How did you discover Hacker News?

10•chistev•2d ago•30 comments

Supply chain attack alert: .github/setup.js

25•antihero•2d ago•13 comments

Life saving / first aid posters

37•cpu_•5d ago•4 comments

Google killed my $1M ARR startup overnight

23•vadumo•3d ago•13 comments

Ask HN: What would you name your own LLM?

4•akashwadhwani35•1d ago•2 comments

Ask HN: Is Everyone an Engineer Now?

7•piratesAndSons•3d ago•12 comments

Ask HN: Is Azure capacity this constraind or am I doing it wrong?

11•lanycrost•2d ago•15 comments

Ask HN: What are the best unknown books you have read?

7•chistev•12h ago•3 comments

Ask HN: Why isn't AI image generation closely linked with graphics code gen?

3•amichail•2d ago•2 comments