Like, does reasoning actually find a gradient to optimize toward a solution? Or is it just expanding state until it lands on whatever the LLM's world knowledge says is highest probability?
For example, I can imagine an LLM reasoner running out of state while trying to perfectly solve 50 intricate unit tests, because it ping-pongs between solving one case and then another, playing whack-a-mole and never converging.
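To make the whack-a-mole scenario concrete, here's a toy simulation (purely illustrative: the names, numbers, and "patch" behavior are all made up, no real LLM or agent framework involved). One loop greedily patches whatever test happens to be failing; the other keeps a patch only if the total pass count goes up, i.e. it hill-climbs a single scalar objective.

```python
import random

random.seed(0)
NUM_TESTS = 50
BREAK_PROB = 0.05   # chance a patch regresses each currently-passing test

def propose_patch(passing, target):
    """Pretend LLM patch: fixes `target` but may break other passing tests."""
    new_passing = {t for t in passing if random.random() > BREAK_PROB}
    new_passing.add(target)
    return new_passing

def whack_a_mole(steps=400):
    """Chase the first failing test each step; no notion of overall progress."""
    passing = set()
    for _ in range(steps):
        failing = [t for t in range(NUM_TESTS) if t not in passing]
        if not failing:
            break
        passing = propose_patch(passing, failing[0])
    return len(passing)

def hill_climb(steps=400):
    """Same patch proposer, but keep a patch only if total passes improve."""
    passing = set()
    for _ in range(steps):
        failing = [t for t in range(NUM_TESTS) if t not in passing]
        if not failing:
            break
        candidate = propose_patch(passing, failing[0])
        if len(candidate) > len(passing):   # scalar objective: tests passing
            passing = candidate
    return len(passing)

print("whack-a-mole :", whack_a_mole(), "/", NUM_TESTS, "tests passing")
print("hill-climb   :", hill_climb(), "/", NUM_TESTS, "tests passing")
```

The greedy loop tends to plateau because each fix quietly breaks something else, while the hill-climbing loop converges by construction; the question is which of these an LLM reasoner is actually closer to.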
Maybe there's an "oh duh" answer to this, but it's where I struggle with the limits of agentic work vs. traditional ML.