Is it really so simple that all information in an LLM comes from the probability of each token given the prompt? So for any prompt, there is a probability distribution over continuations of that prompt, and text is generated from it?
Does all structure of information come from token probabilities (so that all structure and information processing is a side effect of token probabilities)? Or is there other stuff going on? I know reasoning models have extra stuff, but let's put that aside for now.
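To make the picture in the question concrete, here is a minimal sketch of "a prompt defines a distribution over continuations". Everything in it is a toy assumption: `VOCAB`, `toy_logits`, and the score table are hypothetical stand-ins for a real network, which would compute the scores from the whole context rather than the last token. It only illustrates the interface (context in, next-token distribution out) and how per-token probabilities chain into a probability for a whole continuation; it does not say anything about what happens inside the model.

```python
import math
import random

# Hypothetical four-token vocabulary, for illustration only.
VOCAB = ["the", "cat", "sat", "."]

def toy_logits(context):
    """Stand-in for a neural network: returns one score per vocabulary token.

    Assumption: scores depend only on the last token. A real LLM conditions
    on the entire context.
    """
    last = context[-1] if context else None
    table = {
        None:  [2.0, 0.5, 0.1, 0.1],
        "the": [0.1, 2.0, 0.5, 0.1],
        "cat": [0.1, 0.1, 2.0, 0.5],
        "sat": [0.5, 0.1, 0.1, 2.0],
        ".":   [2.0, 0.5, 0.1, 0.1],
    }
    return table[last]

def softmax(logits):
    """Turn arbitrary scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_distribution(context):
    """P(next token | context) as a dict from token to probability."""
    return dict(zip(VOCAB, softmax(toy_logits(context))))

def sequence_log_prob(prompt, continuation):
    """Chain rule: log P(continuation | prompt) is the sum of per-token log-probs."""
    context = list(prompt)
    logp = 0.0
    for tok in continuation:
        logp += math.log(next_token_distribution(context)[tok])
        context.append(tok)
    return logp

def sample(prompt, n, rng):
    """Generate n tokens by repeatedly sampling from the next-token distribution."""
    context = list(prompt)
    for _ in range(n):
        dist = next_token_distribution(context)
        tok = rng.choices(list(dist), weights=list(dist.values()))[0]
        context.append(tok)
    return context[len(prompt):]
```

Calling `sample(["the"], 3, random.Random(0))` draws one possible continuation, and `sequence_log_prob(["the"], ["cat", "sat"])` scores a specific one under the same toy distribution.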
jxhcbu•30m ago