Otherwise, doing a heads/tails (H/T) comparison is just a proxy for the underlying token probabilities and the temperature configuration (plus hardware differences for a remote-hosted model).
I had an hour to kill and did this experiment.
This is what the “temperature” parameter of an LLM controls. Setting the temperature to 0 effectively disables that randomness, but the result is very boring output that’s likely to get caught in a never-ending loop of repeated text.
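For anyone curious, temperature just rescales the logits before sampling; a toy numpy sketch (illustrative only, not any real model's decoding loop):

    import numpy as np

    def sample_token(logits, temperature, rng):
        if temperature == 0:
            return int(np.argmax(logits))          # greedy: always the top token
        scaled = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(scaled - scaled.max())      # numerically stable softmax
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    rng = np.random.default_rng(42)
    logits = [2.0, 1.0, 0.5, -1.0]
    print(sample_token(logits, 0.0, rng))  # deterministic: always index 0
    print(sample_token(logits, 1.0, rng))  # random, weighted by the logits

Lower temperatures sharpen the distribution toward the top token; higher temperatures flatten it.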
When you asked it to choose by picking a random number between 1 and 4, it skewed the results heavily to 2 and 3. It could have interpreted your instructions to mean literally between 1 and 4 (not inclusive).
I do not know about other LLMs, but Cohere allows setting a seed value. With the same seed it will always give you the same result for a specific prompt (unless, of course, the LLM gets an update).
OTOH I would guess that they normally just generate a random seed on the server side when processing a prompt, and how random that really is depends on their random number generator.
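FWIW, OpenAI's chat API also documents a (best-effort) seed parameter; a rough sketch with their Python client, though reproducibility is still not guaranteed across backend updates:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # example model name
        messages=[{"role": "user",
                   "content": "Pick a random number from 1 to 4."}],
        seed=12345,            # same seed + same prompt -> usually the same output
        temperature=1.0,
    )
    print(resp.choices[0].message.content)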
Pretty good.
How cryptographically secure would an LLM rng seed generator be?
If LLMs are anything like people, I would expect a different result depending on that. The idea that random events are independent is very unintuitive to us, resulting in what we call the Gambler's Fallacy. LLMs' attempts at randomness are very likely to be just as biased, if not more.
I personally would define ideal randomness as behavior that is fundamentally uncomputable and/or cannot be expressed as a mathematical function. If this definition holds, then the question cannot apply to LLMs, as they are just a (big) mathematical function.
Why are all these posts and news stories about LLMs so uninformed? This is human-built technology. You can actually read up on how these things work. And yet they are treated as if they were an alien species that must be examined by sociological means and methods, where that is not necessary. Grinds my gears every time :D
https://github.com/huggingface/transformers/blob/d538293f62f...
Layers can be computed in slightly different orders (due to parallelism) and on different GPU models, and this will cause small numerical differences that compound due to auto-regression.
Sure, it’s theoretically deterministic, but so are many natural processes, like air pressure or the three-body problem, if only we had all the inputs and fixed all the variables. The reality is that we can’t, and it’s not particularly useful to say that it would be deterministic if we could.
Think of each new ‘interaction’ with the LLM as having two things that can change: the context and the PRNG state. The PRNG state itself has two parts: the random seed (which determines the output sequence) and the index of the last consumed random value. If the context, random seed, and index are the same, then the LLM will always give the same answer. Just to be clear, the only ‘randomness’ in these state values comes from the random seed itself.
The LLM doesn’t produce any randomness; it needs randomness as an input (hyper)parameter.
EDIT: I'm seeing another poster saying "Deterministic with a random seed?" That's a good point--all the non-determinism comes from the seed, which isn't particularly critical to the algorithm. One could easily make an LLM deterministic by simply always using the same seed.
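With a local model you can pin the seed yourself; a minimal sketch with Hugging Face transformers (gpt2 just as a stand-in):

    from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

    tok = AutoTokenizer.from_pretrained("gpt2")           # small example model
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    inputs = tok("Pick a random number from 1 to 4:", return_tensors="pt")

    for _ in range(2):
        set_seed(1234)  # reset the PRNG state before each generation
        out = model.generate(**inputs, do_sample=True, temperature=1.0,
                             max_new_tokens=10, pad_token_id=tok.eos_token_id)
        print(tok.decode(out[0], skip_special_tokens=True))
    # Same context + same seed + same PRNG index -> identical output, at least
    # on the same hardware/software stack (see the float-ordering caveats
    # elsewhere in the thread).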
Not fully true: when using floating point, the order of operations matters, and it can vary slightly due to parallelism. I've seen LLMs return different outputs with the same seed.
It seems like that would make it hard to unit test LLM code, but they seem to be managing.
While you can definitely read about how some parts of a very complex neural network function, it's very challenging to understand the underlying patterns.
That's why even the people who invented components of these networks still invest in areas like mechanistic interpretability, trying to develop a model of how these systems actually operate. See https://www.transformer-circuits.pub/2022/mech-interp-essay (Chris Olah)
1. Give a model a context with some # of actually random numbers and then ask it to generate the next random number. How random is that number? Repeat N times and graph the results: is there anything interesting about them? (Rough sketch at the end of this comment.)
2. I remember reading about how brains etc. are kinda edge-balanced chaotic systems. So if a model is bad at outputting random numbers (i.e., needs a very high temperature for the experiment from step 1 to produce a good distribution of random numbers), what, if anything, does that tell us about the model?
3. Can we add a training step/fine-tuning step that makes the model better at the experiment from step #2? What effect does that have on its benchmarks?
I'm not an ML researcher, so maybe this is still nonsense.
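If someone wants to try #1, here is a rough sketch of the harness; ask_model is a placeholder for whatever API or local model you'd actually call:

    import numpy as np
    from scipy.stats import chisquare

    def ask_model(prompt):
        """Placeholder: send the prompt to your LLM of choice and parse the
        single digit it returns."""
        raise NotImplementedError

    def run_trial(n_samples=200, seed=0):
        rng = np.random.default_rng(seed)
        draws = []
        for _ in range(n_samples):
            context = rng.integers(0, 10, size=20).tolist()  # 20 truly random digits
            prompt = f"Random digits so far: {context}. Give one more random digit."
            draws.append(int(ask_model(prompt)))
        counts = np.bincount(draws, minlength=10)
        stat, p = chisquare(counts)  # goodness of fit vs. a uniform distribution
        return counts, p             # low p -> the model's "random" digits are biased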
Otherwise, parallel floating point computations like these are not going to be perfectly deterministic, due to a combination of two factors. First, the order of some operations will be random due to all sorts of environmental conditions such as temperature variations. Second, floating point operations like addition are not ~~commutative~~ associative (thanks!!), which surprises people unfamiliar with how they work.
That is before we even talk about the temperature setting on LLMs.
Maybe you meant associative? Floating point addition is commutative: a+b is always equal to b+a for any values of a and b. It is not associative, though: a+(b+c) is in general different from (a+b)+c; think what happens if a is tiny and b and c are huge, for example.
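Concretely (plain Python floats, i.e. IEEE 754 doubles):

    a, b, c = 1.0, 1e16, -1e16
    print((a + b) + c)   # 0.0, because a is absorbed when added to the huge b first
    print(a + (b + c))   # 1.0, because b and c cancel first, so a survives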
To think that I used to do this for a living...
On a more serious note, you could always adjust the temperature so they behave more randomly.
In a project last year, I did a combination of LLMs plus a list of random numbers from a quantum computer. Random numbers are the only useful thing quantum computers can produce, and that is one thing LLMs are terrible at.