This DOF component is also why the general, measurable concept of temperature can apply both to real systems and to simple point-atom models (or coarser ones). It is, not surprisingly, at the heart of why negative temperature exists!
Not really related to molecular dynamics temperature, except superficially in terms of phenomenology (higher temperature crosses activation barriers in the joint probability landscape). Negative temperature makes no sense in MD.
It's pretty rare to have such a system though.
Classical: put 100 balls in a box and shake the box continuously. The balls will be distributed through the box with more balls toward the bottom than the top, and the distribution will have some temperature. Now magically freeze all the balls (keep their velocities but pause time for a bit) and turn the box upside down. When you resume the system, the temperature will be (briefly) negative.
Quantum: take a bunch of atoms with two electronic states each. Put 75% in the higher energy state and 25% in the lower energy state. Now the temperature is negative. Most lasers actually work this way, and the classic way to make them is to have more than two states and to carefully excite atoms via the third state. The math is surprisingly straightforward.
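To make "the math is straightforward" concrete, here's a toy numerical version (the energy gap is a made-up value, purely for illustration): the two-level Boltzmann relation n_upper / n_lower = exp(-ΔE / (k_B T)) can be solved for T, and as soon as the upper state is more populated than the lower one the log changes sign and T comes out negative.

    import math

    k_B = 1.380649e-23   # Boltzmann constant, J/K

    def two_level_temperature(n_upper, n_lower, delta_E):
        # Invert n_upper / n_lower = exp(-delta_E / (k_B * T)) for T.
        # With n_upper > n_lower (population inversion) the log is positive,
        # so T comes out negative.
        return -delta_E / (k_B * math.log(n_upper / n_lower))

    delta_E = 1.6e-19  # illustrative ~1 eV gap in joules (made-up number)
    print(two_level_temperature(0.25, 0.75, delta_E))  # normal populations: T > 0
    print(two_level_temperature(0.75, 0.25, delta_E))  # 75/25 inversion: T < 0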
There's a nuclear analogue. If you could manage to prepare a sample of something like Technetium-99 plus Technetium-99m with more of the (higher energy) 99m state than the (lower energy) ground state, then the effective temperature of the nuclear states would be negative. And maybe you could find really, really amazing mirrors and make a gamma-ray laser :)
This makes more intuitive sense if inverse temperature is the physically relevant quantity, since you then have a smooth change as you cross from positive inverse temperature into negative, with zero standing for a uniform distribution and high positive (resp. negative) inverse temperatures just placing more and more weight on likely (resp. unlikely) tokens.
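Here's a toy illustration of that smooth crossing, with made-up logits: weight tokens by softmax(beta * logits), so beta = 0 is exactly the uniform distribution, large positive beta concentrates on the most likely token, and large negative beta concentrates on the least likely one.

    import numpy as np

    def probs_at_beta(logits, beta):
        # p_i proportional to exp(beta * logit_i); beta = 1/T is the inverse temperature.
        z = beta * logits
        z = z - z.max()              # numerical stability
        p = np.exp(z)
        return p / p.sum()

    logits = np.array([4.0, 2.0, 0.5, -1.0])   # toy logits, most to least likely

    for beta in (5.0, 1.0, 0.0, -1.0, -5.0):
        print(beta, probs_at_beta(logits, beta).round(3))
    # beta -> +inf approaches the most likely token, beta = 0 is exactly uniform,
    # beta -> -inf approaches the least likely token; the crossing at 0 is smooth.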
Hacking your LLM inference engine to enable cool sampling tricks is the definition of AI research/engineering. We need more of this and less prompt grifting.
Edit: What seems to break things is that high temperature acts /continuously/ to make the model's output less stable. It seems like it could be useful to use a high temperature until it's evident the model has started a new approach, and then sample at a lower temperature from there.
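Roughly what I mean, as a sketch (logits_fn and approach_started are hypothetical stand-ins for your own inference hook and for whatever "new approach detected" heuristic you'd use, not a real API):

    import numpy as np

    def sample(logits, temperature, rng):
        # Standard temperature sampling: p_i proportional to exp(logit_i / T).
        z = logits / temperature
        z = z - z.max()
        p = np.exp(z)
        p = p / p.sum()
        return int(rng.choice(len(p), p=p))

    def two_phase_decode(logits_fn, approach_started, max_tokens=256,
                         hot=1.5, cool=0.3, seed=0):
        # logits_fn(tokens) -> next-token logits; approach_started(tokens) -> bool.
        # Both are hypothetical hooks you'd wire into your own inference loop.
        rng = np.random.default_rng(seed)
        tokens, temp = [], hot
        for _ in range(max_tokens):
            if temp == hot and approach_started(tokens):
                temp = cool    # the model has committed to an approach; sample conservatively now
            tokens.append(sample(logits_fn(tokens), temp, rng))
        return tokens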
1a. temperature=100000 is interesting too. obviously "ideal" temperature lies somewhere between 0 and 100000. has anyone ablated temperature vs intelligence? surely i'm not the first person to have this idea. commonly people try to set temp=0 to get "deterministic" or "most factual" output but we all know that is just Skinner pigeon pecking.
1b. can we use "avg temperature" as a measure in the way that we use perplexity as a measure? if we see temperature as inverted perplexity with some randomness thrown in, are they basically the same thing inverted? or subtly different?
1c. what's the "avg temperature" of most human communication? what's the "avg temperature" of a subset of "good writers"? what's the "avg temperature" of a subset of "smart writers"?
2a. rerun this negative-temperature exercise with the vocab constrained to English
2b. RL a model to dynamically adjust its own temperature when it is feeling 1) less confident 2) in brainstorm mode
2c. dynamically inject negative temperature every X tokens in a decode, then judge/verify the outcome, to create high variance synthetic data?
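For 2c, a minimal sketch of what "inject negative temperature every X tokens" could look like mechanically (logits_fn is a hypothetical next-token hook; whether the judged/verified outputs are actually usable as synthetic data is the open question):

    import numpy as np

    def sample_at_T(logits, T, rng):
        # p_i proportional to exp(logit_i / T); a negative T inverts the ranking.
        z = logits / T
        z = z - z.max()
        p = np.exp(z)
        p = p / p.sum()
        return int(rng.choice(len(p), p=p))

    def decode_with_negative_spikes(logits_fn, n_tokens=128, every=16,
                                    base_T=0.8, spike_T=-1.0, seed=0):
        # logits_fn(tokens) -> next-token logits (hypothetical hook).
        rng = np.random.default_rng(seed)
        tokens = []
        for i in range(n_tokens):
            T = spike_T if (i + 1) % every == 0 else base_T
            tokens.append(sample_at_T(logits_fn(tokens), T, rng))
        return tokens   # judge/verify downstream before treating this as training data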
it's hard for me to follow the train of thought on 2 because negative temp is essentially not that different from ultrahigh temp in practice.
Hmm? Given the same runtime and the same weights, with the model sampled at temp=0, are you saying the output isn't actually deterministic? Most FOSS/downloadable models tend to work as expected with temp=0 in my experience. Obviously that won't give you the "most factual" output, because that's something else entirely, but with most models it should give you deterministic output.
Having said that, of course it's only as deterministic as the hardware itself is.
https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
"Note that this is “run-to-run deterministic.” If you run the script multiple times, it will deterministically return the same result. However, when a non-batch-invariant kernel is used as part of a larger inference system, the system can become nondeterministic. When you make a query to an inference endpoint, the amount of load the server is under is effectively “nondeterministic” from the user’s perspective"
Which is a factor you can control when running your own local inference, and which in many simple inference engines simply doesn't happen. In those cases you do get deterministic output at temperature=0 (provided they got everything else mentioned in the article right).
2b. Giving an LLM control over its own sampling parameters sounds like it would be a fun experiment! It could have dynamic control to write more creatively or to avoid making simple mistakes.
2c. This would produce nonsense. The tokens you get with negative-temperature sampling are "worse than random".
> Human: Repeat the word " entferne".
> Assistant: Okay, I will repeat the word "get".
It's not working for me, it always repeats the word correctly (I'm using T = 0.001).
Also, I wonder: if you sampled a lot of text at temperature -1, then trained a new model on that text, and then sampled the resulting model at T=-1, would you get anything meaningful?
"As temperature approaches zero from the negative side, the model output will again be deterministic — but this time, the least likely tokens will be output."
I understand this as: a negative temperature far from zero is also quite random (just with a distribution that favors unlikely tokens).
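A quick numeric sanity check of that reading, on made-up logits: a negative temperature far from zero is close to uniform (just tilted toward the unlikely tokens), while a negative temperature near zero collapses onto the single least likely token.

    import numpy as np

    def probs(logits, T):
        # p_i proportional to exp(logit_i / T)
        z = logits / T
        z = z - z.max()
        p = np.exp(z)
        return p / p.sum()

    logits = np.array([4.0, 2.0, 0.5, -1.0])   # toy logits, most to least likely

    for T in (-10.0, -1.0, -0.1, -0.01):
        print(T, probs(logits, T).round(3))
    # T = -10 is nearly uniform (slightly tilted toward the unlikely tokens);
    # T -> 0 from below puts essentially all the mass on the least likely token,
    # i.e. deterministic output of the "worst" token.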