This is probably better phrased as "LLMs may not provide consistent answers due to changing data and built-in randomness."
Barring rare(?) GPU race conditions, LLMs produce the same output given the same inputs.
With batching, the matrix shapes and a request's position within them aren't deterministic, and that leads to non-deterministic results regardless of sampling temperature or seed.
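A minimal sketch of the underlying issue, assuming numpy just for illustration: floating-point addition isn't associative, so the same values reduced in a different order (which is what happens when batch shapes change and different kernels or tilings get picked) can give slightly different results.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(10_000).astype(np.float32)

    s1 = np.sum(x)                                # one reduction order
    s2 = np.sum(x.reshape(100, 100).sum(axis=0))  # a different reduction order
    print(s1 == s2)  # often False: float32 addition is not associative

The same effect, repeated across every matmul in the forward pass, can be enough to flip a sampled token somewhere in a long generation even at temperature 0.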
Even with a black-box API, just because you don't know how the result is calculated doesn't mean it's non-deterministic. It's the underlying algorithm that determines that, and an LLM is deterministic.
It’s inherently non-deterministic because it reflects the reality of different requests arriving at the servers at the same time. And I don’t believe there are any realistic workarounds if you want to keep costs reasonable.
Edit: there might be workarounds if matmul algorithms gave stronger guarantees than they do today (invariance under row/column swaps). I'm not enough of an expert to say how feasible that is, especially in a quantized scenario.
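For what that invariance would mean concretely, here's a rough check (PyTorch assumed; whether the two results match bit-for-bit depends on the backend and which kernel gets selected for each shape):

    import torch

    torch.manual_seed(0)
    W = torch.randn(4096, 4096)
    x = torch.randn(1, 4096)
    batch = torch.cat([x, torch.randn(7, 4096)])  # same row, embedded in a larger batch

    y_alone   = x @ W            # row computed on its own
    y_batched = (batch @ W)[:1]  # same row computed as part of a batch
    print(torch.equal(y_alone, y_batched))  # not guaranteed True: kernel/tiling can differ by shape

A "batch-invariant" matmul would guarantee this prints True no matter what else is in the batch; today's libraries generally don't promise that.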
Inference on a generic LLM may not be subject to these non-determinisms even on a GPU though, idk
dekhn from a decade ago cared a lot about stable outputs. dekhn today thinks sampling from a distribution is a far more practical approach for nearly all use cases. I could see it mattering when the false negative rate of a medical diagnostic exceeded a reasonable threshold.
But it does when coupled with non-deterministic request batching, which is the case.
If he did have a sense of what people expect, he would know nobody wants Grok to give his personal opinion on issues. They want Grok to explain the emotional landscape of controversial issues, explaining the passion people feel on both sides and the reasons for their feelings. Asked to pick a side with one word, the expected response is "As an AI, I don't have an opinion on the matter."
He may be using a specific ideological framework that prioritizes contrarian or ‘anti-woke’ narratives to instruct Grok's tuning. That's turning out to be disastrous. He needs someone like Amanda Askell at Anthropic to help guide the tuning.
Absolutely. That said, I'm not sure Sam Altman, Dario Amodei, and others are notably empathetic either.
LLM bugs are weird.
Ignoring the context of the past month, where he has repeatedly said he plans on 'fixing' the bot to align with his perspective, feels like the LLM world's equivalent of "to me it looked like he was waving awkwardly", no?
It’s so fascinating that right-wing views are so similar to what is usually decried in the next sentence.
Perhaps you should start looking for other methods to educate yourself.
"I’m sure you believe everything you’re saying. But what I’m saying is that if you believed something different, you wouldn’t be sitting where you’re sitting."
Simon may well be right - xAI might not have directly instructed Grok to check what the boss thinks before responding - but that's not to say xAI wouldn't be more likely to release a model that does agree with the boss a lot and privileges what he has said when reasoning.