In addition to the non-empty output, 153 reasoning tokens were produced.
With max tokens set to 100, the output is empty: the 100-token limit was exhausted entirely by reasoning tokens.
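A sketch of how you could reproduce this with the OpenAI Python SDK's Responses API (the model name is an assumption, and the usage field names may vary by SDK version):

```python
# Sketch: reproduce the "reasoning ate the whole budget" behavior.
# Assumptions: a reasoning-capable model ("o4-mini" here) and the
# OpenAI Responses API; swap in whatever model you're actually testing.
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="o4-mini",         # assumption: any reasoning-capable model
    input="Say nothing.",
    max_output_tokens=100,   # budget shared by reasoning AND visible text
)

print(repr(resp.output_text))    # often '' -- nothing left for visible output
print(resp.usage.output_tokens)  # ~100, mostly consumed by reasoning tokens
print(resp.usage.output_tokens_details.reasoning_tokens)
```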
Anyway, later they concede that it's not 100% deterministic:
> Temperature 0 non-determinism. While all confirmatory results were 30/30, known floating-point non-determinism exists at temperature 0 in both APIs. One control concept (thunder) showed 1/30 void on GPT, demonstrating marginal non-determinism.
Actually, FP non-determinism is about runs on different machines producing different output. On the same machine, FP arithmetic is fully deterministic. (It can even be made cross-platform deterministic, with some performance penalty on at least some machines.)
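A quick way to see why different machines (or different kernels) can disagree: floating-point addition is not associative, so any change in evaluation order can change the result.

```python
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False: FP addition is not associative
print((a + b) + c, a + (b + c))    # 0.6000000000000001 vs 0.6
```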
What makes computers non-deterministic here is concurrency: concurrent code can interleave differently on each run. However, it is possible to build LLMs that are 100% deterministic [0] (you can keep them deterministic by ensuring all interleavings produce the same result, as the sketch below illustrates); it's just that people generally don't bother.
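A minimal sketch of that point (an illustration, not how GPU kernels actually work): accumulating the same numbers in different orders, as a concurrent reduction might, changes the bits; a fixed reduction order is bit-for-bit reproducible.

```python
import random

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

# Emulate different interleavings: accumulate the same numbers in
# different orders, the way a concurrent reduction might.
sums = set()
for _ in range(100):
    order = xs[:]
    random.shuffle(order)
    acc = 0.0
    for x in order:
        acc += x
    sums.add(acc)
print(len(sums))  # typically > 1: the interleaving order changes the bits

# Deterministic alternative: always reduce in the same fixed tree order.
def pairwise_sum(vals):
    if len(vals) == 1:
        return vals[0]
    mid = len(vals) // 2
    return pairwise_sum(vals[:mid]) + pairwise_sum(vals[mid:])

print(pairwise_sum(xs) == pairwise_sum(xs))  # True: same order, same result
```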
[0] For example, Fabrice Bellard's ts_zip https://bellard.org/ts_zip/ uses an LLM to compress text. It would not be able to decompress the text losslessly if the model weren't fully deterministic.
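To make the footnote concrete, here is a toy stand-in for LLM-based compression. It is NOT ts_zip's actual algorithm (ts_zip drives an entropy coder with the model's token probabilities, as I understand it), and `model_ranking` is a hypothetical deterministic "model". The point it shows: the decompressor must recompute exactly what the compressor computed, and perturbing the model even slightly makes decoding fall apart.

```python
import hashlib

def model_ranking(context, vocab, seed):
    # Hypothetical "model": deterministically ranks the vocabulary given
    # the context. `seed` stands in for anything that could perturb the
    # computation (different hardware, kernels, interleavings, ...).
    return sorted(vocab, key=lambda tok: hashlib.sha256(
        f"{seed}|{context}|{tok}".encode()).digest())

def compress(text, vocab, seed=0):
    # Store each symbol's rank under the model's prediction.
    ranks, context = [], ""
    for ch in text:
        order = model_ranking(context, vocab, seed)
        ranks.append(order.index(ch))
        context += ch
    return ranks

def decompress(ranks, vocab, seed=0):
    # Rerun the same model and invert the ranks.
    context = ""
    for r in ranks:
        order = model_ranking(context, vocab, seed)
        context += order[r]
    return context

msg = "a bad cafe"
vocab = sorted(set(msg))
ranks = compress(msg, vocab)
print(decompress(ranks, vocab) == msg)   # True: identical model both times
print(decompress(ranks, vocab, seed=1))  # garbage: the "model" changed
```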
bob1029•58m ago
"Prompts sometimes return null"
I would be very cautious about attributing any of this to black-box LLM weight matrices. Products like GPT and Opus are more than just a single model: they rake your prompt over the coals a few times before responding now. With those extra layers in the pipeline, telling the model to return "nothing" is very likely to perform to expectation.
tiku•12m ago