The models are nondeterministic, and therefore it's pretty normal for different runs to give different results.
I don't see this as evidence that Opus 4.6 has gotten worse.
Reubend•25m ago
The models are nondeterministic, and therefore it's pretty normal for different runs to give different results.
I don't see this as evidence that Opus 4.6 has gotten worse.