Overall an improvement over Opus 4.8, but I'd still say Gemini 3.1 Pro has more of an artistic vision, even though it sometimes fails tool calls and writes buggy code.
I know almost everyone is interested only in the SWE stuff, but this has been a good eval for me to think about how big the model is, how "creative" it is at generating new ideas, etc.
mesmertech•1h ago
More results from fable, with comparisons for Gemini, Opus, and some open-source models: https://mesmer.tools/benchmarks/ai-video-generation