Half of the data is missing and the rest is inconsistent between different graphs and sections. Is the benchmark having Sonnet 5 generate the page and seeing how many hallucinations it has?
Tiberium•11m ago
Seems like the model is incredibly inefficient at max reasoning, and even at high/xhigh it uses far more tokens than other models, including Gemini 3.5 Flash, GLM 5.2 and so on. GPT 5.5's token efficiency is still unmatched.
iLoveOncall•12m ago