I'm kind of bothered by how many folks in the "AI influencer" space just latch onto the latest model hype, the "Grok 4 changes EVERYTHING" type of nonsense.
And Grok 4 is a great example where they're just completely lying about the practical results. Elon wants to claim this is the smartest model, but it's like... 3rd or 4th best, at best.
Benchmarks, for a variety of reasons, now seem inadequate to capture models' actual strength, so I decided to run Grok 4 and o3 (and Grok 4 Heavy + o3-pro) through a gauntlet of questions that I think demonstrate real, practical differences between the two.
Sherveen•5h ago
Hope this is helpful!