There are a lot of SOTA and smaller models coming out every month, and many of them claim great coding output, tool execution, etc. at a better cost than their competitors, but I haven't been able to find any up-to-date benchmark that actually confirms these claims and compares the models in terms of speed, quality, and price.
For instance, https://gso-bench.github.io/leaderboard.html seems to be a few months behind and is missing a few key models like Grok and some others.
How do you decide which model to use for your day-to-day work, and are there good metrics that help with that decision?