I've been thinking about something like this. It would be really helpful to add another axis: how comparable the outputs of different models are on specific tasks. For example, if you are doing some kind of classification or typing task, can you get 95% of Sonnet's performance with Haiku, or 89% of Sonnet's performance with Gemma4? Then the cost/capability space becomes much richer, because you can decompose a workload into tasks and assign each one to a model based on its cost and capability.
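Here is a minimal sketch of that assignment step. It assumes each model has a per-task score expressed as a fraction of a reference model's score, plus a per-call cost. The `ModelProfile` class, the `cheapest_adequate` and `assign` helpers, the thresholds, and all cost figures are hypothetical placeholders. Only the 95% and 89% ratios come from the comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelProfile:
    name: str
    relative_score: float  # task score as a fraction of the reference model's score
    cost_per_call: float   # hypothetical relative cost units, NOT real pricing


def cheapest_adequate(profiles: list[ModelProfile], min_relative_score: float) -> Optional[ModelProfile]:
    """Return the cheapest model that meets the quality threshold, or None if none does."""
    adequate = [p for p in profiles if p.relative_score >= min_relative_score]
    if not adequate:
        return None
    return min(adequate, key=lambda p: p.cost_per_call)


def assign(task_profiles: dict[str, list[ModelProfile]], thresholds: dict[str, float]) -> dict[str, Optional[str]]:
    """Decompose a workload into tasks and pick a model per task (hypothetical helper)."""
    result = {}
    for task, profiles in task_profiles.items():
        choice = cheapest_adequate(profiles, thresholds[task])
        result[task] = choice.name if choice else None
    return result


# Hypothetical profile for a classification task: the ratios echo the comment, the costs are made up.
classification = [
    ModelProfile("Sonnet", 1.00, 10.0),
    ModelProfile("Haiku", 0.95, 2.0),
    ModelProfile("Gemma4", 0.89, 0.5),
]

print(assign({"classification": classification}, {"classification": 0.90}))
```

With a 0.90 threshold, Gemma4 (0.89) is excluded, so the cheapest adequate choice is Haiku. Lowering the threshold to 0.85 would route the task to Gemma4 instead. Setting a threshold per task is the point: some subtasks can tolerate a cheaper model, and others cannot.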
gbibas•1h ago