If you care about factors like output quality, cost, and latency, how do you evaluate models quickly, especially when you don’t want to write custom code or build evaluation pipelines?
Do you rely on docs and benchmarks, ask engineers to run experiments, manually test a few options, or something else?