The benchmark covers over 60 AI models across 8 categories. It was built and runs entirely inside n8n, and it scores models on actual n8n use cases rather than on conversational style or subjective preference.
Smaller models often outperform larger ones on specific tasks. We also found that a model's list price doesn't tell the whole story: one model priced at half the cost of its competitors ended up being 10x more expensive in practice because its outputs were so verbose.
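To make the pricing point concrete, here is a rough sketch of the arithmetic. The prices and token counts are made up for illustration and are not taken from the benchmark; the function name `cost_per_task` is also hypothetical. It assumes billing is proportional to output tokens only, which ignores input tokens and any other fees.

```python
def cost_per_task(price_per_million_tokens: float, output_tokens: int) -> float:
    """Dollar cost of one task, counting output tokens only (a simplification)."""
    return price_per_million_tokens * output_tokens / 1_000_000


# Hypothetical figures, not measured values from the benchmark.
# The "verbose" model costs half as much per token but emits 20x the output.
concise = cost_per_task(price_per_million_tokens=10.0, output_tokens=500)
verbose = cost_per_task(price_per_million_tokens=5.0, output_tokens=10_000)

ratio = verbose / concise
print(f"concise: ${concise:.4f}  verbose: ${verbose:.4f}  ratio: {ratio:.1f}x")
```

Under these assumptions, a model at half the list price has to produce about 20x the output tokens per task to end up 10x more expensive, so cost per completed task is the more useful number to compare than price per token.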
No single model dominates every category, so use the category filter on the benchmark page to find the best fit for your specific workflow. Whether you're building AI agents, automating data extraction, or generating code, this benchmark helps you make more informed decisions and build more cost-effective solutions.
james2doyle•1h ago