I've been working on using in-browser LLMs for agentic data analysis tasks and got frustrated trying to work out which models were worth trying, so I built a benchmark. It grew a bit, and it now has fairly comprehensive coverage and visualizations of models from Opus down to Qwen 0.8B.