How it works: We extract features from each prompt (complexity, tool usage, context length) and route the request with a DeBERTa classifier trained on model evaluations. Simple tasks → cheaper models, complex reasoning → premium models. The routing adds ~2ms of latency but cuts API costs by orders of magnitude.
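A minimal sketch of that routing step, assuming a Hugging Face `transformers` setup. The checkpoint name, tier labels, and the way the extra features are folded into the input text are all illustrative placeholders, not the project's actual fine-tuned router:

```python
# Hedged sketch: a DeBERTa sequence classifier scores a prompt and the
# predicted class picks the cheapest model tier expected to handle it.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINT = "microsoft/deberta-v3-small"        # placeholder; the real router is fine-tuned on evals
MODELS = {0: "cheap-model", 1: "premium-model"}  # illustrative tier names

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
classifier = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=2)
classifier.eval()

def route(prompt: str, tool_count: int, context_tokens: int) -> str:
    # Fold simple scalar features into the text so the classifier sees them
    # alongside the prompt itself (one possible encoding, not the only one).
    text = f"tools={tool_count} ctx={context_tokens} :: {prompt}"
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = classifier(**inputs).logits
    tier = int(logits.argmax(dim=-1))
    return MODELS[tier]

print(route("Rename this variable across the file.", tool_count=1, context_tokens=800))
```

Because the classifier is a small encoder running locally, the forward pass stays in the low-millisecond range, which is where the ~2ms overhead figure comes from.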
Same Claude Code experience, same quality, much cheaper.