I use strong reasoning models to understand a codebase or plan changes, then switch to faster and cheaper models for implementation and refactors. In practice this means mixing providers like Anthropic, OpenAI, and Google, because different tasks need different capabilities.
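Concretely, my manual setup looks roughly like the sketch below. It assumes the official `anthropic` and `openai` Python SDKs; the model names are placeholders, not recommendations, and the real scripts add retries and diff application on top.

```python
# Minimal sketch of manual multi-model routing: a reasoning model plans,
# a cheaper model implements. Model names are placeholders; swap in
# whatever tiers your providers expose.
import anthropic
from openai import OpenAI

planner = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env
coder = OpenAI()                 # reads OPENAI_API_KEY from env

def plan(task: str, repo_context: str) -> str:
    """Ask the expensive reasoning model for a step-by-step change plan."""
    resp = planner.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder reasoning-tier model
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"Repo context:\n{repo_context}\n\nTask: {task}\n"
                       "Produce a concrete implementation plan, file by file.",
        }],
    )
    return resp.content[0].text

def implement(plan_text: str, file_source: str) -> str:
    """Hand the plan to a cheaper model for the mechanical edits."""
    resp = coder.chat.completions.create(
        model="gpt-4o-mini",  # placeholder cheap implementation-tier model
        messages=[
            {"role": "system",
             "content": "Apply the plan to the given file. "
                        "Return the full updated file only."},
            {"role": "user",
             "content": f"Plan:\n{plan_text}\n\nFile:\n{file_source}"},
        ],
    )
    return resp.choices[0].message.content
```

The handoff between the two stages is just a string, which is exactly where context can get lost.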
So why do AI coding platforms insist on a single model, often from a single provider, for the entire workflow?
Why burn expensive reasoning tokens on boilerplate? Why should planning, coding, and review all run through the same “brain”? Why do users have to manually glue models together when platforms already have the full task context?
This feels less like a technical limitation and more like a product decision.
Maybe multi-model coordination is genuinely hard. Maybe handoffs between models lose context. Maybe a mixed stack breaks the “one AI engineer” narrative that demos well.
But engineers already do this themselves today.
Has anyone run multi-model workflows on real repos? Did it fail in non-obvious ways?
Interested in real experience, not demos.