- Drop-in replacement for the OpenAI SDK (change one line: base_url)
- Each query gets classified (regex fast path + lightweight LLM classifier) and matched against ~390 models
- Three tiers (Frugal/Balanced/Premium) to control the quality-cost tradeoff
- Automatic failover if a provider goes down
- Cost metadata in every response
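To make the first bullet concrete, here's roughly what the "change one line" looks like with the Node OpenAI SDK (where the option is spelled baseURL rather than base_url). The endpoint URL and the tier-name-as-model convention are my illustration of the idea, not confirmed specifics from the docs:

```ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.KOMILION_API_KEY,
  // The one changed line: point the SDK at the router instead of api.openai.com.
  // (Placeholder URL; the real endpoint comes from the Komilion docs.)
  baseURL: "https://api.komilion.example/v1",
});

const res = await client.chat.completions.create({
  // Assumed convention: pass a tier instead of a specific model and let the router choose.
  model: "balanced",
  messages: [{ role: "user", content: "What are your opening hours?" }],
});

console.log(res.choices[0].message.content);
// The response is said to carry cost metadata; the exact field name isn't specified here.
```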
The routing logic is benchmark-driven (LMArena, Artificial Analysis), not ML-based — simpler to debug and reason about. The regex fast path handles ~60% of requests in under 5ms with zero API calls.
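For illustration, a fast path like that can be nothing more than a table of compiled patterns checked locally before any model is called. These patterns are made up, not the ones Komilion ships:

```ts
// Illustrative only: a local pattern table checked before any API call.
const FAST_PATH: Array<{ pattern: RegExp; route: string }> = [
  { pattern: /^(hi|hello|hey|thanks|thank you)[.!]*$/i, route: "small-cheap-model" },
  { pattern: /\b(opening hours|return policy|shipping cost|reset my password)\b/i, route: "small-cheap-model" },
  { pattern: /^classify\b|\bis this (spam|positive|negative)\b/i, route: "small-cheap-model" },
];

function fastPathRoute(prompt: string): string | null {
  for (const { pattern, route } of FAST_PATH) {
    if (pattern.test(prompt)) return route; // pure string matching: no network, microseconds of CPU
  }
  return null; // unmatched requests fall through to the LLM classifier
}
```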
Example: a customer support bot doing 10K conversations/month went from ~$250/mo (everything pinned to Opus 4.6) to ~$40/mo with routing. Most conversations were FAQ-level questions that a smaller model handled fine.
Stack: Next.js, Vercel, Neon PostgreSQL, OpenRouter upstream. Hosting cost: ~$20/month.
We ran a head-to-head benchmark: same 15 prompts through Opus, GPT-4o, Gemini Pro, and the router. Simple tasks cost 66% less with routing. Complex tasks produced 2x more detailed output because the router picked specialized models per task type. Full data: https://dev.to/robinbanner/we-benchmarked-4-ai-api-strategie...
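For anyone curious about methodology, the shape of that benchmark is simple to reproduce. The sketch below assumes one OpenAI-compatible client per strategy and just records output length and token usage, with per-model pricing applied afterwards:

```ts
import OpenAI from "openai";

type Strategy = { name: string; client: OpenAI; model: string };

// Run the same prompts through each strategy and record raw measurements.
async function benchmark(prompts: string[], strategies: Strategy[]) {
  const rows = [];
  for (const { name, client, model } of strategies) {
    for (const prompt of prompts) {
      const res = await client.chat.completions.create({
        model,
        messages: [{ role: "user", content: prompt }],
      });
      rows.push({
        strategy: name,
        prompt,
        outputChars: (res.choices[0].message.content ?? "").length,
        totalTokens: res.usage?.total_tokens ?? 0, // multiply by per-model pricing afterwards
      });
    }
  }
  return rows;
}
```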
Architecture writeup: https://dev.to/robinbanner/inside-komilions-architecture-how... — there's a free tier if you want to try it.
robinbanner•1h ago
Some backstory on that support-bot example: the bill sat at ~$250/month with every request pinned to Opus. Then I looked at the actual queries. 70% were things like "what are your hours?" and "how do I return something?" — questions where a $0.80/M-token model gives the same answer as a $15/M-token model. But about 5% were genuinely complex (multi-step troubleshooting, product comparisons requiring reasoning) where Opus was noticeably better.
I started manually routing: simple patterns to a cheap model, everything else to Opus. The bill dropped to $40/month with no quality complaints from users. But maintaining the routing logic across projects got tedious — every new app needed the same classification + model selection + failover logic.
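Back-of-envelope on why the bill moved so much (input-token pricing only, assuming roughly equal token volume per query, so this is only directional):

```ts
const cheap = 0.80;    // $/M tokens: the FAQ-capable model mentioned above
const expensive = 15;  // $/M tokens: the model everything was pinned to

// Manual split: ~70% of traffic (FAQ-style) goes cheap, the remaining 30% stays expensive.
const blended = 0.70 * cheap + 0.30 * expensive; // ≈ $5.06 per million input tokens
console.log((expensive / blended).toFixed(1));   // ≈ 3.0x cheaper on input tokens alone

// Sending the middling 25% of queries to mid-priced models, rather than the most expensive
// one, is what closes the rest of the gap toward the observed ~6x drop ($250/mo -> $40/mo).
```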
So I built Komilion to package it up. The classification runs in two stages:
1. A regex fast path catches ~60% of requests instantly (greetings, FAQ patterns, simple classification tasks). Zero API calls, under 5ms.
2. For the rest, a lightweight LLM classifier determines task type and complexity, then matches against a routing table built from LMArena and Artificial Analysis benchmark data.
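A rough sketch of what stage 2 can look like. The task-type labels, table rows, and model names below are placeholders, not Komilion's actual routing data:

```ts
type Tier = "frugal" | "balanced" | "premium";
type Classification = {
  taskType: "faq" | "summarization" | "coding" | "reasoning";
  complexity: "low" | "medium" | "high";
};

// Built offline from LMArena / Artificial Analysis scores plus per-token pricing.
// Model names are placeholders.
const ROUTING_TABLE: Record<string, Record<Tier, string>> = {
  "faq:low":        { frugal: "cheap-small-model",   balanced: "cheap-small-model",   premium: "mid-model" },
  "coding:high":    { frugal: "mid-coding-model",    balanced: "strong-coding-model", premium: "frontier-coding-model" },
  "reasoning:high": { frugal: "mid-reasoning-model", balanced: "frontier-model",      premium: "frontier-model" },
  // ...one row per (taskType, complexity) pair, drawing on the ~390 candidate models
};

// Stage 2: only runs when the regex fast path didn't match.
async function routeSlowPath(
  prompt: string,
  tier: Tier,
  classify: (p: string) => Promise<Classification>, // the lightweight LLM classifier
): Promise<string> {
  const { taskType, complexity } = await classify(prompt);
  return ROUTING_TABLE[`${taskType}:${complexity}`]?.[tier]
      ?? ROUTING_TABLE["reasoning:high"][tier]; // conservative fallback for unmapped pairs
}
```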
What surprised me in the benchmark data: complex tasks through the router actually produced MORE detailed output than any single pinned model (6,614 chars avg vs 3,573 for Opus). The router selects specialized models per task type rather than using a generalist model for everything.
Stack: Next.js on Vercel, Neon PostgreSQL, OpenRouter upstream. Total hosting cost ~$20/month. It's a solo project.
The thing I'd do differently: I should have started with the benchmark data instead of building the product first. The numbers make the case better than any feature list.
Happy to answer technical questions about the routing logic, benchmark methodology, or anything else.