- Drop-in replacement for the OpenAI SDK (change one line: base_url)
- Each query gets classified (regex fast path + lightweight LLM classifier) and matched against ~390 models
- Three tiers (Frugal/Balanced/Premium) to control the quality-cost tradeoff
- Automatic failover if a provider goes down
- Cost metadata in every response
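To make the first bullet concrete, here's roughly what the "change one line" looks like with the Node OpenAI SDK (where the option is spelled baseURL rather than base_url). The endpoint URL and the tier-name-as-model convention are my illustration of the idea, not confirmed specifics from the docs:

```ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.KOMILION_API_KEY,
  // The one changed line: point the SDK at the router instead of api.openai.com.
  // (Placeholder URL; the real endpoint comes from the Komilion docs.)
  baseURL: "https://api.komilion.example/v1",
});

const res = await client.chat.completions.create({
  // Assumed convention: pass a tier instead of a specific model and let the router choose.
  model: "balanced",
  messages: [{ role: "user", content: "What are your opening hours?" }],
});

console.log(res.choices[0].message.content);
// The response is said to carry cost metadata; the exact field name isn't specified here.
```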
The routing logic is benchmark-driven (LMArena, Artificial Analysis), not ML-based — simpler to debug and reason about. The regex fast path handles ~60% of requests in under 5ms with zero API calls.
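For illustration, a fast path like that can be nothing more than a table of compiled patterns checked locally before any model is called. These patterns are made up, not the ones Komilion ships:

```ts
// Illustrative only: a local pattern table checked before any API call.
const FAST_PATH: Array<{ pattern: RegExp; route: string }> = [
  { pattern: /^(hi|hello|hey|thanks|thank you)[.!]*$/i, route: "small-cheap-model" },
  { pattern: /\b(opening hours|return policy|shipping cost|reset my password)\b/i, route: "small-cheap-model" },
  { pattern: /^classify\b|\bis this (spam|positive|negative)\b/i, route: "small-cheap-model" },
];

function fastPathRoute(prompt: string): string | null {
  for (const { pattern, route } of FAST_PATH) {
    if (pattern.test(prompt)) return route; // pure string matching: no network, microseconds of CPU
  }
  return null; // unmatched requests fall through to the LLM classifier
}
```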
Example: a customer support bot doing 10K conversations/month went from ~$250/mo (everything pinned to Opus 4.6) to ~$40/mo with routing. Most conversations were FAQ-level questions that a smaller model handled fine.
Stack: Next.js, Vercel, Neon PostgreSQL, OpenRouter upstream. Hosting cost: ~$20/month.
We ran a head-to-head benchmark: same 15 prompts through Opus, GPT-4o, Gemini Pro, and the router. Simple tasks cost 66% less with routing. Complex tasks produced 2x more detailed output because the router picked specialized models per task type. Full data: https://dev.to/robinbanner/we-benchmarked-4-ai-api-strategie...
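For anyone curious about methodology, the shape of that benchmark is simple to reproduce. The sketch below assumes one OpenAI-compatible client per strategy and just records output length and token usage, with per-model pricing applied afterwards:

```ts
import OpenAI from "openai";

type Strategy = { name: string; client: OpenAI; model: string };

// Run the same prompts through each strategy and record raw measurements.
async function benchmark(prompts: string[], strategies: Strategy[]) {
  const rows = [];
  for (const { name, client, model } of strategies) {
    for (const prompt of prompts) {
      const res = await client.chat.completions.create({
        model,
        messages: [{ role: "user", content: prompt }],
      });
      rows.push({
        strategy: name,
        prompt,
        outputChars: (res.choices[0].message.content ?? "").length,
        totalTokens: res.usage?.total_tokens ?? 0, // multiply by per-model pricing afterwards
      });
    }
  }
  return rows;
}
```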
Architecture writeup: https://dev.to/robinbanner/inside-komilions-architecture-how... — there's a free tier if you want to try it.
robinbanner•1h ago
Some backstory on that support-bot example: the bill sat at ~$250/month with every request pinned to Opus. Then I looked at the actual queries. 70% were things like "what are your hours?" and "how do I return something?" — questions where a $0.80/M-token model gives the same answer as a $15/M-token model. But about 5% were genuinely complex (multi-step troubleshooting, product comparisons requiring reasoning) where Opus was noticeably better.
I started manually routing: simple patterns to a cheap model, everything else to Opus. The bill dropped to $40/month with no quality complaints from users. But maintaining the routing logic across projects got tedious — every new app needed the same classification + model selection + failover logic.
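Back-of-envelope on why the bill moved so much (input-token pricing only, assuming roughly equal token volume per query, so this is only directional):

```ts
const cheap = 0.80;    // $/M tokens: the FAQ-capable model mentioned above
const expensive = 15;  // $/M tokens: the model everything was pinned to

// Manual split: ~70% of traffic (FAQ-style) goes cheap, the remaining 30% stays expensive.
const blended = 0.70 * cheap + 0.30 * expensive; // ≈ $5.06 per million input tokens
console.log((expensive / blended).toFixed(1));   // ≈ 3.0x cheaper on input tokens alone

// Sending the middling 25% of queries to mid-priced models, rather than the most expensive
// one, is what closes the rest of the gap toward the observed ~6x drop ($250/mo -> $40/mo).
```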
So I built Komilion to package it up. The classification runs in two stages:
1. A regex fast path catches ~60% of requests instantly (greetings, FAQ patterns, simple classification tasks). Zero API calls, under 5ms.
2. For the rest, a lightweight LLM classifier determines task type and complexity, then matches against a routing table built from LMArena and Artificial Analysis benchmark data.
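A rough sketch of what stage 2 can look like. The task-type labels, table rows, and model names below are placeholders, not Komilion's actual routing data:

```ts
type Tier = "frugal" | "balanced" | "premium";
type Classification = {
  taskType: "faq" | "summarization" | "coding" | "reasoning";
  complexity: "low" | "medium" | "high";
};

// Built offline from LMArena / Artificial Analysis scores plus per-token pricing.
// Model names are placeholders.
const ROUTING_TABLE: Record<string, Record<Tier, string>> = {
  "faq:low":        { frugal: "cheap-small-model",   balanced: "cheap-small-model",   premium: "mid-model" },
  "coding:high":    { frugal: "mid-coding-model",    balanced: "strong-coding-model", premium: "frontier-coding-model" },
  "reasoning:high": { frugal: "mid-reasoning-model", balanced: "frontier-model",      premium: "frontier-model" },
  // ...one row per (taskType, complexity) pair, drawing on the ~390 candidate models
};

// Stage 2: only runs when the regex fast path didn't match.
async function routeSlowPath(
  prompt: string,
  tier: Tier,
  classify: (p: string) => Promise<Classification>, // the lightweight LLM classifier
): Promise<string> {
  const { taskType, complexity } = await classify(prompt);
  return ROUTING_TABLE[`${taskType}:${complexity}`]?.[tier]
      ?? ROUTING_TABLE["reasoning:high"][tier]; // conservative fallback for unmapped pairs
}
```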
What surprised me in the benchmark data: complex tasks through the router actually produced MORE detailed output than any single pinned model (6,614 chars avg vs 3,573 for Opus). The router selects specialized models per task type rather than using a generalist model for everything.
Stack: Next.js on Vercel, Neon PostgreSQL, OpenRouter upstream. Total hosting cost ~$20/month. It's a solo project.
The thing I'd do differently: I should have started with the benchmark data instead of building the product first. The numbers make the case better than any feature list.
Happy to answer technical questions about the routing logic, benchmark methodology, or anything else.