Show HN: API router that picks the cheapest model that fits each query

https://www.komilion.com/
1•robinbanner•1h ago
I got frustrated paying $60/M tokens for reasoning queries when a $0.80/M model gives comparable results for most of them. So I built Komilion — a model router that classifies each API request and routes it to a cheaper model that fits.

- Drop-in replacement for the OpenAI SDK (change one line: base_url)
- Each query gets classified (regex fast path + lightweight LLM classifier) and matched against ~390 models
- Three tiers (Frugal/Balanced/Premium) to control the quality-cost tradeoff
- Automatic failover if a provider goes down
- Cost metadata in every response
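To make the failover and cost-metadata bullets concrete, here is a minimal sketch of the general idea. It is not Komilion's actual code: the `Provider` type, the field names in the returned dict, the prices, and the 4-characters-per-token estimate are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Provider:
    name: str                      # hypothetical provider label
    usd_per_million_tokens: float  # illustrative price, not a real quote
    call: Callable[[str], str]     # sends the prompt, returns the completion


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> dict:
    """Try providers in order; return the first success plus cost metadata."""
    errors = []
    for provider in providers:
        try:
            text = provider.call(prompt)
        except Exception as exc:  # a real router would catch narrower errors
            errors.append(f"{provider.name}: {exc}")
            continue
        # Crude token estimate (about 4 characters per token), illustration only.
        tokens = max(1, (len(prompt) + len(text)) // 4)
        return {
            "text": text,
            "provider": provider.name,
            "failed_over_from": errors,
            "estimated_cost_usd": tokens * provider.usd_per_million_tokens / 1_000_000,
        }
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def _down(prompt: str) -> str:
    # Simulates a provider outage.
    raise ConnectionError("provider unavailable")


def _echo(prompt: str) -> str:
    # Simulates a healthy provider with a canned answer.
    return "We are open 9 to 5."


result = complete_with_failover(
    "what are your hours?",
    [Provider("primary", 0.80, _down), Provider("backup", 1.20, _echo)],
)
```

Here `result["provider"]` is the backup, and `result["failed_over_from"]` records why the primary was skipped, so the caller can see both the cost and the failover in one response.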

The routing logic is benchmark-driven (LMArena, Artificial Analysis), not ML-based — simpler to debug and reason about. The regex fast path handles ~60% of requests in under 5ms with zero API calls.
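A table-driven rule like this can be sketched as "pick the cheapest model whose benchmark score clears the tier's bar". The thresholds, model names, and scores below are invented for illustration. The post does not say how the tiers map to benchmark scores, so treat this as one plausible reading rather than the actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float
    score: float  # benchmark score for one task type, 0 to 100 (made-up scale)


# Hypothetical minimum score each tier accepts.
TIER_MIN_SCORE = {"frugal": 60.0, "balanced": 75.0, "premium": 90.0}

# Made-up entries for a single task type; a real table would come from benchmark data.
CANDIDATES = [
    Model("small-model", 0.80, 68.0),
    Model("mid-model", 3.00, 80.0),
    Model("large-model", 15.00, 93.0),
]


def pick_model(tier: str, candidates: list[Model] = CANDIDATES) -> Model:
    """Return the cheapest candidate whose score clears the tier's threshold."""
    threshold = TIER_MIN_SCORE[tier]
    eligible = [m for m in candidates if m.score >= threshold]
    if not eligible:
        # Nothing qualifies: fall back to the best-scoring model available.
        return max(candidates, key=lambda m: m.score)
    return min(eligible, key=lambda m: m.usd_per_million_tokens)
```

Because the rule is a plain lookup plus a `min`, a surprising routing decision can be traced to a specific table row, which fits the "simpler to debug" point.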

Example: a customer support bot doing 10K conversations/month went from ~$250/mo (everything pinned to Opus 4.6) to ~$40/mo with routing. Most conversations were FAQ-level questions that a smaller model handled fine.
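Worked out from the numbers in that example (the variable names are only for illustration):

```python
conversations_per_month = 10_000
pinned_cost_usd = 250.0   # everything pinned to one large model
routed_cost_usd = 40.0    # same traffic with routing

savings_fraction = (pinned_cost_usd - routed_cost_usd) / pinned_cost_usd
pinned_per_conversation = pinned_cost_usd / conversations_per_month
routed_per_conversation = routed_cost_usd / conversations_per_month

print(f"savings: {savings_fraction:.0%}")
print(f"per conversation: ${pinned_per_conversation:.4f} -> ${routed_per_conversation:.4f}")
```

That is an 84% reduction, or roughly 2.5 cents per conversation down to 0.4 cents.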

Stack: Next.js, Vercel, Neon PostgreSQL, OpenRouter upstream. Hosting cost: ~$20/month.

We ran a head-to-head benchmark: same 15 prompts through Opus, GPT-4o, Gemini Pro, and the router. Simple tasks cost 66% less with routing. Complex tasks produced nearly 2x longer output (6,614 vs 3,573 characters on average for Opus) because the router picked specialized models per task type. Full data: https://dev.to/robinbanner/we-benchmarked-4-ai-api-strategie...

Architecture writeup: https://dev.to/robinbanner/inside-komilions-architecture-how... — there's a free tier if you want to try it.

Comments

robinbanner•1h ago
Backstory: I was building a customer support AI for a client last year. We started with Claude Opus for everything because it worked great. The bill was $250/month for maybe 10K conversations.

Then I looked at the actual queries. 70% were things like "what are your hours?" and "how do I return something?" — questions where a $0.80/M-token model gives the same answer as a $15/M-token model. But about 5% were genuinely complex (multi-step troubleshooting, product comparisons requiring reasoning) where Opus was noticeably better.

I started manually routing: simple patterns to a cheap model, everything else to Opus. The bill dropped to $40/month with no quality complaints from users. But maintaining the routing logic across projects got tedious — every new app needed the same classification + model selection + failover logic.

So I built Komilion to package it up. The classification runs in two stages:

1. A regex fast path catches ~60% of requests instantly (greetings, FAQ patterns, simple classification tasks). Zero API calls, under 5ms.

2. For the rest, a lightweight LLM classifier determines task type and complexity, then matches against a routing table built from LMArena and Artificial Analysis benchmark data.
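The two stages above can be sketched roughly as follows. The patterns, task labels, and the stub standing in for the LLM classifier are all assumptions for illustration. The real pattern set and classifier prompt are not described in this post.

```python
import re
from typing import Callable

# Hypothetical fast-path patterns mapped to task labels.
FAST_PATH = [
    (re.compile(r"^\s*(hi|hello|hey|thanks|thank you)\b", re.IGNORECASE), "greeting"),
    (re.compile(r"\b(opening hours|your hours|return policy|how do i return)\b",
                re.IGNORECASE), "faq"),
]


def classify(prompt: str, llm_classifier: Callable[[str], str]) -> tuple[str, str]:
    """Return (task_label, stage), where stage is 'regex' or 'llm'."""
    # Stage 1: regex fast path, no API call.
    for pattern, label in FAST_PATH:
        if pattern.search(prompt):
            return label, "regex"
    # Stage 2: only reached when no pattern matched.
    return llm_classifier(prompt), "llm"


def stub_llm_classifier(prompt: str) -> str:
    # Stand-in for a call to a small model; uses length as a crude complexity proxy.
    return "complex_reasoning" if len(prompt) > 80 else "general"
```

With this sketch, `classify("what are your hours?", stub_llm_classifier)` is resolved by the regex stage, while a long troubleshooting question falls through to the second stage. The label returned would then be looked up in the routing table.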

What surprised me in the benchmark data: complex tasks through the router actually produced MORE detailed output than any single pinned model (6,614 chars avg vs 3,573 for Opus). The router selects specialized models per task type rather than using a generalist model for everything.

Stack: Next.js on Vercel, Neon PostgreSQL, OpenRouter upstream. Total hosting cost ~$20/month. It's a solo project.

The thing I'd do differently: I should have started with the benchmark data instead of building the product first. The numbers make the case better than any feature list.

Happy to answer technical questions about the routing logic, benchmark methodology, or anything else.
