1.5B LLM routing model that aligns to preferences, not leaderboards

https://huggingface.co/katanemo/Arch-Router-1.5B

3•honorable_coder•6mo ago

Comments

honorable_coder•6mo ago

Hi HN — we're the team behind Arch (an open-source edge and service proxy for agents)[1], and today we're releasing Arch-Router (https://huggingface.co/katanemo/Arch-Router-1.5B), a 1.5B LLM router model designed to align to user-defined preferences, not public benchmarks and leader boards.

As teams integrate multiple LLMs - each with different strengths, styles, or cost/latency profiles — routing the right prompt to the right model becomes a critical part of the application design. But it's still an open problem. Most routing systems fall into two camps:

- Embedding-based routers use intent classifiers — label a prompt as “support,” “SQL,” or “math,” then route to a matching model. This works for simple tasks but breaks down in real conversations. Users shift topics mid-conversation, task boundaries blur, and product changes require retraining classifiers.

- Performance-based routers pick models based on benchmarks like MMLU or MT-Bench, or based on latency or cost curves. But benchmarks often can't capture what matters in production: domain-specific quality or subjective evaluation criteria. These routers are often opaque, difficult to debug, and their quality judgments can feel arbitrary, failing to capture the subjective nuance of what a “good” response actually means for a specific user’s intent.

Arch-Router takes a different approach: route to LLMs based on preferences written as policies in plain ol English.

You write policies like “contract clauses → GPT-4o” or “quick travel tips → Gemini Flash.” The router maps the prompt (and the full conversation context) to those policies using a lightweight 1.5B auto-regressive model. The model is capable to handle intent drift, supports multi-turn conversations, and lets you swap in or out models with a one-line change to the routing policy. To read more about the strength of our model, check out our research paper here: https://arxiv.org/abs/2506.16655

Essentially, Arch-Router splits the routing process into two distinct parts:

    Route Selection: This is the what. The system defines a set of human-readable routing policies using a “Domain-Action Taxonomy.” Think of it as a clear API contract written in plain English. A policy isn’t just intent_123; it’s a descriptive label like Domain: ‘finance’, Action: ‘analyze earnings report’. The router’s only job is to match the user’s query to the best-fit policy description.

    Model Assignment: This is the how. A separate, simple mapping configuration connects each policy to a specific LLM. The finance/"analyze earnings report" policy might map to a powerful model like GPT-4o, while a simpler general/"greeting" policy maps to a faster, cheaper model.

Specs:

- 1.5B params — runs on a single GPU (or CPU for testing)

- No retraining needed — point it at any mix of LLMs

- Outperforms larger closed models on our conversational routing benchmarks (details in the paper)

Links:

[1] Arch Proxy: https://github.com/katanemo/archgw

Are AI agents ready for the workplace? A new benchmark raises doubts

AI Watermark and Stego Scanner

Clarity vs. complexity: the invisible work of subtraction

Solid-State Freezer Needs No Refrigerants

Ask HN: Will LLMs/AI Decrease Human Intelligence and Make Expertise a Commodity?

From Zero to Hero: A Brief Introduction to Spring Boot

NSA detected phone call between foreign intelligence and person close to Trump

How to Fake a Robotics Result

It's time for the world to boycott the US

Show HN: Semantic Search for terminal commands in the Browser (No Back end)

The AI CEO Experiment

Speed up responses with fast mode

MS-DOS game copy protection and cracks

Updates on GNU/Hurd progress [video]

Epstein took a photo of his 2015 dinner with Zuckerberg and Musk

MyFlames: Visualize MySQL query execution plans as interactive FlameGraphs

Show HN: LLM of Babel

A modern iperf3 alternative with a live TUI, multi-client server, QUIC support

Famfamfam Silk icons – also with CSS spritesheet

Apple is the only Big Tech company whose capex declined last quarter

Reverse-Engineering Raiders of the Lost Ark for the Atari 2600

Show HN: Deterministic NDJSON audit logs – v1.2 update (structural gaps)

The Greater Copenhagen Region could be your friend's next career move

Do Not Confirm – Fiction by OpenClaw

The Analytical Profile of Peas

Hallucinations in GPT5 – Can models say "I don't know" (June 2025)

What AI is good for, according to developers

OpenAI might pivot to the "most addictive digital friend" or face extinction

Show HN: Know how your SaaS is doing in 30 seconds

ClawdBot Ordered Me Lunch