As teams integrate multiple LLMs, each with different strengths, styles, or cost/latency profiles, routing the right prompt to the right model becomes a critical part of application design. But it's still an open problem. Most routing systems fall into two camps:
- Embedding-based routers use intent classifiers: label a prompt as “support,” “SQL,” or “math,” then route it to a matching model. This works for simple tasks but breaks down in real conversations. Users shift topics mid-conversation, task boundaries blur, and product changes require retraining classifiers.
- Performance-based routers pick models based on benchmark scores like MMLU or MT-Bench, or on latency and cost curves. But benchmarks often miss what matters in production: domain-specific quality or subjective preferences like “Will legal accept this clause?”
Arch-Router takes a different approach: it decouples route selection from model assignment. Developers write route policies using a domain-action taxonomy (domains like "engineering", actions like "image editing"), and a lightweight 1.5B autoregressive model maps the prompt (and conversation context) to those policies. No retraining, no fragile if/else chains. We built this with input from teams at Twilio and Atlassian. Arch-Router handles intent drift, supports multi-turn conversations, and lets you swap models in or out with a one-line change to the routing policy. Full details are in our paper, but here's a snapshot:
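To make the decoupling concrete, here is a minimal Python sketch. The policy names, descriptions, and model assignments are illustrative, not Arch-Router's actual config schema, and the keyword selector is a toy stand-in for the 1.5B router model:

```python
# 1. Route policies: human-readable descriptions the router matches against.
#    (Names and descriptions here are hypothetical examples.)
route_policies = {
    "code_generation": "Writing or modifying source code",
    "image_editing": "Editing, cropping, or retouching images",
    "general_chat": "Open-ended conversation and everything else",
}

# 2. Model assignment, kept separate from policy selection: swapping a model
#    is a one-line change here and never requires retraining the router.
model_for_policy = {
    "code_generation": "claude-sonnet",
    "image_editing": "gpt-4o",
    "general_chat": "small-local-model",
}

def route(prompt: str, select_policy) -> str:
    """select_policy stands in for the router model: it maps the prompt
    (plus any conversation context) to one of the policy names."""
    policy = select_policy(prompt, route_policies)
    return model_for_policy[policy]

# Toy selector so the sketch runs without the real router model.
def keyword_selector(prompt, policies):
    if "bug" in prompt or "def " in prompt:
        return "code_generation"
    if "crop" in prompt or "photo" in prompt:
        return "image_editing"
    return "general_chat"

print(route("Fix this bug in my parser", keyword_selector))  # claude-sonnet
print(route("Crop this photo to a square", keyword_selector))  # gpt-4o
```

Because the two tables are independent, replacing `"claude-sonnet"` with another model touches one line and leaves the policy definitions untouched.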
Specs:
- 1.5B params — runs on a single GPU (or CPU for testing)
- No retraining needed — point it at any mix of LLMs
- Routing can be cost-, latency-, or quality-aware, based on your preferences
- Outperforms larger closed models on our conversational routing benchmarks (benchmarks in the paper)
Links:
- ArchGW (open source edge and service proxy for agents): https://github.com/katanemo/archgw
- Model + code: https://huggingface.co/katanemo/Arch-Router-1.5B
- Paper: https://arxiv.org/abs/2506.16655