Show HN: Arch-Router – 1.5B model for LLM routing by preferences, not benchmarks

34•adilhafeez•5h ago
Hi HN, we're the team behind Arch (https://github.com/katanemo/archgw), an open-source proxy for LLMs written in Rust. Today we're releasing Arch-Router (https://huggingface.co/katanemo/Arch-Router-1.5B), a 1.5B router model for preference-based routing, now integrated into the proxy.

As teams integrate multiple LLMs, each with different strengths, styles, or cost/latency profiles, routing the right prompt to the right model becomes a critical part of application design. But it's still an open problem. Most routing systems fall into two camps:

- Embedding-based routers use intent classifiers: they label a prompt as “support,” “SQL,” or “math,” then route it to a matching model. This works for simple tasks but breaks down in real conversations. Users shift topics mid-conversation, task boundaries blur, and product changes require retraining the classifiers.

- Performance-based routers pick models based on benchmarks like MMLU or MT-Bench, or based on latency or cost curves. But benchmarks often miss what matters in production: domain-specific quality or subjective preferences like “Will legal accept this clause?”

Arch-Router takes a different approach: route by preferences written in plain language. You write rules like “contract clauses → GPT-4o” or “quick travel tips → Gemini Flash.” The router maps the prompt (and conversation context) to those rules using a lightweight 1.5B autoregressive model. No retraining, no fragile if/else chains.

We built this with input from teams at Twilio and Atlassian. It handles intent drift, supports multi-turn conversations, and lets you swap models in or out with a one-line change to the routing policy. Full details are in our paper (https://arxiv.org/abs/2506.16655), but here's a snapshot:

Specs:

- 1.5B params — runs on a single GPU (or CPU for testing)

- No retraining needed — point it at any mix of LLMs

- Cost and latency aware — route heavy tasks to expensive models, light tasks to faster/cheaper ones

- Outperforms larger closed models on our conversational routing benchmarks (details in the paper)
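To make the preference-based idea above concrete, here is a minimal Python sketch of a routing policy table. It is an illustration only, not the Arch proxy's actual configuration format or the model's real prompt template: the `RoutePolicy` class, the `build_router_prompt` and `resolve_model` helpers, the route names, and the model identifiers (`gpt-4o`, `gemini-flash`, `fallback-model`) are all assumptions made for this example. The router model itself is not called; the sketch only shows how plain-language rules could be rendered for a router and how a returned route name could be mapped back to a target model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RoutePolicy:
    """One plain-language routing rule (hypothetical structure)."""
    name: str         # short identifier the router is asked to return
    description: str  # the preference, written in plain language
    model: str        # target LLM for prompts matching this rule


# Hypothetical policies mirroring the examples in the post.
POLICIES = [
    RoutePolicy("contract_clauses", "drafting or reviewing contract clauses", "gpt-4o"),
    RoutePolicy("travel_tips", "quick travel tips", "gemini-flash"),
]

# Assumed fallback for prompts that match no rule.
DEFAULT_MODEL = "fallback-model"


def build_router_prompt(policies: List[RoutePolicy],
                        conversation: List[Tuple[str, str]]) -> str:
    """Render the rules and the conversation so far as text for a router model."""
    lines = ["Routes:"]
    for policy in policies:
        lines.append(f"- {policy.name}: {policy.description}")
    lines.append("Conversation:")
    for role, text in conversation:
        lines.append(f"{role}: {text}")
    lines.append("Answer with the single best route name.")
    return "\n".join(lines)


def resolve_model(route_name: str,
                  policies: List[RoutePolicy],
                  default: str = DEFAULT_MODEL) -> str:
    """Map the route name returned by the router to a target model."""
    table = {policy.name: policy.model for policy in policies}
    return table.get(route_name.strip(), default)
```

In this sketch, swapping the model behind a route means editing the `model` field of a single `RoutePolicy` entry, which is the kind of one-line policy change the post describes. Unknown or malformed route names fall through to the default model rather than raising an error.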

Links:

- Arch Proxy (open source): https://github.com/katanemo/archgw

- Model + code: https://huggingface.co/katanemo/Arch-Router-1.5B

- Paper: https://arxiv.org/abs/2506.16655

Comments

sparacha•5h ago
Hi HN! I am one of the co-authors of the paper. If there are any questions about our approach, I would love to answer them.
tmaly•2h ago
Do you think it would be possible to quantize this model and still get good results?
sparacha•2h ago
Yes, we have already published a quantized version here: https://huggingface.co/katanemo/Arch-Router-1.5B.gguf. The performance difference with the quantized version is negligible. I'll run another analysis and update the thread shortly.
sparacha•52m ago
Overall performance degrades from 93.17 to 92.99 with the quantized version.
jedisct1•1h ago
I tried to use it to rate the difficulty level of coding tasks (for InferSwitch, an LLM router), but it performed far worse than Qwen2.5-Coder-7B (though, to be fair, that's 1.5B vs 7B).
sparacha•1h ago
Can you share more about your evaluation setup? I would love to see the specific usage pattern, as we have tested our model against smaller LLMs and foundational models and our results show otherwise. Of course, routing policies should follow the best practices here: https://docs.archgw.com/guides/llm_router.html

Nonetheless, super curious to learn more and see what we may be able to improve. This is technically not a classifier model; it's a usage prediction model (it feels like a classifier, but not quite in terms of intended usage).

cotran2•53m ago
According to the post, the model is fine-tuned for routing to different tasks/domains. Classifying difficulty level is probably not the intended use case.
jgant13•47m ago
Solid. Can you show us when to use this vs., say, OpenRouter? The performance seems strong for sure. TIA.
