Model literals, semantic aliases, and preference-aligned routing for LLMs

https://docs.archgw.com/guides/llm_router.html

1•honorable_coder•4mo ago

Comments

honorable_coder•4mo ago

Today we’re shipping a major update to ArchGW (an edge and service proxy for agents [1]): a unified router that supports three strategies for directing traffic to LLMs — from explicit model names, to semantic aliases, to dynamic preference-aligned routing. Here’s how each works on its own, and how they come together.

Preference-aligned routing decouples task detection (e.g., code generation, image editing, Q&A) from LLM assignment. This approach captures the preferences developers establish when testing and evaluating LLMs on their domain-specific workflows and tasks. So, rather than relying on an automatic router trained to beat abstract benchmarks like MMLU or MT-Bench, developers can dynamically route requests to the most suitable model based on their internal evaluations, and easily swap out the underlying model for specific actions and workflows. This is powered by our 1.5B Arch-Router LLM [2]. We also recently published our research on this [3].
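To make the decoupling concrete, here is a minimal Python sketch of the idea, not ArchGW's actual implementation or config format. The names RoutePolicy, select_model, and route are hypothetical, the policy-to-model assignments are made up for illustration, and the classify callable stands in for whatever produces the task label (in ArchGW's case, the Arch-Router model).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RoutePolicy:
    name: str         # label the task detector is expected to emit
    description: str  # plain-language description of the task
    model: str        # model the developer prefers for this task


# Hypothetical preferences; in practice these come from your own evaluations.
POLICIES = [
    RoutePolicy("code_generation", "writing or fixing source code",
                "anthropic/claude-3-5-sonnet-20241022"),
    RoutePolicy("general_qa", "short factual questions",
                "openai/gpt-4o-mini"),
]
DEFAULT_MODEL = "openai/gpt-4o"


def select_model(label: str, policies: List[RoutePolicy] = POLICIES,
                 default: str = DEFAULT_MODEL) -> str:
    """LLM assignment: map a detected task label to the preferred model."""
    by_name = {p.name: p.model for p in policies}
    return by_name.get(label.strip(), default)


def route(messages: List[Dict[str, str]],
          classify: Callable[[List[Dict[str, str]]], str]) -> str:
    """Task detection (classify) and model assignment stay separate steps."""
    return select_model(classify(messages))
```

Because the assignment lives only in POLICIES, swapping the model for one task is a one-line change that leaves task detection untouched.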

Model aliases provide semantic, version-controlled names for models. Instead of using provider-specific model names like gpt-4o-mini or claude-3-5-sonnet-20241022 in your client, you can create meaningful aliases like "fast-model" or "arch.summarize.v1". This lets you test new models and swap them in through config, without a code-wide search/replace every time you want to use a new model for a specific workflow or task.
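As a rough illustration of why aliases help, the sketch below resolves an alias through a single lookup table. This is an assumption about the general pattern, not ArchGW's config schema: ALIASES and resolve_model are hypothetical names, and the alias targets are chosen only for the example.

```python
from typing import Dict

# Hypothetical alias table; in a real setup this would live in
# version-controlled gateway config rather than in client code.
ALIASES: Dict[str, str] = {
    "fast-model": "openai/gpt-4o-mini",
    "arch.summarize.v1": "anthropic/claude-3-5-sonnet-20241022",
}


def resolve_model(name: str, aliases: Dict[str, str] = ALIASES) -> str:
    """Return the concrete model for an alias, or the name itself if it
    is not an alias (so explicit model names still pass through)."""
    return aliases.get(name, name)
```

Clients keep asking for "fast-model"; pointing that alias at a different model means editing one table entry.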

Model literals (nothing new) let you specify exact provider/model combinations (e.g., openai/gpt-4o, anthropic/claude-3-5-sonnet-20241022), giving you full control and transparency over which model handles each request.
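For illustration only, a provider/model literal of the form shown above can be split with a few lines of Python. ModelLiteral and parse_model_literal are hypothetical helpers, and the assumption that the first "/" separates provider from model is mine, not something ArchGW documents here.

```python
from typing import NamedTuple


class ModelLiteral(NamedTuple):
    provider: str
    model: str


def parse_model_literal(literal: str) -> ModelLiteral:
    """Split 'provider/model' on the first slash; reject anything else."""
    provider, sep, model = literal.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {literal!r}")
    return ModelLiteral(provider, model)
```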

P.S. We routinely get asked why we didn't build semantic/embedding models for routing use cases or use some form of clustering technique. Clustering/embedding routers tend to miss context, negation, and short elliptical queries. An autoregressive approach conditions on the full context, letting the model reason about the task and generate an explicit label that can be matched to an agent, task, or LLM. In practice, this generalizes better to unseen or low-frequency intents and stays robust as conversations drift, without brittle thresholds or post-hoc cluster tuning.

[1] https://github.com/katanemo/archgw

[2] https://huggingface.co/katanemo/Arch-Router-1.5B

[3] https://arxiv.org/abs/2506.16655
