Preference-aligned routing decouples task detection (e.g., code generation, image editing, Q&A) from LLM assignment. This approach captures the preferences developers establish when testing and evaluating LLMs on their domain-specific workflows and tasks. So, rather than relying on an automatic router trained to beat abstract benchmarks like MMLU or MT-Bench, developers can dynamically route requests to the most suitable model based on internal evaluations, and easily swap out the underlying model for specific actions and workflows. This is powered by our 1.5B Arch-Router LLM [2]; we recently published the research behind it as well [3].
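To make the decoupling concrete, here is a minimal Python sketch of the idea: the router produces a task label, and a separate policy maps that label to whichever model your evaluations preferred. The policy schema and helper names below are illustrative assumptions, not archgw's actual configuration format.

    # Hypothetical sketch; not archgw's real config schema.
    # Each route pairs a task label with the model a team preferred
    # after its own internal evaluations.
    ROUTE_POLICY = {
        "code_generation": "openai/gpt-4o",
        "image_editing": "openai/gpt-4o-mini",
        "general_qa": "anthropic/claude-3-5-sonnet-20241022",
    }

    def dispatch(task_label: str) -> str:
        """Map the router's task label to the preferred model.

        The label comes from the Arch-Router LLM; swapping the model
        behind a task is a one-line policy change, not a code change.
        """
        return ROUTE_POLICY.get(task_label, "openai/gpt-4o-mini")  # fallback

    print(dispatch("code_generation"))  # -> openai/gpt-4o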
Model-aliases provide semantic, version-controlled names for models. Instead of hardcoding provider-specific model names like gpt-4o-mini or claude-3-5-sonnet-20241022 in your client, you can create meaningful aliases like "fast-model" or "arch.summarize.v1". This lets you test new models and swap them out safely in config, without a code-wide search/replace every time you want to point a specific workflow or task at a new model. Both addressing modes are sketched after the next paragraph.
Model-literals (nothing new here) let you specify exact provider/model combinations (e.g., openai/gpt-4o, anthropic/claude-3-5-sonnet-20241022), giving you full control and transparency over which model handles each request.
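A minimal client-side sketch of both modes, assuming the gateway exposes an OpenAI-compatible endpoint; the base_url, port, and alias names here are illustrative assumptions:

    from openai import OpenAI

    # Assumes an OpenAI-compatible gateway endpoint running locally;
    # adjust base_url to wherever your gateway listens.
    client = OpenAI(base_url="http://localhost:12000/v1", api_key="unused")

    # Model-alias: the gateway resolves "fast-model" to whatever the
    # config currently maps it to; swapping models is a config edit.
    resp = client.chat.completions.create(
        model="fast-model",
        messages=[{"role": "user", "content": "Summarize this ticket..."}],
    )

    # Model-literal: pin an exact provider/model pair for full control.
    resp = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Review this diff..."}],
    )
    print(resp.choices[0].message.content)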
P.S. we routinely get asked why we didn't build semantic/embedding models for routing use cases, or use some form of clustering technique. Clustering/embedding routers miss context, negation, and short elliptical queries. An autoregressive approach conditions on the full conversation, letting the model reason about the task and generate an explicit label that can be matched to an agent, task, or LLM. In practice, this generalizes better to unseen or low-frequency intents and stays robust as conversations drift, without brittle similarity thresholds or post-hoc cluster tuning.
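For intuition, here is a rough sketch of the label-generation approach; the prompt format and route names below are hypothetical, not Arch-Router's actual input schema:

    # Hypothetical prompt construction; not Arch-Router's real format.
    ROUTES = {
        "code_generation": "writing or modifying source code",
        "image_editing": "editing, cropping, or transforming images",
        "general_qa": "general question answering",
    }

    def build_router_prompt(conversation: list[dict]) -> str:
        """Condition on route descriptions plus the *full* conversation,
        so negation and short follow-ups ("no, the other one") keep their
        context, unlike a per-message embedding lookup."""
        routes = "\n".join(f"- {name}: {desc}" for name, desc in ROUTES.items())
        history = "\n".join(f'{m["role"]}: {m["content"]}' for m in conversation)
        return (
            f"Available routes:\n{routes}\n\n"
            f"Conversation:\n{history}\n\n"
            "Respond with the single best route name."
        )

    # The prompt goes to the 1.5B router model, which generates an explicit
    # label; that label is then matched against the route policy above.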
[1] https://github.com/katanemo/archgw
[2] https://huggingface.co/katanemo/Arch-Router-1.5B
[3] https://arxiv.org/abs/2506.16655