frontpage.
newsnewestaskshowjobs

Open Source @Github

fp.

Open in hackernews

Show HN: role-model, a router for hybrid local/cloud AI

https://github.com/try-works/role-model
2•try-working•5h ago
Hey everyone, I'm launching role-model today: a routing protocol, a reference router runtime, and an extension for Pi that allows for better informed routing decisions.

role-model is mostly deterministic, with fallback to a controller model, that routes requests based on a chosen routing strategy. the protocol is structured around assigning domains and roles to models, where requests sent by consumer applications like Pi have task types to enrich routing metadata and thereby accuracy. you can to run the built-in benchmark to compare performance of models across speed, quality and cost, as well as observed performance on real tasks. I have a diagram on how routing works in [0].

The runtime supports local models, either directly to your local endpoint (LM Studio, llama.cpp etc), or routing between multiple local models via vendored llama-swap.

Since there was another model router post yesterday where people discussed the basics of routing, I will focus on discussing some of the interesting learnings I've made building and testing this:

1. Model routing is essentially trying to predict the future: which model will perform optimally (based on criteria defined by the user) on this request?

2. After you have routed the request, you want to evaluate if it was the right decision or if some other model would have performed better

3. You also realize that having the router assess difficulty (among other things) to make decisions by itself is far from ideal - we'd prefer to have the consumer application work with the router to define what the request needs

4. You also realize that it becomes much easier, decisions become much accurate, and the outcomes of routing becomes more impactful when there is more of a distinction between models

For point 2, I will be launching evals that you can run locally to benchmark models in your pool on the same requests. The outcomes here can then be used for point 1, as input when routing new requests.

For point 3, I've built the pi-role-model package for Pi, which lets the Pi agent inject role_model.intent metadata including difficulty, preferred roles or even specfic model ids, required capabilities (say tool use or image input) and so on. You should be able to customized this further in Pi, and route in additional ways by changing metadata. This is why I've also built the role-model routing protocol.

For point 4, what model routing really does as a second order effect is create a market for specialized models - models that may or may not be smaller, could be cheaper or more expensive, may be locally runnable. It makes little sense to route between two frontier models (GPT 5.5 and Opus 4.8); it makes more sense to route between models where one of the factors of quality, speed, cost is a multiple of the other candidate models, and it makes even more sense to have specialized domain models: code, prose, math and science, visuals and so on. It is at this stage model routing becomes really valuable.

While role-model has a reference runtime that I'm continuously building out (there's lots to do to improve routing, as well as give users more granular control over routing decisions, and also ways to improve cross-model caching and also add techniques like FastContext), the ultimate goal of role-model is for there to be a standard protocol for inference requests that is used by consumer applications, so that the provider, be it a router middleware or an inference provider, will be able to route to a model that strikes the best balance between cost, speed and quality and also respects user choices, and even lets the user control these preferences to use local models for some tasks and allow cloud for others.

Links:

[0] role-model - the case for a model routing protocol: https://try.works/role-model-the-case-for-a-model-routing-pr...

[1] GitHub: https://github.com/try-works/role-model

[2] Docs: https://role-model.dev/

Comments

try-working•5h ago
For those interested, feel free to take a look at my write up on the need for a standard routing protocol [0], or point your agent on the GitHub repo and let your agent install the role-model runtime. I suggest using Pi for now; I plan to also launch an extension/plugin for OpenCode. The runtime is an alpha release so expect bugs and such. I would welcome feedback on how to make configuration more intuitive for first-time users.

I will also add that based on my current experiments, the ideal number of models in a routing pool is probably 2, following point 4 above. Each model needs to have significant differences in either quality, speed, or cost, otherwise routing decisions are hard to make and become less accurate; the benefit of routing is also less. For coding, the ideal pool in my opinion is GPT 5.4 and DeepSeek V4 Pro, to extend the GPT quota by routing some of the medium and easy requests.

[0] role-model - the case for a model routing protocol: https://try.works/role-model-the-case-for-a-model-routing-pr...

Show HN: Import the HN Home to a reading queue with clean reader view and TL;DR

https://readplace.com/import?mode=from-url
2•fagnerbrack•41m ago•1 comments

Show HN: Decomp Academy – Learn to decompile GameCube games into matching C

https://decomp-academy.dev
166•jackpriceburns•14h ago•66 comments

Show HN: A faithful MUMPS 76 anniversary implementation – the original NoSQL DB

https://github.com/rochus-keller/mumps/
3•Rochus•4h ago•1 comments

Show HN: Metaspec: The DpANS3R Common Lisp Spec in S-Expr and HTML Format

https://metaspec.dev/#
13•dlowe-net•3d ago•0 comments

Show HN: Hacker Times – HN Reader

https://times.hntrends.net/
6•bencevans•5h ago•4 comments

Show HN: Adrafinil – keep a lid-closed Mac awake only while agents work

https://github.com/kageroumado/adrafinil
114•kageroumado•19h ago•73 comments

Show HN: role-model, a router for hybrid local/cloud AI

https://github.com/try-works/role-model
2•try-working•5h ago•1 comments

Show HN: FSM – an advanced system monitor for Linux

https://github.com/mskrasnov/FSM
20•mskrasnov•14h ago•7 comments

Show HN: Clanker TV

https://botflix.tv/
3•jshaivitz•1h ago•1 comments

Show HN: Starglyphs - A constellation puzzle game based on Euler paths

https://starglyphs.com
25•telman17•18h ago•7 comments

Show HN: Hacker News on a train station-style flip board

https://popflame.quickish.space/hn-flipboard/
109•PaybackTony•1d ago•21 comments

Show HN: DBOSify – Drop-in Temporal replacement built on Postgres

https://github.com/dbos-inc/dbosify-py
87•KraftyOne•3d ago•19 comments

Show HN: Engye – transfer files between any two devices by scanning a QR code

https://engye.fuzzyworld.net/
12•psafronov•15h ago•2 comments

Show HN: Kiso, an open-source publishing engine for Open Knowledge Format

https://oak-invest.github.io/kiso/
16•straumat•18h ago•0 comments

Show HN: WebBase-III – dBASE III rebuilt in the browser with its own interpreter

https://github.com/DDecoene/WebBaseIII
95•ddecoene•4d ago•27 comments

Show HN: Smart model routing directly in Claude, Codex and Cursor

https://github.com/workweave/router
205•adchurch•1d ago•110 comments

Show HN: Foveon – Bayer to Foveon X3, learned, Mac App using deep learning

https://code.intellios.ai/photo/
3•coolwulf•11h ago•2 comments

Show HN: Autofit2 – End-to-end pipeline for multilingual text classification

https://github.com/neospe/autofit2
28•leschak•3d ago•2 comments

Show HN: KV-psi, using Linux PSI to to trim an LLM KV cache

https://github.com/infiniteregrets/kv-psi
8•infiniteregrets•17h ago•0 comments

Show HN: I made Google Trends for Hacker News by indexing 18 years of comments

https://hackernewstrends.com
802•ytkimirti•3d ago•154 comments

Show HN: Turn native language audio into flashcards and shadowing practice

https://lingochunk.com/try
91•alder•3d ago•37 comments

Show HN: Overfitted a 900KB Transformer to Compress a 100MB CSV into 7MB

111•spidy__•5d ago•67 comments

Show HN: OpenKnowledge – open source AI-first alternative to Obsidian/Notion

https://github.com/inkeep/open-knowledge
374•engomez•2d ago•170 comments

Show HN: Shopify UCP is insanely powerful

https://stack412.com/
3•westche2222•14h ago•3 comments

Show HN: QR code renderer in a TrueType font

https://qr.jim.sh/
4•foodevl•15h ago•2 comments

Show HN: Bash4LLM+ – A lightweight, dependency-free Bash wrapper for LLM APIs

https://github.com/kamaludu/bash4llm/
3•kamaludu•16h ago•1 comments

Show HN: Bible as RAG Database

https://www.crosscanon.com/
160•jacksonastone•3d ago•94 comments

Show HN: Turn images into audio that can be decoded with a spectrogram

https://nsspot.herokuapp.com/imagetoaudio/
9•jupr•3d ago•5 comments

Show HN: E3d-pod2vid – AI pipeline that turns podcasts into YouTube-ready videos

https://github.com/spacepacket1/e3d-pod2vid
4•spacepacket•17h ago•0 comments

Show HN: Nub – A Bun-like all-in-one toolkit for Node.js

https://github.com/nubjs/nub
275•colinmcd•4d ago•80 comments