Show HN: Rayline routes Claude Code subagents to on-device and cheaper models

https://rayline.ai/

8•davidvgilmore•1h ago

Hi HN,

I’m one of the builders of Rayline.

Rayline is a Claude Code-compatible LLM gateway. It intercepts and overrides Claude Code’s internal routing so you can route subagent calls to different models. For example, you can run the main agent on Opus, some subagents on cloud-hosted open models, and other subagents on-device.

We’ve seen others implement routing for Claude Code as tools the agent can invoke. In our experience, that doesn’t work well: the main agent has to spend tokens thinking about and calling the tools, and LLMs are generally a very inefficient way to make routing decisions. Because Rayline is a gateway, you can configure routing decisions deterministically, and you can optionally use our ML model to make them.
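To illustrate what "deterministic" routing means here, below is a minimal sketch of a rule table evaluated outside the model. It is not Rayline's actual configuration format: the `Rule` type, the model labels, and the keyword matching are all hypothetical, chosen only to show that the decision costs no agent tokens and always gives the same answer for the same input.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    keyword: str  # whole word looked for in the delegated task description
    model: str    # hypothetical model label, not a real Rayline identifier


# Hypothetical routing table; the first matching rule wins.
RULES = [
    Rule("search", "local-small"),
    Rule("summarize", "cloud-open"),
    Rule("ci", "local-small"),
]
DEFAULT_MODEL = "main-model"


def route(task_description: str) -> str:
    """Pick a model label for a delegated task using fixed rules only."""
    # Match on whole words so that "ci" does not fire on "specific".
    words = set(re.findall(r"[a-z]+", task_description.lower()))
    for rule in RULES:
        if rule.keyword in words:
            return rule.model
    return DEFAULT_MODEL
```

With this table, `route("Poll CI for updates")` returns the on-device label, and anything unmatched falls through to the default model.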

We built it after noticing that Claude Code sessions contain a lot of subagent calls that don’t all need the same model. The main agent often benefits from Opus, but many delegated calls have narrow scope: search the repo, summarize context, inspect an error, poll for CI updates, etc. Other routers exist, but we built Rayline so we could keep using Claude Code (no separate harness), route tasks at the subagent level, and route across cloud and on-device models.

The thing we’re exploring is subagent-level routing. The main cost lever in coding agents is usually cached vs. non-cached input. Subagent delegations are a natural point to make routing decisions because you avoid busting the cache. We look at the message-thread context for a delegated call and choose a model for that call. At a task level, Sonnet and Haiku almost always offer less capability per dollar than open models, so the main advantage is better and (much) cheaper subagents (60-90% cheaper in our private beta).
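A rough sketch of the cached vs. non-cached arithmetic, to show why switching models in the middle of the main thread is expensive while a fresh subagent call is not. The prices and token counts are placeholders made up for illustration, not real provider pricing or Rayline measurements.

```python
def input_cost(cached_tokens, uncached_tokens, cached_price, uncached_price):
    """Input cost in dollars; prices are in dollars per million tokens."""
    total = cached_tokens * cached_price + uncached_tokens * uncached_price
    return total / 1_000_000


# Placeholder numbers, for illustration only.
CACHED_PRICE = 0.5     # $/M tokens read from cache
UNCACHED_PRICE = 5.0   # $/M tokens processed fresh
CONTEXT_TOKENS = 100_000  # accumulated main-thread context
NEW_TOKENS = 2_000        # tokens added by the latest turn

# Same model as the previous turn: the old context is served from cache.
warm = input_cost(CONTEXT_TOKENS, NEW_TOKENS, CACHED_PRICE, UNCACHED_PRICE)

# Switching models mid-thread: nothing is cached, so all input is fresh.
cold = input_cost(0, CONTEXT_TOKENS + NEW_TOKENS, CACHED_PRICE, UNCACHED_PRICE)

print(f"warm turn: ${warm:.2f}, cold turn: ${cold:.2f}")
```

A delegated subagent call starts from its own short context, so routing it to another model does not invalidate the main thread's cache.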

The whole world seems to have started talking about model routing in the past two weeks, so apparently others agree it’s a relevant product area.

We’d love to get feedback from the HN community!

Comments

camomileandmilk•1h ago

Can you elaborate on this "Sonnet and Haiku are almost always less capability-per-dollar than open models"?

davidvgilmore•1h ago

Yes. In short, open models like DeepSeek, MiMo, Kimi, and GLM tend to complete tasks with fewer tokens and cost less per token than both Sonnet and Haiku. That makes them more cost-efficient, which is what we mean by higher "capability-per-dollar" than Sonnet or Haiku.

Much of Claude Code's internal model routing ends up delegating tasks to Sonnet or Haiku, so by intercepting those calls and using open models instead, we often see better performance at a better price.
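As a back-of-the-envelope version of the "fewer tokens, lower price per token" argument above: cost per task is tokens used times price per token. The model names and numbers below are hypothetical placeholders, and the comparison assumes both models actually complete the task.

```python
def cost_per_task(tokens_used: int, price_per_million: float) -> float:
    """Dollar cost of one completed task."""
    return tokens_used * price_per_million / 1_000_000


def cheapest(candidates: dict) -> str:
    """candidates maps a model label to (tokens_used, price_per_million)."""
    return min(candidates, key=lambda name: cost_per_task(*candidates[name]))


# Hypothetical figures, for illustration only.
candidates = {
    "model_a": (40_000, 3.0),  # more tokens, higher price per token
    "model_b": (30_000, 1.0),  # fewer tokens, lower price per token
}

best = cheapest(candidates)
```

A model that wins on both factors wins on their product, which is the sense of "capability-per-dollar" used in this comment.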

camomileandmilk•43m ago

Yeah, I get you now. But those are all Chinese-hosted, right? I don't think my company will let us use them.

davidvgilmore•38m ago

Many of them are produced by Chinese labs. Some, like Nemotron, are U.S.-made. And we support inference providers in both the U.S. and overseas.

If geography is important, we can restrict which regions inference runs in. And if you don't want to use Chinese-trained models, you can use others like Mistral, Nemotron, Google's, or OpenAI's.

oypass•1h ago

How is this different from OpenRouter?

davidvgilmore
