We need OpenRunPods to run thick open weights models.
Build in the cloud rather than bet on "at the edge" being a Renaissance.
Consider Haiku 4.5: $1/M input tokens | $5/M output tokens vs MiniMax M2.7: $0.30/M input tokens | $1.20/M output tokens vs Kimi K2.5: $0.45/M input tokens | $2.20/M output tokens
I haven't tried so I can't say for sure, but from personal experience, I think M2.7 and K2.5 can match Haiku and probably exceed it on most tasks, for much cheaper.
https://web-support-claw.oncanine.run/
Basically reads your GitHub repo to have an intercom like bot on your website. Answer questions to visitors so you don’t have to write knowledge bases.
j0rg3•1h ago
Tiered inference: Haiku 4.5 for conversation (sub-second, cheap), Sonnet 4.6 for tool use (only when needed). Hard cap at $2/day.
A2A passthrough: the private-side agent borrows the gateway's own inference pipeline, so there's one API key and one billing relationship regardless of who initiated the request.
You can talk to nully at https://georgelarson.me/chat/ or connect with any IRC client to irc.georgelarson.me:6697 (TLS), channel #lobby.
jgrizou•1h ago
sbinnee•1h ago
One question. Sonnet for tool use? I am just guessing here that you may have a lot of MCPs to call and for that Sonnet is more reliable. How many MCPs are you running and what kinds?