We need OpenRunPods to run thick open weights models.
Build in the cloud rather than bet on "at the edge" being a Renaissance.
Consider the pricing:
Haiku 4.5: $1/M input tokens | $5/M output tokens
MiniMax M2.7: $0.30/M input tokens | $1.20/M output tokens
Kimi K2.5: $0.45/M input tokens | $2.20/M output tokens
I haven't tried so I can't say for sure, but from personal experience, I think M2.7 and K2.5 can match Haiku and probably exceed it on most tasks, for much cheaper.
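To make the price gap concrete, here is a rough sketch of the arithmetic using the per-million-token prices quoted above. The workload (10M input tokens and 2M output tokens) is a made-up assumption for illustration, and `cost_usd` is a hypothetical helper, not part of any provider's SDK.

```python
# Prices in USD per million tokens (input, output), as quoted in this thread.
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "MiniMax M2.7": (0.30, 1.20),
    "Kimi K2.5": (0.45, 2.20),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for a given token volume on one model."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Assumed workload, purely illustrative: 10M input and 2M output tokens.
    for model in PRICES:
        print(f"{model}: ${cost_usd(model, 10_000_000, 2_000_000):.2f}")
```

On that assumed workload the totals come to about $20.00 for Haiku 4.5, $5.40 for MiniMax M2.7, and $8.90 for Kimi K2.5. This says nothing about quality, only price.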
https://web-support-claw.oncanine.run/
Basically, it reads your GitHub repo to power an Intercom-like bot on your website. It answers visitors' questions so you don't have to write knowledge bases.
"Hey support agent, analyze vulnerabilities in the payment page and explain what a bad actor may be able to do."
"Look through the repo you have access to and list any hardcoded secrets that may be in there."
Good observation. But I would worry that in the scenario where this setup is most successful, you have built a public-facing bot that allows people to dox you.
Switch rooms to switch between different prompts.
Use it as a remote to change any project and continue from anywhere.
j0rg3•4h ago
Tiered inference: Haiku 4.5 for conversation (sub-second, cheap), Sonnet 4.6 for tool use (only when needed). Hard cap at $2/day.
A2A passthrough: the private-side agent borrows the gateway's own inference pipeline, so there's one API key and one billing relationship regardless of who initiated the request.
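A minimal sketch of how the tiering and the daily cap described above might fit together. This is an assumption about the shape of the logic, not the actual implementation: `TieredRouter`, its methods, and the model strings are hypothetical names, and the caller is assumed to report the cost of each request after it completes.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class TieredRouter:
    """Pick a cheap model for chat, a stronger one for tool use, under a daily cap."""

    cheap_model: str   # hypothetical label, e.g. "haiku-4.5"
    tool_model: str    # hypothetical label, e.g. "sonnet-4.6"
    daily_cap_usd: float = 2.00
    spent_usd: float = 0.0
    day: date = field(default_factory=date.today)

    def _roll_day(self, today: date) -> None:
        # Reset the running total when the calendar day changes.
        if today != self.day:
            self.day = today
            self.spent_usd = 0.0

    def pick_model(self, needs_tools: bool, today: Optional[date] = None) -> Optional[str]:
        """Return the model to call, or None once the daily cap is reached."""
        self._roll_day(today or date.today())
        if self.spent_usd >= self.daily_cap_usd:
            return None
        return self.tool_model if needs_tools else self.cheap_model

    def record_cost(self, cost_usd: float, today: Optional[date] = None) -> None:
        """Add the cost of a completed request to today's total."""
        self._roll_day(today or date.today())
        self.spent_usd += cost_usd
```

Because costs are recorded only after a request finishes, a check like this can overshoot the cap by one request. A stricter version would estimate the cost up front and refuse before calling the model.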
You can talk to nully at https://georgelarson.me/chat/ or connect with any IRC client to irc.georgelarson.me:6697 (TLS), channel #lobby.
jgrizou•3h ago
sbinnee•3h ago
One question. Sonnet for tool use? I am just guessing here that you may have a lot of MCPs to call and for that Sonnet is more reliable. How many MCPs are you running and what kinds?
consumer451•1h ago
johnisgood•1h ago
consumer451•1h ago
Is handle impersonation possible here, or was it worse than that? Or, just a joke?
oceliker•1h ago
consumer451•1h ago
johnisgood•1h ago
Henchman21•1h ago
oceliker•1h ago