The following sample (probably) does the same thing at about half the length. I have not tested it because there is no signup (EDIT: I was mistaken, there actually is a "signup" behind the login link, which is Google or GitHub login, so the naming makes sense. I confused it with a previously more prominent waitlist link.)
import requests

# Your Hypermode Workspace API key
api_key = "<YOUR_HYP_WKS_KEY>"

# Use the Hypermode Model Router API endpoint
url = "https://models.hypermode.host/v1/chat/completions"
headers = {"Authorization": f"Bearer {api_key}"}

payload = {
    "model": "meta-llama/llama-4-scout-17b-16e-instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Dgraph?"},
    ],
    "max_tokens": 150,
    "temperature": 0.7,
}

# Make the API request
with requests.post(url, headers=headers, json=payload) as response:
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])
There's a waitlist for our prompt to agent product in the banner. That's a good call; we'll update it to be clearer.
jbellis•3h ago
But OpenRouter is ridiculously popular so it must be very useful for other use cases!
johnymontana•3h ago
Also, being able to use models from multiple services and open source models without signing up for another service / bringing your own API key is a big accelerator for folks getting started with Hypermode agents.
iamtherhino•3h ago
Agreed that swapping models for code-gen doesn't make sense. We're mostly indexed on GPT-4.1 for our AgentBuilder product. I haven't found moving between models for code to be super effective.
The most popular use case we've seen from folks is on the iteration/experimentation phase of building an agent/tool. We made ModelRouter originally as an internal service for our "prompt to agent" product, where folks are trying a few dozen models/MCPs/tools/data/etc really quickly as they try to find a local maximum for some automation or job.
0xDEAFBEAD•3h ago
(This would be more for using models at scale in production as opposed to individual use for code authoring etc.)
jbellis•3h ago
Feels a bit halting-problem-ish: can you tell if a problem is too hard for model A without being smarter than model A yourself?
0xDEAFBEAD•2h ago
Basically compare model performance on a bunch of problems, and see if the queries which actually require an expensive model have anything in common (e.g. low Flesch-Kincaid readability, or a bag-of-words approach which tries to detect the frequency of subordinate clauses/potentially ambiguous pronouns, or word rarity, or whatever).
Maybe my knowledge of old-school NLP methods is useful after all :-) Generally those methods tend to be far less compute-intensive. If you wanted to go really crazy on performance, you might even use a Bloom filter to do fast, imprecise counting of words of various types.
Then you could add some old-school, compute-lite ML, like an ordinary linear regression on the old-school-NLP-derived features.
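For concreteness, here's a rough sketch of that kind of router (all model names, features, thresholds, and training data below are made up for illustration, and it assumes scikit-learn; a real version would need a labeled set of prompts where the cheap model's answer was judged good enough):

import re

from sklearn.linear_model import LogisticRegression

# Tiny stand-in for a real common-word list (or a Bloom filter over one).
COMMON_WORDS = {"the", "a", "an", "is", "are", "what", "how", "of", "to", "and", "in", "it"}

def count_syllables(word):
    # Crude vowel-group count; good enough for a readability feature.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def features(prompt):
    words = re.findall(r"[A-Za-z']+", prompt)
    sentences = max(1, len(re.findall(r"[.!?]+", prompt)))
    n = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    # Flesch-Kincaid grade level: higher roughly means harder to read.
    fk_grade = 0.39 * (n / sentences) + 11.8 * (syllables / n) - 15.59
    rare_ratio = sum(1 for w in words if w.lower() not in COMMON_WORDS) / n
    # Rough proxy for subordinate clauses / ambiguous references.
    subordinate = sum(prompt.lower().count(c) for c in ("because", "although", "which", "that"))
    return [fk_grade, rare_ratio, subordinate, n]

# Hypothetical labels: 1 = the cheap model's answer was good enough.
train_prompts = [
    "What is Dgraph?",
    "Prove the amortized complexity bound and handle every edge case in the migration plan.",
]
train_labels = [1, 0]

router = LogisticRegression().fit([features(p) for p in train_prompts], train_labels)

def pick_model(prompt):
    cheap_ok = router.predict_proba([features(prompt)])[0][1]
    return "cheap-model" if cheap_ok > 0.8 else "expensive-model"

Scoring a prompt this way costs microseconds of CPU, so it can sit in front of every request without eating into the savings.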
Really the win would be for a company like Hypermode to implement this automatically for customers who want it (high volume customers who don't mind saving money).
Actually, a company like Hypermode might be uniquely well-positioned to offer this service to smaller customers as well, if query difficulty heuristics generalize well across different workloads. Assuming they have access to data for a large variety of customers, they could look for heuristics that generalize well.
iamtherhino•2h ago
I think there's a big advantage to be had for folks bringing "old school" ML approaches to LLMs. We've been spending a lot of time looking at the expert systems from the 90s.
Another one we've been looking at is applying some query planning approaches to these systems to see if we can pull responses from cache instead of invoking the model again.
Obviously there's a lot of complexity in identifying where we could apply smaller ML models or a cache, but it's been a really fun exploration.
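The exact-match version of the cache half is easy to sketch (everything here is hypothetical, not Hypermode's implementation; a production version would want semantic matching, TTLs, and invalidation):

import hashlib
import json

_cache = {}

def _cache_key(model, messages):
    # Normalize case/whitespace so trivially different phrasings share an entry.
    normalized = [
        {"role": m["role"], "content": " ".join(m["content"].lower().split())}
        for m in messages
    ]
    return hashlib.sha256(json.dumps([model, normalized], sort_keys=True).encode()).hexdigest()

def cached_completion(model, messages, call_model):
    # call_model is whatever client you already use (e.g. the requests snippet above).
    key = _cache_key(model, messages)
    if key not in _cache:
        _cache[key] = call_model(model, messages)
    return _cache[key]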
0xDEAFBEAD•2h ago
No way. I would definitely be curious to hear more if you want to share.
iamtherhino•3h ago
What we've seen be most successful is making recommendations during the agent creation process for a given tool/workload and then leaving them somewhat static after creation.