We’re building https://www.switchpoint.dev – a drop-in replacement for OpenAI’s API that cuts LLM costs by smartly routing each request across models (e.g., Claude, Gemini, GPT-4) based on the subject and difficulty of the task.
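To make the “drop-in” claim concrete, here’s a rough sketch of what a request looks like: the body follows the OpenAI Chat Completions schema, so existing client code should only need the base URL and API key swapped. The base URL and the `"auto"` model name below are illustrative placeholders, not our documented values – check the site for the real ones.

```python
import json
import urllib.request

# Hypothetical base URL -- see the switchpoint.dev docs for the real one.
BASE_URL = "https://api.switchpoint.dev/v1"

def build_chat_request(api_key: str, messages: list[dict]) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request aimed at the router.

    Because the body matches the OpenAI Chat Completions schema, existing
    client code only needs the base URL (and key) swapped.
    """
    body = json.dumps({"model": "auto", "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("sk-...", [{"role": "user", "content": "Hello"}])
print(req.full_url)
```

The same swap works with the official OpenAI SDKs by pointing their `base_url` option at the router instead of api.openai.com.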
Why we built this: LLM costs are spiraling, especially for products doing retrieval, agentic reasoning, or even just high-volume chat. We were frustrated with paying GPT-4 rates when most queries didn’t need them. So we built a router that:
- Starts with cheaper/free models (like Llama 8B, GPT-4o mini, Gemini 2.0 Flash)
- Streams responses and upgrades to a stronger model on failure
- Acts as a single OpenAI-compatible endpoint

For enterprise, you can define fallback logic and custom routing rules as well. It’s plug-and-play.
Designed for agents and RAG systems: We’re exploring integrations with open-source coding agents and autonomous frameworks. If you maintain or use one, we’d love to collaborate, or even just hear whether this would help your stack.
We’d love any and all feedback: feature requests, bugs, edge cases, your use cases. Would this be useful to you? Is it solving a real problem?
Thanks for checking it out!