I'm launching cascadeflow – an open-source tool for AI model cascading that can reduce your AI provider costs by 30-65% with just 3 lines of code.
The core insight: After a year of working with small language models and domain-specific models (especially on edge devices), I found that 80% of queries can be handled by cheaper, smaller models. Only the complex 20% actually need flagship models.
How it works:
1. Route queries to a cheap "drafter" model first
2. Validate the response quality
3. If quality passes, return it (fast + cheap)
4. If not, escalate to an expensive "verifier" model
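To make the flow concrete, here is a minimal Python sketch of the four steps above. It is not cascadeflow's actual API: `cascade`, `CascadeResult`, the two stand-in model functions, and the `non_empty` quality check are all hypothetical names made up for illustration, and a real validator would be far more involved than a non-empty check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadeResult:
    text: str        # the answer returned to the caller
    model: str       # which tier produced it: "drafter" or "verifier"
    escalated: bool  # True if the expensive model had to be called


def cascade(
    query: str,
    drafter: Callable[[str], str],
    verifier: Callable[[str], str],
    is_good_enough: Callable[[str, str], bool],
) -> CascadeResult:
    """Try the cheap model first; escalate only if its draft fails validation."""
    draft = drafter(query)                      # step 1: cheap model
    if is_good_enough(query, draft):            # step 2: validate quality
        return CascadeResult(draft, "drafter", False)   # step 3: accept
    return CascadeResult(verifier(query), "verifier", True)  # step 4: escalate


# Stand-ins for real model calls (hypothetical, no network access needed).
def cheap_model(query: str) -> str:
    return "4" if query == "What is 2 + 2?" else ""


def expensive_model(query: str) -> str:
    return f"[detailed answer to: {query}]"


def non_empty(query: str, answer: str) -> bool:
    # Toy quality gate: accept any non-blank draft.
    return bool(answer.strip())


easy = cascade("What is 2 + 2?", cheap_model, expensive_model, non_empty)
hard = cascade("Prove Fermat's Last Theorem.", cheap_model, expensive_model, non_empty)
print(easy.model, hard.model)
```

The quality gate is the part that decides how much you save: if it is too lenient you return bad drafts, and if it is too strict almost everything escalates and you pay for both models.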
We're seeing 40-85% cost savings in production workflows, with 70-80% of queries never touching the expensive model.
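A rough back-of-the-envelope way to relate the acceptance rate to the savings: every query pays for the drafter, and only escalated queries also pay for the verifier. The sketch below assumes per-query costs are constant and ignores the cost of the validation step; the function name and the 10x price ratio in the example are illustrative assumptions, not figures from cascadeflow.

```python
def cascade_savings(accept_rate: float, drafter_cost: float, verifier_cost: float) -> float:
    """Fraction of cost saved versus sending every query to the verifier.

    accept_rate is the share of queries whose draft passes validation.
    """
    # All queries hit the drafter; the rejected share also hits the verifier.
    expected_cost = drafter_cost + (1.0 - accept_rate) * verifier_cost
    return 1.0 - expected_cost / verifier_cost


# Assumed example: 75% of drafts accepted, verifier 10x the drafter's price.
savings = cascade_savings(accept_rate=0.75, drafter_cost=1.0, verifier_cost=10.0)
print(f"{savings:.0%}")
```

Under these assumptions the savings are bounded above by the acceptance rate, and the gap between the two grows as the drafter gets more expensive relative to the verifier.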
Available for Python and TypeScript, with integrations for n8n and LiteLLM. MIT licensed.
This is Day 2 of our release sprint. I'd love to hear your feedback, especially if you're dealing with high AI API costs or running models in resource-constrained environments.
saschabuehrle•2h ago
GitHub: https://github.com/lemony-ai/cascadeflow