I’ve had various experiments/lightweight projects that make occasional calls to various providers, and just wanted a very simple and configurable way to automatically triage the models for slightly longer running tasks.
Cascade is a super simple, single dependency-free Python script that turns free-tier AI API keys into one always-on chat endpoint.
It takes any combination of free provider keys (Groq, Cerebras, Gemini, Mistral, OpenRouter, Cloudflare, SambaNova, Nvidia NIM, etc.) and Cascade automatically:
- Discovers available models - Ranks them best → worst - Routes prompts to the best available model - Detects rate limits, quota exhaustion, outages, and unsupported models - Fails over to the next-best model automatically
It works as both:
- An interactive CLI chat client - An OpenAI-compatible REST API (`/v1/chat/completions`)
Supports ~18 OpenAI-compatible providers today, including a keyless OVHcloud fallback. Run `/providers` to see connected providers and add more keys at any time.
I’m sure there’s solutions out there (incl. some of the providers listed), this isn’t a product; just a very simple solve for a very simple issue, sharing for those like myself looking for a more CLI-focused or configurable approach!