Hey HN , I built glide after getting tired of Claude Opus stalling mid-task under peak load.
It's a transparent proxy that tracks rolling p95 TTFT per model and cascades to a faster model before you ever hit a timeout. Three routing strategies: solo (primary healthy), hedge (race two models, stream the winner), skip (both slow → sequential cascade).
Stack: Python, FastAPI, asyncio streaming, SQLite for persistence, Prometheus metrics endpoint. Provider-agnostic — mix Anthropic, OpenAI, Gemini, Ollama in one cascade.
Happy to answer questions on the implementation, especially the mid-stream SSE abort for TTT (time-to-think) enforcement.
phanisaimuni116•1h ago
It's a transparent proxy that tracks rolling p95 TTFT per model and cascades to a faster model before you ever hit a timeout. Three routing strategies: solo (primary healthy), hedge (race two models, stream the winner), skip (both slow → sequential cascade).
Stack: Python, FastAPI, asyncio streaming, SQLite for persistence, Prometheus metrics endpoint. Provider-agnostic — mix Anthropic, OpenAI, Gemini, Ollama in one cascade.
Happy to answer questions on the implementation, especially the mid-stream SSE abort for TTT (time-to-think) enforcement.