We wrote this after running into the same pattern a few times: the AI feature worked fine in development, but once real traffic hit it, the problems were mostly infra problems, not prompt problems. Provider outages, repeated token spend on identical requests, poor visibility into failures, and response shape drift.
This post is our attempt to explain the architecture pattern we ended up with, before talking about the product itself. The short version: put a routing layer between your app and the model providers, keep the client interface the same, and move failover, caching, observability, and response validation into that layer.
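To make the pattern concrete, here is a minimal sketch of such a routing layer. It is illustrative only: the `ModelRouter` class, the `complete` method, and the provider callables are hypothetical names, not any real SDK, but the structure shows the four concerns (failover, cache, logging, shape validation) living in one place behind a single call interface.

```python
import hashlib
from typing import Callable

class ModelRouter:
    """Sketch of a routing layer: one client-facing call, with provider
    failover, a response cache, shape validation, and a log as a
    stand-in for real observability. All names are illustrative."""

    def __init__(self, providers: list[Callable[[str], dict]],
                 required_keys: tuple[str, ...] = ("text",)):
        self.providers = providers        # ordered by preference
        self.required_keys = required_keys
        self.cache: dict[str, dict] = {}  # keyed by request hash
        self.log: list[str] = []          # stand-in for metrics/tracing

    def _key(self, prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()

    def _valid(self, response: dict) -> bool:
        # Shape check: catch response drift before it reaches the app.
        return all(k in response for k in self.required_keys)

    def complete(self, prompt: str) -> dict:
        key = self._key(prompt)
        if key in self.cache:             # avoid repeat token spend
            self.log.append("cache_hit")
            return self.cache[key]
        for provider in self.providers:   # failover: try each in order
            try:
                response = provider(prompt)
            except Exception as exc:      # outage, timeout, rate limit
                self.log.append(f"provider_error:{exc}")
                continue
            if not self._valid(response):
                self.log.append("invalid_shape")
                continue
            self.cache[key] = response
            self.log.append("ok")
            return response
        raise RuntimeError("all providers failed")
```

The point of the sketch is that the application code calls `complete()` exactly as it would call a single provider; swapping providers, adding a cache, or tightening validation never touches the call sites.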
If you’ve built around direct OpenAI calls and had to patch retries/fallbacks/logging yourself, I’d be interested to hear what broke first for you.
vishaal_007•1h ago