I’ve worked on and shipped a few AI systems that reached real users.
This post isn’t about models or prompts. It’s about the things that kept breaking once AI moved off the happy path: async jobs, retries, silent failures, provider outages, cost blowups, and debugging without visibility.
I wrote this mostly as a way to document the mistakes I made and what I wish I had known earlier. Happy to answer questions or dig deeper into any of the failure modes.
akarshc•2h ago