My AI bill was getting out of control: a third of my user queries and RAG workflows generated the same output. I built Kento to fix it. It's a semantic cache that you add by changing only the base_url parameter in your OpenAI, Anthropic, or Google client. It catches duplicate requests (exact or semantic matches), serves cached responses instantly, and gives you a dashboard to track hit rates and costs.
Faster for your users, cheaper for you.
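For anyone curious what an exact-plus-semantic lookup looks like in principle, here is a minimal Python sketch. It is not Kento's actual implementation: the `SemanticCache` class, the `embed` helper, and the 0.8 similarity threshold are all assumptions made for illustration, and the toy bag-of-words embedding stands in for a real embedding model.

```python
import hashlib
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: a real cache would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache: exact match by hash first, then nearest stored prompt."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # illustrative value, not a recommendation
        self.exact = {}             # sha256(prompt) -> response
        self.entries = []           # list of (embedding, response)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt):
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def put(self, prompt, response):
        self.exact[self._key(prompt)] = response
        self.entries.append((embed(prompt), response))

    def get(self, prompt):
        # 1. Exact duplicate: a byte-identical prompt hits the hash table.
        key = self._key(prompt)
        if key in self.exact:
            self.hits += 1
            return self.exact[key]
        # 2. Semantic duplicate: best cosine match at or above the threshold.
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            self.hits += 1
            return best_response
        # 3. Miss: the caller would forward the request to the model provider.
        self.misses += 1
        return None

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

On a miss, the caller would send the request to the provider and store the result with `put`. The threshold is the main trade-off: set it lower and more near-duplicates are served from cache, at the risk of returning an answer to a question that only looks similar.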
andreysheva•1h ago
We have a free dev tier and would love your feedback.