Every major LLM provider offers 50-90% discounts on cached tokens, but the mechanics required to actually capture them differ by provider, change regularly, and are genuinely hard to get right.
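To make "different for every provider" concrete, here's an illustrative comparison (my sketch, not Genosis code): Anthropic requires explicit `cache_control` breakpoints on content blocks, while OpenAI applies prefix caching automatically once a prompt passes a minimum length, with no markers in the request at all.

```python
# Illustrative request payloads, not actual SDK calls.
LONG_SYSTEM_PROMPT = "You are a support agent. " * 200  # stand-in for a large, stable prefix

# Anthropic: opt in per block. Cached content must form a byte-identical
# prefix across requests, so block order matters.
anthropic_request = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # explicit cache breakpoint
        }
    ],
    "messages": [{"role": "user", "content": "Where is my order?"}],
}

# OpenAI: no cache markers anywhere. Caching kicks in automatically for
# prompts above a size threshold, keyed on the exact prefix.
openai_request = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": LONG_SYSTEM_PROMPT},
        {"role": "user", "content": "Where is my order?"},
    ],
}
```

Getting the discount means knowing which of these models each provider uses, where the breakpoints pay off, and keeping prefixes stable across deploys.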
Genosis watches your traffic (content-blind — it only sees hashes, never your data), figures out which blocks are worth caching and in what order, and delivers a manifest that the SDK applies locally. It also catches duplicate requests and serves them from a local cache — saving both input and output tokens.
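The content-blind dedup idea above can be sketched in a few lines (my illustration, not the actual Genosis implementation): only a hash of the request would ever leave the machine, while the response cache lives locally with the caller.

```python
import hashlib
import json

_local_cache: dict[str, str] = {}  # hash -> cached completion, kept on the client


def request_hash(payload: dict) -> str:
    # Canonicalize so key order doesn't change the hash, then digest.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def complete(payload: dict, call_provider) -> str:
    key = request_hash(payload)  # only this hash would be shared, never the content
    if key in _local_cache:
        return _local_cache[key]  # duplicate request: zero input or output tokens spent
    response = call_provider(payload)
    _local_cache[key] = response
    return response
```

A second identical request never reaches the provider, which is why dedup saves output tokens as well as input tokens.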
It's not a proxy. It's never in your request path. If it goes down, your app works exactly as if it were never there.
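That fail-open property can be sketched like this (my illustration under assumed names like `fetch_manifest` and `apply_manifest`, not the real SDK internals): the optimizer is consulted out of band, and if it's unreachable, requests pass through to the provider untouched.

```python
def apply_manifest(payload: dict, manifest: dict) -> dict:
    # Illustrative: tag the blocks the manifest says are worth caching.
    out = dict(payload)
    out["cached_blocks"] = manifest.get("blocks", [])
    return out


def get_manifest(fetch_manifest):
    try:
        return fetch_manifest()  # e.g. a periodically refreshed call to the optimizer
    except Exception:
        return None  # optimizer down: behave as if it was never installed


def send(payload: dict, call_provider, fetch_manifest) -> str:
    manifest = get_manifest(fetch_manifest)
    if manifest:
        payload = apply_manifest(payload, manifest)
    return call_provider(payload)  # always reaches the provider either way
```

The provider call never depends on the optimizer being up; the only thing lost during an outage is the optimization itself.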
- Open-source SDK: Python (`pip install genosis`) and TypeScript (`npm install @genosis/sdk`)
- Supports Anthropic and OpenAI (Google coming soon)
- Free tier — you only pay if we save you money
I'm the sole founder. Happy to answer questions about how it works, the caching mechanics, or anything else.