I built a Ruby gem that caches LLM responses using semantic similarity.
If someone asks "What's the capital of France?" and later "What is France's
capital city?" — the second call hits the cache instead of the API.
How it works:
- Queries are converted to embeddings (text-embedding-3-small)
- Cosine similarity finds matches above a threshold (default 0.85)
- Cache hit = instant response, no API call, no cost
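The matching step above boils down to cosine similarity between embedding vectors. A minimal Ruby sketch of that check, not the gem's actual internals; embeddings are assumed to be plain arrays of floats, and the 0.85 cutoff matches the default mentioned above:

```ruby
# Cosine similarity: dot product divided by the product of vector norms.
def cosine_similarity(a, b)
  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

THRESHOLD = 0.85 # default similarity cutoff from the post

# A stored response counts as a hit if its embedding is close enough
# to the incoming query's embedding.
def cache_hit?(query_embedding, cached_embedding)
  cosine_similarity(query_embedding, cached_embedding) >= THRESHOLD
end
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, so the threshold is effectively a knob for how loosely "the same question" is interpreted.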
Usage is simple:
cache = SemanticCache.new

response = cache.fetch("What's the capital of France?") do
  openai.chat(messages: [{ role: "user", content: "..." }])
end

# This returns the cached response — no API call
response = cache.fetch("What is France's capital city?") do
  openai.chat(messages: [{ role: "user", content: "..." }])
end
Features:
- In-memory and Redis stores
- TTL expiry and tag-based invalidation
- Cost tracking with savings reports
- Works with OpenAI, Anthropic, Gemini
- Client wrapper that caches all calls automatically
- Rails integration (concern + per-user namespacing)
- Max cache size with automatic LRU eviction
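For anyone curious about the LRU eviction in the last bullet, here's a minimal sketch of how it can be done with Ruby's insertion-ordered Hash. This is an illustration under my own assumptions, not the gem's actual store:

```ruby
# Toy LRU store: the Hash's insertion order doubles as recency order.
# Reading or writing a key re-inserts it, moving it to the "newest" end;
# eviction drops the first (oldest) key when the size cap is exceeded.
class LruStore
  def initialize(max_size)
    @max_size = max_size
    @data = {}
  end

  def put(key, value)
    @data.delete(key)  # re-inserting moves the key to the newest position
    @data[key] = value
    @data.delete(@data.keys.first) if @data.size > @max_size # evict oldest
  end

  def get(key)
    return nil unless @data.key?(key)
    value = @data.delete(key) # touch: move to newest position
    @data[key] = value
  end
end
```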
In my testing, hit rates of 60-80% are typical for apps with
repetitive user queries (chatbots, search, FAQ tools).
The math: if you spend $500/mo on OpenAI and get a 70% hit rate,
that's $350/mo saved minus ~$2 in embedding costs.
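Spelled out in Ruby, using the post's own numbers (the ~$2 embedding figure is an estimate; the real cost depends on query volume and token counts):

```ruby
# Back-of-envelope savings: cached calls skip the completion API but
# every query still pays for one embedding lookup.
monthly_spend = 500.0  # $/mo on completion calls today
hit_rate      = 0.70   # fraction of calls served from cache

gross_saved = monthly_spend * hit_rate  # completions avoided
net_saved   = gross_saved - 2.0         # minus ~$2/mo of embedding costs

puts format("gross: $%.2f/mo, net: $%.2f/mo", gross_saved, net_saved)
```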
Repo: https://github.com/stokry/semantic-cache
Install: gem install semantic-cache