Caching at the HTTP request level has an obvious generalizability problem: almost no two requests are identical, because of templated variables (like names) and metadata (like timestamps), so exact-match cache lookups rarely hit. We solve this at Butter by using LLMs to detect dynamic content in requests and derive the relationships between it and the response, so a cache entry is stored as a template + variables + deterministic code. Future requests can then carry different variable data and still be served from cache.
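For illustration, here's a minimal sketch of the template + variables idea. The regex pattern, slot names, and string-substitution rendering are hypothetical simplifications of mine; Butter derives the templates and deterministic code automatically with LLMs rather than relying on hand-written patterns:

```python
# Minimal sketch (not Butter's actual code): a cache entry stores a request
# template with named variable slots plus a response template. A new request
# that matches the template with different variable values is served by
# re-rendering the cached response instead of calling the LLM again.
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    request_pattern: re.Pattern   # request template with named variable slots
    response_template: str        # response with {var} placeholders


cache = [
    CacheEntry(
        request_pattern=re.compile(
            r"Summarize the order for (?P<name>\w+) placed on (?P<date>[\d-]+)"
        ),
        response_template="Order summary for {name}: placed {date}, status confirmed.",
    )
]


def lookup(request_text: str) -> Optional[str]:
    """Return a rendered cached response if any template matches, else None."""
    for entry in cache:
        m = entry.request_pattern.fullmatch(request_text)
        if m:
            # Deterministic step: substitute the extracted variables into the
            # cached response template rather than hitting the LLM.
            return entry.response_template.format(**m.groupdict())
    return None  # cache miss -> fall through to the real LLM call


print(lookup("Summarize the order for Alice placed on 2024-05-01"))
```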
We've found this approach greatly improves cache hit rate, and believe it could be useful for agents performing repetitive back-office tasks, computer use, or data transformations where input data is frequently of the same shape.
- You can see a demo of learning patterns here: https://www.youtube.com/watch?v=ORDfPnk9rCA
- We wrote more about the technical approach here: https://blog.butter.dev/on-automatic-template-induction-for-...
- It's free to try out here: https://butter.dev/auth