It's built to integrate easily into existing LLM workflows. You can use it as a proxy: on a cache miss, the request is forwarded unmodified to a specified upstream, and the cache is automatically updated with the response.
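In proxy mode, existing client code only needs to point at the cache instead of the upstream. A minimal sketch, assuming the proxy exposes an OpenAI-compatible endpoint; the address and model name below are illustrative:

```python
from openai import OpenAI

# Point an existing OpenAI-compatible client at the cache proxy instead of
# the upstream API. The address is illustrative, not a fixed default.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="...")

# Hits are served straight from the cache; misses are forwarded upstream
# and the response is stored for future, semantically similar queries.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is semantic caching?"}],
)
print(response.choices[0].message.content)
```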
You can also use it in a cache-aside pattern via the provided Python library.
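The cache-aside flow looks roughly like this; `cache.get`, `cache.put`, and `llm` below are placeholder names to show the pattern, not the library's actual API:

```python
def answer(query: str, cache, llm) -> str:
    """Cache-aside lookup: serve a semantically similar cached response if
    one exists, otherwise call the LLM and populate the cache."""
    cached = cache.get(query)       # placeholder: returns None on a miss
    if cached is not None:
        return cached
    response = llm(query)           # the real upstream call
    cache.put(query, response)      # placeholder: store the new pair
    return response
```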
It works by computing an embedding vector for each input query and matching it against previously seen query/response pairs in a vector store.
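A stripped-down sketch of that lookup: `embed` is a stand-in for a real embedding model (any function mapping text to a fixed-size vector), and the similarity threshold is illustrative:

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Stand-in embedding: deterministic random unit vector per string.
    # A real deployment would use an actual embedding model here.
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "little")
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

class VectorStore:
    """Keeps (query embedding, response) pairs in memory and returns the
    stored response whose query is most similar to the incoming one, if the
    cosine similarity clears a threshold."""

    def __init__(self, threshold: float = 0.9):
        self.vectors: list[np.ndarray] = []
        self.responses: list[str] = []
        self.threshold = threshold

    def add(self, query: str, response: str) -> None:
        self.vectors.append(embed(query))
        self.responses.append(response)

    def lookup(self, query: str) -> str | None:
        if not self.vectors:
            return None
        q = embed(query)
        sims = np.stack(self.vectors) @ q  # cosine similarity (unit vectors)
        best = int(np.argmax(sims))
        return self.responses[best] if sims[best] >= self.threshold else None
```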
Everything is in-memory, so it should be blazing fast :)