The problem we were solving: traditional HTTP caching doesn't work well with GraphQL. A single response often mixes data with different freshness requirements—long-lived product info, fast-changing inventory, user-specific context. Full-response caches force the shortest TTL across everything, killing hit rates.
Our approach caches at two levels: root query fields and individual entity representations. Each entity can have its own TTL (derived from Cache-Control headers or @cacheControl directives), and entities are shared across queries and users where appropriate.
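As a sketch of what per-entity TTLs look like in a subgraph schema using the @cacheControl directive (the Product type and its fields below are illustrative, not from a real schema):

```graphql
type Product @key(fields: "id") @cacheControl(maxAge: 3600) {
  id: ID!
  name: String
  description: String
  # Inventory changes fast, so it gets a much shorter TTL
  # than the rest of the entity.
  inventory: Int @cacheControl(maxAge: 30)
}
```

Because TTLs attach to fields and entities rather than whole responses, a query touching both product info and inventory no longer collapses to the 30-second TTL for everything.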
A few things that might be interesting:
- Tag-based invalidation: mark entities with @cacheTag, then invalidate by tag when data changes — similar to CDN surrogate keys, but at the GraphQL entity level.
- Partial cache hits: a single query might hit cache for some subgraphs and miss for others; we built a debugger in Apollo Sandbox to inspect exactly what's cached.
- Redis-backed, with cluster support.
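To make the tag-based invalidation concrete, here's a rough sketch — the directive argument name and tag format are illustrative assumptions, so check the linked docs for the exact API:

```graphql
# Hypothetical usage: tag each cached Product entity with a
# key-derived tag so it can be evicted by tag later.
type Product @key(fields: "id") @cacheTag(format: "product-{$key.id}") {
  id: ID!
  name: String
}
```

When product 123 changes, invalidating the tag "product-123" evicts every cached entry carrying that tag, regardless of which query (or user) produced it — the same model as CDN surrogate keys, applied per entity.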
Early results from Dow Jones: 20-25% latency reduction, 8-10x decrease in traffic to some subgraphs.
Docs: https://www.apollographql.com/docs/graphos/routing/performan...
Blog: https://www.apollographql.com/blog/introducing-response-cach...