>This repository provides a patch for SGLang and vLLM that enables IndexCache inference acceleration for models using DeepSeek Sparse Attention (DSA), including DeepSeek-V3.2 and GLM-5.
teleforce•1h ago
Paper here [1].
[1] IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse:
https://arxiv.org/abs/2603.12201