It includes:
1. Paged KV cache (block_size=1) + page-table semantics
2. Trie/radix prefix cache with reference-counted KV blocks (safe prefix reuse)
3. Attention metadata builder (page_table / cu_seqlens / positions / out_loc)
4. A simple KV-capacity-bounded scheduler (admission control + continue-batching)
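To make the list above concrete, here is a minimal sketch of item 1 (names like `BlockAllocator` and `Sequence` are illustrative, not the project's actual API). With block_size=1, one block holds exactly one token's K/V, and each sequence carries a page table mapping logical positions to physical slots; the reference counts are what later makes prefix reuse safe.

```python
from collections import deque

class BlockAllocator:
    """Free-list allocator over physical KV slots. With block_size=1,
    one block holds exactly one token's K/V vectors."""

    def __init__(self, num_blocks: int):
        self.free = deque(range(num_blocks))
        self.ref_count = [0] * num_blocks

    def alloc(self) -> int:
        block = self.free.popleft()   # IndexError here means KV capacity is exhausted
        self.ref_count[block] = 1
        return block

    def retain(self, block: int) -> None:
        self.ref_count[block] += 1    # another owner, e.g. a reused prefix

    def release(self, block: int) -> None:
        self.ref_count[block] -= 1
        if self.ref_count[block] == 0:
            self.free.append(block)   # freed only when no sequence references it


class Sequence:
    """One request. page_table maps logical position i to the
    physical KV slot page_table[i]."""

    def __init__(self, token_ids: list[int]):
        self.token_ids = list(token_ids)
        self.page_table: list[int] = []
```

Decoding one token then amounts to allocating one slot and appending it to the sequence's page table; the attention kernel indexes the KV cache through the table rather than through contiguous memory.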
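Item 2, the trie/radix prefix cache, could look roughly like this (a hypothetical shape, reusing the `BlockAllocator` sketch above): matching a new prompt walks the trie token by token, returning the physical blocks of the longest cached prefix and bumping their reference counts so they cannot be freed while the new sequence reads them.

```python
class TrieNode:
    __slots__ = ("children", "block")

    def __init__(self, block: int = -1):
        self.children: dict[int, "TrieNode"] = {}  # token id -> child node
        self.block = block                         # physical slot for this token's K/V


class PrefixCache:
    """Token-level trie: longest-prefix match over previously seen prompts."""

    def __init__(self, allocator: BlockAllocator):
        self.root = TrieNode()
        self.allocator = allocator

    def match_prefix(self, token_ids: list[int]) -> list[int]:
        """Return the shared KV blocks for the longest cached prefix,
        retaining each block so concurrent reuse is safe."""
        node, blocks = self.root, []
        for tok in token_ids:
            child = node.children.get(tok)
            if child is None:
                break
            self.allocator.retain(child.block)
            blocks.append(child.block)
            node = child
        return blocks

    def insert(self, token_ids: list[int], blocks: list[int]) -> None:
        """Record a finished prefill so future prompts can reuse it.
        (Eviction of cold nodes is omitted in this sketch.)"""
        node = self.root
        for tok, block in zip(token_ids, blocks):
            child = node.children.get(tok)
            if child is None:
                child = TrieNode(block)
                self.allocator.retain(block)  # the cache itself holds a reference
                node.children[tok] = child
            node = child
```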
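Item 3 flattens per-sequence state into the batched tensors a varlen attention kernel consumes. Below is a sketch of the prefill case, assuming PyTorch and the `Sequence` class above; the exact tensor conventions (padding value, dtypes, what `out_loc` points at) are my assumptions, not the project's. For decode, each sequence would contribute only its newest token.

```python
import torch

def build_attn_metadata(seqs: list[Sequence]) -> dict[str, torch.Tensor]:
    """Build the batched metadata tensors for a prefill step over `seqs`."""
    seq_lens = [len(s.page_table) for s in seqs]
    max_len = max(seq_lens)

    # page_table[i, j]: physical KV slot of sequence i's j-th token (-1 = padding).
    page_table = torch.full((len(seqs), max_len), -1, dtype=torch.int32)
    for i, s in enumerate(seqs):
        page_table[i, : seq_lens[i]] = torch.tensor(s.page_table, dtype=torch.int32)

    # cu_seqlens: exclusive prefix sum of lengths, e.g. [3, 5] -> [0, 3, 8];
    # sequence i's tokens occupy the flat range cu_seqlens[i]:cu_seqlens[i+1].
    cu = [0]
    for n in seq_lens:
        cu.append(cu[-1] + n)
    cu_seqlens = torch.tensor(cu, dtype=torch.int32)

    # positions: rotary/absolute position of every token in the flattened batch.
    positions = torch.cat([torch.arange(n) for n in seq_lens])

    # out_loc: slot each token's freshly computed K/V is scattered into
    # (with block_size=1 this is just the page-table entries, flattened).
    out_loc = torch.cat(
        [torch.tensor(s.page_table, dtype=torch.int32) for s in seqs]
    )

    return {"page_table": page_table, "cu_seqlens": cu_seqlens,
            "positions": positions, "out_loc": out_loc}
```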
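Finally, item 4: a deliberately simple scheduler loop in the same spirit, building on the classes above (again a hypothetical shape that glosses over the prefill/decode distinction and ignores prefix-cache hits during admission). Admission control means a waiting request only joins the batch if its whole prompt fits in free blocks; continue-batching means every running sequence gets one fresh slot per decode step.

```python
from collections import deque

class Scheduler:
    """KV-capacity-bounded, FCFS, learning-first: no priorities, no preemption."""

    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.waiting: deque[Sequence] = deque()  # requests not yet admitted
        self.running: list[Sequence] = []        # sequences holding KV blocks

    def add_request(self, seq: Sequence) -> None:
        self.waiting.append(seq)

    def step(self) -> list[Sequence]:
        # Admission control: admit FCFS while the full prompt fits in free
        # blocks (a prefix-cache hit would shrink this requirement).
        while (self.waiting
               and len(self.waiting[0].token_ids) <= len(self.allocator.free)):
            seq = self.waiting.popleft()
            seq.page_table = [self.allocator.alloc() for _ in seq.token_ids]
            self.running.append(seq)

        # Continue-batching: each running sequence needs one new slot for the
        # token it decodes this step; if capacity runs out we simply stall it
        # (a real engine would preempt and free its blocks instead).
        batch = []
        for seq in self.running:
            if self.allocator.free:
                seq.page_table.append(self.allocator.alloc())
                batch.append(seq)
        return batch
```

Finished sequences would release their blocks back through the allocator, or hand them to the prefix cache for reuse; that cleanup path is omitted here.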
It’s inspired by nano-vllm and mini-sglang, but not a direct copy — I re-implemented components step-by-step to understand how the pieces fit, with help from GPT-5.2. The scheduler policy is intentionally simple (learning-first).
Performance note: with 80,000 KV blocks allocated (at block_size=1, one block per token, so roughly 80K tokens of cache capacity), I measured ~1990 tokens/s on Llama 3.2 1B on a laptop RTX 4070.