Tracks your device's resource usage in real time and adjusts how the model runs so that it fits within the memory and compute your device has available.
Implements KV cache sizing, prefix caching, live RAM pressure management, context trimming, KV quantization, and many other features.
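To illustrate how KV cache sizing, KV quantization, and context trimming can fit together, here is a minimal sketch. It is not this project's implementation. It assumes a standard transformer layout in which every layer stores one key and one value vector per KV head for each token. The names `ModelShape`, `plan_kv_cache`, and `trim_context`, along with the precision table, are hypothetical. The free-memory figure is passed in by the caller instead of being measured live.

```python
from dataclasses import dataclass

# Hypothetical bytes per cached element for each KV precision. Real quantized
# formats also store per-block scales, so q8/q4 are slight underestimates.
KV_BYTES = {"f16": 2.0, "q8": 1.0, "q4": 0.5}


@dataclass
class ModelShape:
    n_layers: int
    n_kv_heads: int
    head_dim: int


def kv_bytes_per_token(shape: ModelShape, kv_dtype: str) -> float:
    # Factor 2: one key vector and one value vector per KV head, per layer.
    return 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * KV_BYTES[kv_dtype]


def plan_kv_cache(shape, available_bytes, reserve_bytes, desired_ctx, min_ctx):
    """Pick the highest KV precision that fits desired_ctx in the memory budget.

    If no precision fits, fall back to the smallest cache and shrink the context.
    """
    budget = max(0, available_bytes - reserve_bytes)  # leave headroom for weights/OS
    for dtype in ("f16", "q8", "q4"):
        max_ctx = int(budget // kv_bytes_per_token(shape, dtype))
        if max_ctx >= desired_ctx:
            return dtype, desired_ctx
    max_ctx = int(budget // kv_bytes_per_token(shape, "q4"))
    if max_ctx < min_ctx:
        raise MemoryError("not enough free memory for the minimum context length")
    return "q4", max_ctx


def trim_context(tokens, ctx_len, keep_prefix):
    """Keep the first keep_prefix tokens (e.g. a system prompt) plus the newest tokens."""
    if keep_prefix >= ctx_len:
        raise ValueError("keep_prefix must be smaller than ctx_len")
    if len(tokens) <= ctx_len:
        return list(tokens)
    tail = ctx_len - keep_prefix
    return list(tokens[:keep_prefix]) + list(tokens[len(tokens) - tail:])


if __name__ == "__main__":
    shape = ModelShape(n_layers=32, n_kv_heads=8, head_dim=128)  # hypothetical model
    GiB = 1024 ** 3
    print(plan_kv_cache(shape, 2 * GiB, int(1.5 * GiB), desired_ctx=8192, min_ctx=1024))
    print(trim_context(list(range(10)), ctx_len=6, keep_prefix=2))
```

For this shape, a 16-bit cache costs 128 KiB per token. With 0.5 GiB of budget, that caps the cache at 4096 tokens, so the planner switches to 8-bit KV entries to keep the full 8192-token context. A live memory-pressure monitor could call `plan_kv_cache` again whenever free RAM drops. It would then trim the conversation to the new context length.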