llmtop is a real-time terminal dashboard for LLM inference workers. It scrapes the Prometheus /metrics endpoints that vLLM, SGLang, and LMCache already expose and shows everything in one view: KV cache usage, queue depth, TTFT/ITL latencies (P50/P99 from histogram buckets), token throughput, prefix cache hit rates. Color-coded — red means go fix it.
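The P50/P99 figures come from Prometheus histogram buckets, which only store cumulative counts per upper bound; the quantile has to be interpolated. A minimal Go sketch of that estimation (same linear-interpolation approach as PromQL's `histogram_quantile`; the bucket values below are made up for illustration, not real llmtop output):

```go
package main

import "fmt"

// bucket mirrors one Prometheus histogram bucket: the cumulative count
// of observations with value <= upperBound (the "le" label).
type bucket struct {
	upperBound float64 // seconds
	count      float64 // cumulative observation count
}

// quantile estimates quantile q from cumulative buckets, interpolating
// linearly inside the bucket the rank falls into. Buckets must be sorted
// by upperBound, with the last entry acting as +Inf.
func quantile(q float64, buckets []bucket) float64 {
	total := buckets[len(buckets)-1].count
	rank := q * total
	prevBound, prevCount := 0.0, 0.0
	for _, b := range buckets[:len(buckets)-1] {
		if b.count >= rank {
			frac := (rank - prevCount) / (b.count - prevCount)
			return prevBound + (b.upperBound-prevBound)*frac
		}
		prevBound, prevCount = b.upperBound, b.count
	}
	// Rank fell in the +Inf bucket: best we can do is the last finite bound.
	return buckets[len(buckets)-2].upperBound
}

func main() {
	// Hypothetical TTFT histogram: 40 requests <= 0.1s, 90 <= 0.5s, 99 <= 1s, 100 total.
	ttft := []bucket{{0.1, 40}, {0.5, 90}, {1.0, 99}, {1e9, 100}}
	fmt.Printf("P50 ~ %.3fs, P99 ~ %.3fs\n", quantile(0.50, ttft), quantile(0.99, ttft))
}
```

Because values inside a bucket are assumed uniformly distributed, the estimate is only as good as the bucket boundaries the server exports.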
```
brew install InfraWhisperer/tap/llmtop
# or
go install github.com/InfraWhisperer/llmtop/cmd/llmtop@latest
```
Single binary, no Prometheus server needed, no Grafana, no config. Just run llmtop and it auto-discovers local workers.
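The announcement doesn't spell out how auto-discovery works; a plausible minimal sketch is probing well-known default ports on localhost for a live `/metrics` endpoint. The port list here is an assumption (vLLM commonly serves on 8000, SGLang on 30000), not llmtop's actual logic:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// discover probes candidate localhost ports for a Prometheus /metrics
// endpoint and returns the URLs that answer 200 OK. Hypothetical sketch;
// llmtop's real discovery may differ.
func discover(ports []int) []string {
	client := &http.Client{Timeout: 500 * time.Millisecond}
	var found []string
	for _, p := range ports {
		url := fmt.Sprintf("http://localhost:%d/metrics", p)
		resp, err := client.Get(url)
		if err != nil {
			continue // nothing listening, or not speaking HTTP
		}
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			found = append(found, url)
		}
	}
	return found
}

func main() {
	// Assumed default ports: 8000 (vLLM), 30000 (SGLang).
	for _, url := range discover([]int{8000, 30000}) {
		fmt.Println("worker:", url)
	}
}
```

The short timeout keeps startup snappy when no worker is listening on a probed port.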
Written in Go with Bubble Tea. Kubernetes pod auto-discovery and a GPU metrics view are up next.