I've also become interested in performance metrics like time to first token, inter-token latency, and throughput, and wanted a tool focused on just those.
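For anyone unfamiliar with the metrics, here's a rough sketch of how they relate to the token timestamps of a single streamed response. The formulas are the common definitions, not necessarily llmnop's exact implementation, and the struct/field names are made up for illustration:

```rust
use std::time::Duration;

// Hypothetical timing record for one streamed request.
struct RequestTiming {
    start: Duration,            // when the request was sent (relative clock)
    token_times: Vec<Duration>, // arrival time of each streamed token
}

// Time to first token: first arrival minus request start.
fn ttft(t: &RequestTiming) -> Duration {
    t.token_times[0] - t.start
}

// Inter-token latency: mean gap between consecutive tokens after the first.
fn inter_token_latency(t: &RequestTiming) -> Duration {
    let n = t.token_times.len();
    (t.token_times[n - 1] - t.token_times[0]) / (n as u32 - 1)
}

// Throughput: tokens per second over the whole response.
fn throughput(t: &RequestTiming) -> f64 {
    let total = (t.token_times[t.token_times.len() - 1] - t.start).as_secs_f64();
    t.token_times.len() as f64 / total
}
```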
llmnop is written in Rust and was initially modeled after LLMPerf, which was archived last month. LLMPerf predates reasoning models and doesn't handle them correctly.
This release adds support for reasoning models like DeepSeek-R1, Qwen3, and gpt-oss. It now separates reasoning tokens from output tokens so your metrics actually mean something.
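To give an idea of what the separation involves: many OpenAI-compatible servers (DeepSeek-style) stream reasoning text in a separate `reasoning_content` field on each delta, distinct from `content`. This is a minimal sketch of tallying the two streams apart, not llmnop's actual code, and it counts one "token" per delta chunk purely for illustration (a real tool would use the tokenizer or the server-reported usage fields):

```rust
// One streamed delta; either field (or both) may be present.
struct Delta {
    reasoning_content: Option<String>, // DeepSeek-style reasoning stream
    content: Option<String>,           // normal output stream
}

#[derive(Default, Debug, PartialEq)]
struct TokenCounts {
    reasoning: usize,
    output: usize,
}

fn tally(deltas: &[Delta]) -> TokenCounts {
    let mut counts = TokenCounts::default();
    for d in deltas {
        // Counting one token per non-empty chunk is a simplification.
        if d.reasoning_content.as_deref().is_some_and(|s| !s.is_empty()) {
            counts.reasoning += 1;
        }
        if d.content.as_deref().is_some_and(|s| !s.is_empty()) {
            counts.output += 1;
        }
    }
    counts
}
```

Without this split, reasoning output inflates throughput and distorts inter-token latency for the visible answer.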
Previous discussion: https://news.ycombinator.com/item?id=44565477