I've been using LLMPerf for a while to evaluate the performance of our inference servers (vLLM, SGLang, etc.).
It works great, but I was running into memory constraints when testing large numbers of concurrent users on some servers, and I didn't always find the specific Python version requirements convenient.
So, I rewrote the benchmark aspect of this tool in Rust to get an easy single-line install.
I hope it's useful to others as well, and I'd love to hear feedback if you have any suggestions for improvement.
jpreagan•2h ago