Key results:
Mean tokens per second: ~114.5
Mean time to first token: 0.74 s
Under batch load, P99 tokens per second reached ~134.8.
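For anyone curious how per-request stats like these are typically derived, here is a minimal sketch computing mean and P99 tokens-per-second from raw samples. The sample values are made up for illustration; the real numbers come from the linked report, and the actual benchmark harness may aggregate differently.

```python
# Sketch: deriving mean and P99 throughput from per-request samples.
# The tps_samples values below are hypothetical, not from the benchmark.
import statistics

# tokens-per-second measured for each request (illustrative values)
tps_samples = [101.2, 118.7, 96.4, 122.9, 135.1, 110.3, 125.6, 108.8]

mean_tps = statistics.mean(tps_samples)

# P99: the value below which 99% of samples fall. quantiles() with
# n=100 returns 99 cut points; index 98 is the 99th percentile.
# method='inclusive' interpolates within the observed range.
p99_tps = statistics.quantiles(tps_samples, n=100, method='inclusive')[98]

print(f"mean TPS: {mean_tps:.1f}")
print(f"P99 TPS:  {p99_tps:.1f}")
```

Note that with small sample counts, tail percentiles like P99 are dominated by the top one or two observations, which is why benchmark reports usually include the raw distributions as well.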
The full benchmark report, raw statistics, and methodology are available here: https://github.com/geoddllc/large-llm-inference-benchmarks/b...
Support for larger models (400B class) is planned for next week. If you want to try it yourself, you can deploy via the console: https://console.geodd.io/