We ran side-by-side benchmarks of Bifrost against LiteLLM on a single t3.medium instance, using a mock LLM with a fixed 1.5s latency so the measurements isolate pure gateway overhead.
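To make the setup concrete, here is a minimal sketch (not our exact harness) of the kind of mock upstream this implies: an OpenAI-style endpoint that sleeps a fixed 1.5s before responding, so any extra latency seen at the client is attributable to the gateway itself. The port and response payload are placeholders.

```go
package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	// Mock OpenAI-compatible completion endpoint with a fixed 1.5s "model" latency.
	http.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(1500 * time.Millisecond) // simulated model latency
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`{"id":"mock-1","object":"chat.completion","choices":[{"index":0,"message":{"role":"assistant","content":"ok"},"finish_reason":"stop"}]}`))
	})
	log.Fatal(http.ListenAndServe(":9090", nil)) // hypothetical port for the mock upstream
}
```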
The Results:
p99 Latency: 90.72s (LiteLLM) vs 1.68s (Bifrost)
Throughput: 44 req/sec (LiteLLM) vs 424 req/sec (Bifrost)
Memory: ~3x lower usage for Bifrost (written in Go).
It’s a drop-in replacement (OpenAI compatible) designed for teams that need semantic caching, failover, and observability without the added proxy overhead.
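As a rough illustration of what "drop-in" means here: the same OpenAI-style chat completion request, just pointed at the gateway's base URL instead of api.openai.com. The URL, port, and key below are assumptions for the sketch; check the docs for the actual endpoint and auth setup.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Standard OpenAI-style chat completion body.
	body := []byte(`{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}`)

	// Point the request at the gateway's base URL (hypothetical localhost:8080).
	req, err := http.NewRequest("POST", "http://localhost:8080/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer sk-placeholder") // provider key, placeholder value

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```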
We’d love to hear your feedback.