bitkin_dev•1h ago
One thing I’m still unclear on: in real production workloads, which became the bottleneck first: memory bandwidth, KV cache management, or scheduler overhead?
Curious how much of this showed up only under sustained load and not in benchmarks.