It's head-of-line blocking. When requests are serialized, the queue will grow as long as the time to service a request is longer than the interval between arriving requests. That growth is needless if sufficient capacity exists to service requests in parallel.
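To make the queueing argument concrete, here is a minimal simulation sketch. The function name, the fixed arrival interval and service time, and the FIFO dispatch rule are all illustrative assumptions rather than a model of any particular system.

```python
def queue_depth_over_time(arrival_interval, service_time, workers, n_requests):
    """Return, for each arrival, how many requests are waiting or in service."""
    # Each worker is represented by the time at which it next becomes free.
    free_at = [0.0] * workers
    finish_times = []
    depths = []
    for i in range(n_requests):
        now = i * arrival_interval
        # FIFO dispatch: the request goes to whichever worker frees up first.
        w = min(range(workers), key=lambda k: free_at[k])
        start = max(now, free_at[w])
        free_at[w] = start + service_time
        finish_times.append(free_at[w])
        # Count requests (including this one) that have arrived but not finished.
        depths.append(sum(1 for t in finish_times if t > now))
    return depths


# Service time (1.5) exceeds the arrival interval (1.0).
serialized = queue_depth_over_time(1.0, 1.5, workers=1, n_requests=30)
parallel = queue_depth_over_time(1.0, 1.5, workers=2, n_requests=30)
print("serialized depth at each arrival:", serialized)
print("two workers depth at each arrival:", parallel)
```

With one worker the depth keeps climbing for as long as requests keep arriving. With two workers the same load stays bounded, because combined capacity now exceeds the arrival rate.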
If the request payload exceeds a certain size, the response latency goes from one network RTT to double or triple that.
Something is definitely wrong with either TCP or HTTP/2 windowing, since the client doesn't send the full request without first getting an ACK from the server. But neither the gRPC windowing config options nor the Linux tcp_wmem/tcp_rmem settings help. Sending a one-byte request every few hundred milliseconds fixes it by keeping the gRPC channel / TCP connection active. Nagle and slow start are disabled.
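The one-byte keep-warm workaround can be sketched roughly like this. `Heartbeat` and `send_ping` are hypothetical names: `send_ping` stands in for whatever tiny request your client can issue on the existing channel, and the 0.2 s default is just an example of "every few hundred milliseconds".

```python
import threading


class Heartbeat:
    """Call `send_ping` every `interval` seconds until stopped."""

    def __init__(self, send_ping, interval=0.2):
        self._send_ping = send_ping  # hypothetical callable that sends a tiny request
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout and True once stop() has been called.
        while not self._stop.wait(self._interval):
            self._send_ping()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Start it right after creating the channel and stop it on shutdown. This only papers over the problem; it doesn't explain why an idle connection gets slower.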
I'm still not sure whether this is a Linux network configuration issue or a gRPC issue, but something is surely broken if I can't send a ~1MB request and get a response within roughly network RTT + server processing time.
When building systems, at least try something besides gRPC so you have a baseline understanding of performance. gRPC OFTEN introduces performance breakdowns that go unnoticed.
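For that kind of baseline, a rough sketch over plain TCP might look like the following. The helper names (`start_ack_server`, `measure`), the one-byte acknowledgement protocol, and the `idle` parameter are assumptions made for illustration. It times "send the whole payload, wait for a one-byte reply" on a single connection with Nagle turned off.

```python
import socket
import threading
import time


def _recv_exact(conn, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(min(65536, n - len(buf)))
        if not chunk:
            raise ConnectionError("peer closed before sending all bytes")
        buf.extend(chunk)
    return bytes(buf)


def start_ack_server(payload_size, host="127.0.0.1"):
    """Accept one connection; reply with one byte per full payload received."""
    srv = socket.create_server((host, 0))  # port 0: let the OS pick a free port
    port = srv.getsockname()[1]

    def serve():
        conn, _ = srv.accept()
        with conn, srv:
            try:
                while True:
                    _recv_exact(conn, payload_size)
                    conn.sendall(b"\x01")
            except ConnectionError:
                pass  # client disconnected

    threading.Thread(target=serve, daemon=True).start()
    return port


def measure(host, port, payload_size, rounds=5, idle=0.0):
    """Return per-request latencies in seconds over one TCP connection."""
    payload = b"x" * payload_size
    latencies = []
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # Nagle off
        for _ in range(rounds):
            time.sleep(idle)  # optionally let the connection sit idle first
            t0 = time.perf_counter()
            sock.sendall(payload)
            _recv_exact(sock, 1)
            latencies.append(time.perf_counter() - t0)
    return latencies
```

Run the server on the remote host and `measure` from the client, once with `idle=0` and once with an idle gap of a second or so. If raw TCP shows the same jump after idling, the problem is likely below gRPC; if not, look at the gRPC / HTTP/2 layer.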
yuliyp•7h ago