It's head-of-line blocking. When requests are serialized, the queue grows whenever the time to service a request exceeds the interval between arriving requests. That queue growth is pure waste when enough capacity exists to service the requests in parallel.
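A toy model of that condition, with made-up numbers (10 ms arrivals against a 15 ms serialized service time), shows the backlog compounding:

```go
package main

import "fmt"

func main() {
	const (
		arrivalInterval = 10 // ms between requests (made-up)
		serviceTime     = 15 // ms to handle one request, serialized (made-up)
		requests        = 100
	)
	// Lindley-style recurrence for one serialized worker: each request
	// can't start until its predecessor finishes.
	var finish, totalWait int
	for i := 0; i < requests; i++ {
		arrive := i * arrivalInterval
		start := arrive
		if finish > start {
			start = finish // head-of-line blocking: wait for predecessor
		}
		totalWait += start - arrive
		finish = start + serviceTime
	}
	fmt.Printf("avg queueing delay after %d requests: %dms (and growing)\n",
		requests, totalWait/requests)
	// With service time > arrival interval, delay grows without bound;
	// servicing requests in parallel would keep it near zero.
}
```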
If the request payload exceeds a certain size, response latency jumps from one network RTT to double or triple that.
Something is definitely wrong with either TCP or HTTP/2 windowing, since the client doesn't send the full request without first getting an ACK from the server. But neither the gRPC windowing config options nor the Linux tcp_wmem/rmem settings help. Sending a one-byte request every few hundred milliseconds fixes it by keeping the gRPC channel / TCP connection active. Nagle and slow start are disabled.
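For what it's worth, grpc-go has a built-in version of that one-byte-ping workaround: client keepalive, which sends HTTP/2 PING frames so the connection never sits idle. A minimal sketch, assuming grpc-go ≥1.63; the target address and intervals are placeholders, and the server's keepalive.EnforcementPolicy must permit pings this frequent or it will close the connection with GOAWAY:

```go
package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// "backend:50051" is a placeholder address.
	conn, err := grpc.NewClient("backend:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			// Send an HTTP/2 PING after 10s of inactivity so the
			// connection never looks idle to the kernel.
			Time: 10 * time.Second,
			// Drop the connection if a PING ack takes longer than 2s.
			Timeout: 2 * time.Second,
			// Keep pinging even with no active RPCs; that's exactly
			// the case where the congestion window would collapse.
			PermitWithoutStream: true,
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// ... issue RPCs on conn as usual ...
}
```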
Still not sure whether this is a Linux network configuration issue or a gRPC issue, but something is definitely broken if I can't send a ~1 MB request and get a response within roughly network RTT + server processing time.
I wonder if I should just default net.ipv4.tcp_slow_start_after_idle to 0 on my desktop machines for all connections.
Now latency is just RTT + server time + payload size / bandwidth, not several times the RTT: https://github.com/grpc/grpc-go/issues/8436#issuecomment-311...
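The extra RTTs fall out of slow start restart directly. A back-of-the-envelope sketch, assuming Linux's initial congestion window of 10 segments and a 1460-byte MSS: once the window has collapsed after idle, a ~1 MB request needs about 7 doubling rounds, i.e. ~7 RTTs, before it's fully in flight.

```go
package main

import "fmt"

func main() {
	const (
		mss      = 1460    // typical Ethernet MSS in bytes
		initCwnd = 10      // Linux initial congestion window (segments)
		payload  = 1 << 20 // ~1 MB request body
	)
	segments := (payload + mss - 1) / mss

	// Classic slow start: cwnd doubles each RTT until the payload fits.
	cwnd, sent, rounds := initCwnd, 0, 0
	for sent < segments {
		sent += cwnd
		cwnd *= 2
		rounds++
	}
	fmt.Printf("%d segments, ~%d RTTs of slow start to get the request out\n",
		segments, rounds)
	// With tcp_slow_start_after_idle=0 (or a keepalive keeping the
	// connection warm), the window stays open and the send costs
	// roughly one RTT plus payload/bandwidth.
}
```

That ~7x inflation lines up with the "double or triple" (and worse) latencies described upthread.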
I was not aware of this setting. It's pretty unfortunate that it's a system-level setting that can't be overridden at the application layer, and that the idle timeout can't be changed either. I'll have to figure out how to safely make this change on the k8s service this is affecting...
An application won't know anything about the specifics of the network its host system is attached to. A system administrator might. In that sense, at least, it's reasonable that this is a system tunable rather than a per-connection setsockopt().
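To make the contrast concrete: per-connection socket options do exist for some knobs, reachable in Go through the Dialer's Control hook, as the sketch below shows with TCP_NODELAY (which Go actually enables by default on TCP connections). There is simply no per-socket equivalent for slow-start-after-idle; the only lever is the host-wide `sysctl -w net.ipv4.tcp_slow_start_after_idle=0`, which on Kubernetes typically requires node-level configuration or an allowed unsafe sysctl, hence the caution upthread.

```go
package main

import (
	"context"
	"log"
	"net"
	"syscall"
)

// dialNoDelay shows the shape of a per-connection setsockopt() override,
// using TCP_NODELAY as the example of a knob that does exist per-socket.
// Nothing analogous exists for slow-start-after-idle.
func dialNoDelay(ctx context.Context, addr string) (net.Conn, error) {
	d := net.Dialer{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			err := c.Control(func(fd uintptr) {
				sockErr = syscall.SetsockoptInt(int(fd),
					syscall.IPPROTO_TCP, syscall.TCP_NODELAY, 1)
			})
			if err != nil {
				return err
			}
			return sockErr
		},
	}
	return d.DialContext(ctx, "tcp", addr)
}

func main() {
	// Placeholder address, just to exercise the dialer.
	conn, err := dialNoDelay(context.Background(), "example.com:80")
	if err != nil {
		log.Fatal(err)
	}
	conn.Close()
}
```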
At least try something other than gRPC when building systems, so you have a baseline understanding of performance. gRPC OFTEN introduces performance breakdowns that go unnoticed.