The Surprising gRPC Client Bottleneck in Low-Latency Networks

https://blog.ydb.tech/the-surprising-grpc-client-bottleneck-in-low-latency-networks-and-how-to-get-around-it-69d6977a1d02

65•eivanov89•8h ago

Comments

yuliyp•7h ago

If you have a single TCP connection, all the data flows through that connection, which ultimately serializes at least some of the processing. Given that the workers are just responding with OK, no matter how many CPU cores you give them, you're still bound by the throughput of the IO thread (well, by the minimum of the client and server IO threads). If you want more than one IO thread to share the load, you need more than one TCP connection.
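
A rough sketch of the "more than one TCP connection" idea: a small client-side pool that hands out channels round-robin. `ChannelPool` and `make_channel` are hypothetical names, not part of gRPC, and the sketch assumes every object the factory returns is backed by its own TCP connection (with gRPC that may require channels created with differing arguments, since channels with identical arguments can end up sharing a connection).

```python
import itertools
import threading
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ChannelPool(Generic[T]):
    """Round-robin pool over `size` independently created channels.

    `make_channel(i)` is a caller-supplied factory (hypothetical); it is
    assumed to return something backed by its own TCP connection.
    """

    def __init__(self, make_channel: Callable[[int], T], size: int) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        self._channels: List[T] = [make_channel(i) for i in range(size)]
        self._counter = itertools.count()
        self._lock = threading.Lock()

    def pick(self) -> T:
        # The lock keeps the rotation even when many threads pick at once.
        with self._lock:
            index = next(self._counter) % len(self._channels)
        return self._channels[index]


# Stand-in factory for illustration; real code would create a channel here.
pool = ChannelPool(lambda i: f"conn-{i}", size=3)
picked = [pool.pick() for _ in range(5)]
print(picked)
```

Each caller asks the pool for a channel per request, so in-flight requests are spread over several connections and, by extension, over several IO threads.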

xtoilette•5h ago

classic case of head of line blocking!

yuliyp•4h ago

I don't think this is head-of-line blocking. That is, it's not like a single slow request causes starvation of other requests. The IO thread for the connection is grabbing and dispatching data to workers as fast as it can. All the requests are uniform, so it's not like one request would be bigger/harder to handle for that thread.

otterley•4h ago

> First, we checked the number of TCP connections using lsof -i TCP:2137 and found that only a single TCP connection was used regardless of in-flight count.

It's head-of-line blocking. When requests are serialized, the queue will grow as long as the time to service a request is longer than the interval between arriving requests. Queue growth is bad if sufficient capacity exists to service requests in parallel.
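
The queue-growth claim can be checked with a toy model. This is a minimal sketch, assuming perfectly regular arrivals and a fixed service time (both idealizations, and the time units are arbitrary); `queue_waits` is a hypothetical helper, not anything from the article.

```python
import heapq
from typing import List


def queue_waits(requests: int, interarrival: int, service: int, servers: int) -> List[int]:
    """Return how long each request waits before service starts (FIFO).

    Request i arrives at time i * interarrival; each of the `servers`
    handles one request at a time and takes `service` time units.
    """
    free_at = [0] * servers  # min-heap of times at which each server is free
    heapq.heapify(free_at)
    waits = []
    for i in range(requests):
        arrival = i * interarrival
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)
        waits.append(start - arrival)
        heapq.heappush(free_at, start + service)
    return waits


# Service time (3) longer than the arrival interval (2):
serial = queue_waits(requests=5, interarrival=2, service=3, servers=1)
parallel = queue_waits(requests=5, interarrival=2, service=3, servers=2)
print(serial)    # waits keep growing with one server
print(parallel)  # waits stay at zero with two servers
```

With one server every request waits one unit longer than the previous one; with two servers the same load never queues, which is the "sufficient capacity exists to service requests in parallel" case.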

lacop•4h ago

Somewhat related, I'm running into a gRPC latency issue in https://github.com/grpc/grpc-go/issues/8436

If the request payload exceeds a certain size, the response latency goes from network RTT to double or triple that.

Definitely something wrong with either TCP or HTTP/2 windowing, as it doesn't send the full request without getting an ACK from the server first. But none of the gRPC windowing config options nor the Linux tcp_wmem/tcp_rmem settings help. Sending a one-byte request every few hundred milliseconds fixes it by keeping the gRPC channel / TCP connection active. Nagle / slow start is disabled.

littlecranky67•4h ago

Sounds like the classic TCP congestion window ramp-up delay: your payload probably exceeds the initial congestion window (initcwnd, typically 10 segments).
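
To get a feel for the numbers, here is a back-of-the-envelope sketch of how many round trips a cold connection needs to push a payload. It assumes an idealized slow start: an initial window of 10 segments, a 1460-byte MSS, the window doubling every round trip, no loss, and no receive-window or HTTP/2 flow-control limit. `round_trips_to_send` is a hypothetical helper and the defaults are assumptions, not measured values.

```python
import math


def round_trips_to_send(payload_bytes: int, mss: int = 1460, initcwnd: int = 10) -> int:
    """Count round trips needed to send `payload_bytes` under ideal slow start."""
    segments = math.ceil(payload_bytes / mss)
    cwnd = initcwnd  # congestion window, in segments
    sent = 0
    rounds = 0
    while sent < segments:
        sent += cwnd   # one full window goes out per round trip
        cwnd *= 2      # idealized doubling after each fully ACKed window
        rounds += 1
    return rounds


print(round_trips_to_send(14_600))     # fits in the initial window
print(round_trips_to_send(14_601))     # one byte more needs a second round trip
print(round_trips_to_send(1_000_000))  # roughly the ~1MB case
```

Under these assumptions anything up to 14,600 bytes goes out in a single flight, while a payload around 1 MB needs several round trips before the window has grown enough.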

lacop•3h ago

Doesn't initcwnd only apply as the initial value? I don't care that the first request on the gRPC channel is slow, but subsequent requests on the same channel reuse the TCP connection and should have larger window size. This works as long as the channel is actively being used, but after short inactivity (few hundred ms, unsure exactly) something appears to revert back.
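
One Linux setting that matches this symptom is `net.ipv4.tcp_slow_start_after_idle`: when enabled, the kernel may shrink the congestion window of a connection that has been idle. Whether it explains the linked issue is an assumption, not something established in this thread. The sketch below only reads the current value; `read_slow_start_after_idle` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

# Standard procfs location of the sysctl on Linux.
SYSCTL_PATH = "/proc/sys/net/ipv4/tcp_slow_start_after_idle"


def read_slow_start_after_idle(path: str = SYSCTL_PATH) -> Optional[int]:
    """Return the sysctl value (0 or 1), or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Not Linux, no procfs, or unexpected contents.
        return None


value = read_slow_start_after_idle()
if value is None:
    print("tcp_slow_start_after_idle is not readable on this system")
else:
    print(f"tcp_slow_start_after_idle = {value}")
```

If the value is 1, comparing latency with the setting temporarily disabled on a test machine is one way to confirm or rule out this explanation.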

littlecranky67•3h ago

Yes, in case of hot tcp connections congestion control should not be the issue.

lacop•3h ago

Yeah, that was my understanding too, hence I filed the bug (actually a duplicate of an older bug that was closed because the poster didn't provide a reproduction).

Still not sure if this is a Linux network configuration issue or a gRPC issue, but something is for sure broken if I can't send a ~1MB request and get a response within roughly network RTT + server processing time.

eivanov89•3h ago

That's indeed interesting, thank you for sharing.

ltbarcly3•45m ago

gRPC is a very badly implemented system. I have gotten 25%-30%+ improvements in throughput just by monkeypatching Google Cloud client libraries to force JSON API endpoint usage.

At least try something other than gRPC when building systems so you have a baseline understanding of performance. gRPC OFTEN introduces performance breakdowns that go unnoticed.

stock_toaster•21m ago

Have you done any comparisons with connect-rpc?
