Rdma, dpdk, io_uring it’s really kind of up to the user to do the memory isolation
In io_urings case tho, you can’t do much because the rings are in the kernel.
I’m hopeful though that with Llm things will get better.
But it’s just hard problem to solve . Very difficult to do in the kernel itself, and folks don’t really even understand tuning for it.
I took a (very brief) look at the github repo [1], it doesn't look like you're doing anything with cpu pinning.
You can probably eke (thanks) out a bit more performance if you cpu pin your threads and cpu pin your listen sockets (sockopt SO_INCOMING_CPU).
If you also cpu align your outgoing sockets, you should get a significant boost, but afaik, there's no great api for that. Linux does have an api for compatible NICs (traffic steering/flow steering) which can work, but if you know what hash your NIC uses (it's probably toeplitz) and you manage source port selection to your backend, you can pick ports that will hash properly.
The goal is for your proxy to be able to handle packets without any cross cpu communication.
https://access.redhat.com/solutions/4723221
Go should reconsider support. They should have a 'go' at it.
mrlonglong•2h ago
MathMonkeyMan•2h ago
vlovich123•1h ago
saghm•1h ago