The payload is encoded by myra-codec through an FFM MemorySegment directly into a pre-registered io_uring SQE buffer on the server. Similarly, on the client side the CQE delivers the encoded payload directly into a client-provided MemorySegment. This saves a few syscalls, and the whole path is zero-copy.
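A minimal sketch of the encode-into-pre-registered-buffer pattern described above, using the standard Java FFM API (Java 22+). The `encodeInto` helper and the length-prefixed wire format are illustrative assumptions, not the actual myra-codec API; the point is that the codec writes straight into an off-heap segment that the ring already knows about, with no intermediate `byte[]` copy of the framed message.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

public class ZeroCopyEncodeSketch {

    // Hypothetical codec step: write a length-prefixed UTF-8 payload
    // directly into the target segment and return the bytes written.
    static long encodeInto(MemorySegment target, String payload) {
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        target.set(ValueLayout.JAVA_INT, 0, bytes.length); // 4-byte length prefix
        MemorySegment.copy(bytes, 0, target, ValueLayout.JAVA_BYTE, 4, bytes.length);
        return 4L + bytes.length;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // Stand-in for a buffer pre-registered with io_uring;
            // in the real transport this would be native ring memory.
            MemorySegment registered = arena.allocate(4096);
            long written = encodeInto(registered, "hello");
            System.out.println(written + " bytes written off-heap");
        }
    }
}
```

Encoding into the registered buffer (rather than into a heap array that is then copied out) is what removes the extra copy on the submission path.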
Source: https://github.com/mvp-express/myra-transport/blob/main/benc...
P.S.: I had posted this as a reply to jeffrey but am not able to see it, hence reposting as a direct reply to the main post for visibility as well.
Disclaimer: I am the author of https://mvp.express. I would love feedback and critical suggestions/advice.
Thanks -RR
Unnecessary comments like:
clientChannel.configureBlocking(false); // Non-blocking client
can be found throughout the source, and the project's landing page is a good example of typical SOTA models' output when asked for a frontend landing page. The only serializer that claims to be both schema-less and zero-copy is Apache Fory, which is missing from the benchmark.
jeffreygoesto•5d ago
rohanray•4d ago
Source: https://github.com/mvp-express/myra-transport/blob/main/benc...
jstimpfle•2h ago
What exactly does that roundtrip latency number measure (especially your 1us)? Does zero copy imply mapping pages between processes? Is there an async kernel component involved (like I would infer from "io_uring") or just two user space processes mapping pages?
znpy•2h ago
I did read the original Linux zero-copy papers from Google, for example, and at the time (when using TCP) the juice was worth the squeeze only when the payload was larger than 10 kilobytes (or 20? I don't remember right now and I'm on mobile).
Also, a common technique is batching, so you amortise the per-call cost (this used to be the point of sendmmsg/recvmmsg) over, say, 10 payloads.
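The amortisation argument above can be put in back-of-envelope numbers. The costs here (1 µs per call, 0.2 µs per payload) are illustrative assumptions, not measurements; the shape of the result is what matters: the fixed per-call cost shrinks by the batch factor.

```java
public class BatchAmortization {

    // Per-payload cost when B payloads share one call:
    // the fixed per-call overhead is divided by the batch size,
    // while the per-payload work stays constant.
    static double perPayloadCostUs(double perCallUs, double perPayloadUs, int batch) {
        return perCallUs / batch + perPayloadUs;
    }

    public static void main(String[] args) {
        // Assumed costs: 1.0 us per syscall, 0.2 us to serialise one payload.
        System.out.println(perPayloadCostUs(1.0, 0.2, 1));  // no batching: ~1.2 us each
        System.out.println(perPayloadCostUs(1.0, 0.2, 10)); // batch of 10: ~0.3 us each
    }
}
```

This is also why a single round-trip latency number says little on its own: it depends heavily on whether, and how deeply, the benchmark batches.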
So yeah that number alone can mean a lot or it can mean very little.
In my experience, people doing low-latency stuff have already built their own thing around MSG_ZEROCOPY, io_uring and the like :)
blibble•1h ago