The payload is encoded by myra-codec through an FFM MemorySegment directly into a pre-registered io_uring SQE buffer on the server. Similarly, on the client side the CQE delivers the encoded payload directly into a client-provided MemorySegment. This saves a few syscalls, and the whole path is zero-copy.
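A minimal sketch of the encode-into-pre-registered-buffer pattern described above, using the standard Java FFM API (Java 22+). The `encodeInto` helper and the length-prefixed wire format are illustrative assumptions, not the actual myra-codec API; the point is that the codec writes straight into an off-heap segment that the ring already knows about, with no intermediate `byte[]` copy of the framed message.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

public class ZeroCopyEncodeSketch {

    // Hypothetical codec step: write a length-prefixed UTF-8 payload
    // directly into the target segment and return the bytes written.
    static long encodeInto(MemorySegment target, String payload) {
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        target.set(ValueLayout.JAVA_INT, 0, bytes.length); // 4-byte length prefix
        MemorySegment.copy(bytes, 0, target, ValueLayout.JAVA_BYTE, 4, bytes.length);
        return 4L + bytes.length;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // Stand-in for a buffer pre-registered with io_uring;
            // in the real transport this would be native ring memory.
            MemorySegment registered = arena.allocate(4096);
            long written = encodeInto(registered, "hello");
            System.out.println(written + " bytes written off-heap");
        }
    }
}
```

Encoding into the registered buffer (rather than into a heap array that is then copied out) is what removes the extra copy on the submission path.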
Source: https://github.com/mvp-express/myra-transport/blob/main/benc...
P.S.: I had posted this as a reply to jeffrey but am not able to see it, hence reposting as a direct reply to the main post for visibility as well.
Disclaimer: I am the author of https://mvp.express. I would love feedback and critical suggestions/advice.
Thanks -RR
Unnecessary comments like:
clientChannel.configureBlocking(false); // Non-blocking client
can be found throughout the source, and the project's landing page is a good example of typical SOTA models' output when asked for a frontend landing page. The only serializer that claims to be both schema-less and zero-copy is Apache Fory, which is missing from the benchmark.
jeffreygoesto•5d ago
rohanray•4d ago
Source: https://github.com/mvp-express/myra-transport/blob/main/benc...
jstimpfle•2h ago
What exactly does that roundtrip latency number measure (especially your 1us)? Does zero copy imply mapping pages between processes? Is there an async kernel component involved (like I would infer from "io_uring") or just two user space processes mapping pages?
znpy•2h ago
I did read the original Linux zero-copy papers from Google, for example, and at the time (when using TCP) the juice was worth the squeeze only when the payload was larger than 10 kilobytes (or 20? I don't remember right now and I'm on mobile).
Also, a common technique is batching, so you amortise the per-call cost (this used to be the point of sendmmsg/recvmmsg) over, say, 10 payloads.
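The amortisation argument above can be put in back-of-envelope numbers. The costs here (1 µs per call, 0.2 µs per payload) are illustrative assumptions, not measurements; the shape of the result is what matters: the fixed per-call cost shrinks by the batch factor.

```java
public class BatchAmortization {

    // Per-payload cost when B payloads share one call:
    // the fixed per-call overhead is divided by the batch size,
    // while the per-payload work stays constant.
    static double perPayloadCostUs(double perCallUs, double perPayloadUs, int batch) {
        return perCallUs / batch + perPayloadUs;
    }

    public static void main(String[] args) {
        // Assumed costs: 1.0 us per syscall, 0.2 us to serialise one payload.
        System.out.println(perPayloadCostUs(1.0, 0.2, 1));  // no batching: ~1.2 us each
        System.out.println(perPayloadCostUs(1.0, 0.2, 10)); // batch of 10: ~0.3 us each
    }
}
```

This is also why a single round-trip latency number says little on its own: it depends heavily on whether, and how deeply, the benchmark batches.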
So yeah that number alone can mean a lot or it can mean very little.
In my experience, people doing low-latency stuff have already built their own thing around MSG_ZEROCOPY, io_uring and the like :)
blibble•1h ago