Bypassing the kernel for 56ns cross-language IPC

https://github.com/riyaneel/Tachyon/tree/main/docs/adr

11•riyaneel•2d ago

Comments

riyaneel•2d ago

I am the author of this library. The goal was to reach RAM-speed communication between independent processes (C++, Rust, Python, Go, Java, Node.js) without any serialization overhead or kernel involvement on the hot path.

I managed to hit a p50 round-trip time of 56.5 ns (for 32-byte payloads) and a throughput of ~13.2M round trips/sec on a consumer laptop CPU (Intel Core i7-12650H).

Here are the primary architectural choices that make this possible:

- Strict SPSC & No CAS: I went with a strict Single-Producer Single-Consumer topology. There are no compare-and-swap loops on the hot path. acquire_tx and acquire_rx are essentially just a load, a mask, and a branch using memory_order_acquire / release.

- Hardware Sympathy: Every control structure (message headers, atomic indices) is padded to 128-byte boundaries. False sharing between the producer and consumer cache lines is structurally impossible.

- Zero-Copy: The hot path is entirely in a memfd shared memory segment after an initial Unix Domain Socket handshake (SCM_RIGHTS).

- Hybrid Wait Strategy: The consumer spins for a bounded threshold using cpu_relax(), then falls back to a sleep via SYS_futex (Linux) or __ulock_wait (macOS) to prevent CPU starvation.
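
To make the SPSC, padding, and wait-strategy points above concrete, here is a minimal sketch of a CAS-free ring with 128-byte-aligned indices and a bounded spin. It is an illustration under stated assumptions, not Tachyon's code: every name in it (SpscRing, try_push, try_pop, pop_blocking, kPad) is hypothetical, the ring lives in ordinary process memory rather than a memfd segment, the spin loop has no cpu_relax() hint, and std::this_thread::yield() stands in for the futex / __ulock_wait sleep.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <thread>

// Hypothetical sketch; none of these names come from the library.
constexpr std::size_t kPad = 128;  // padding size named in the post

template <std::size_t Capacity>
class SpscRing {
    static_assert(Capacity != 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two so indices can be masked");

public:
    // Producer thread only. Returns false when the ring is full.
    bool try_push(std::uint64_t value) {
        // Only the producer writes head_, so a relaxed load is enough here.
        const std::size_t head = head_.load(std::memory_order_relaxed);
        // Acquire pairs with the consumer's release store to tail_.
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;  // full
        slots_[head & (Capacity - 1)] = value;      // mask instead of modulo
        // Release publishes the slot write before the new head is visible.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns std::nullopt when the ring is empty.
    std::optional<std::uint64_t> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // empty
        const std::uint64_t value = slots_[tail & (Capacity - 1)];
        // Release hands the slot back to the producer.
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }

    // Spin for a bounded number of attempts, then yield on every retry.
    // A real implementation would sleep on a futex instead of yielding.
    std::uint64_t pop_blocking(int spin_limit = 1024) {
        for (int spins = 0;; ++spins) {
            if (auto value = try_pop()) return *value;
            if (spins >= spin_limit) std::this_thread::yield();
        }
    }

private:
    // Each index starts on its own 128-byte boundary, so the producer's
    // and consumer's counters never share a cache line.
    alignas(kPad) std::atomic<std::size_t> head_{0};
    alignas(kPad) std::atomic<std::size_t> tail_{0};
    alignas(kPad) std::uint64_t slots_[Capacity]{};
};
```

Because each index has exactly one writer, a plain load plus a release store is enough and no compare-and-swap is needed; that property disappears as soon as a second producer or consumer is added.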

The core is C++23, and it exposes a C ABI to bind the other languages.

I am sharing this here for anyone building high-throughput polyglot architectures and dealing with cross-language ingestion bottlenecks.

zekrioca•1h ago

Why report p50 and not p95?

JSR_FDED•1h ago

What would need to change when the hardware changes?

BobbyTables2•31m ago

Would be interesting to see performance comparisons between this and the alternatives considered like eventfd.

Sure, the “hot path” is probably very fast for all, but what about the slow path?

Fire-Dragon-DoL•27m ago

Wow, congrats!
