
Show HN: Threadprocs – executables sharing one address space (0-copy pointers)

https://github.com/jer-irl/threadprocs
31•jer-irl•1h ago
This project launches multiple independent programs into a single shared virtual address space, while still behaving like separate processes (independent binaries, globals, and lifetimes). When threadprocs share their address space, pointers are valid across them with no code changes for well-behaved Linux binaries.

Unlike threads, each threadproc is a standalone and semi-isolated process. Unlike dlopen-based plugin systems, threadprocs run traditional executables with a `main()` function. Unlike POSIX processes, pointers remain valid across threadprocs because they share the same address space.
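For contrast, the dlopen-based model the post distinguishes itself from can be sketched with Python's `ctypes`, which is a thin wrapper over dlopen; this sketch assumes a Linux system where the C math library is available as `libm.so.6` (an assumption, not something from the post):

```python
import ctypes

# dlopen-style sharing: map an existing shared object into *this*
# process's address space and call one of its exported symbols.
# Contrast with threadprocs, which load whole executables with main().
libm = ctypes.CDLL("libm.so.6")   # ctypes.CDLL wraps dlopen(3)
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]
root = libm.sqrt(9.0)             # call directly through the shared mapping
```

The plugin gets the host's address space for free, but it must be built as a shared object; threadprocs instead aim to give ordinary executables the same property.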

This means that idiomatic pointer-based data structures like `std::string` or `std::unordered_map` can be passed between threadprocs and accessed directly (with the usual data race considerations).

This accomplishes a programming model somewhere between pthreads and multi-process shared memory IPC.

The implementation relies on directing ASLR and virtual address layout at load time and implementing a user-space analogue of `exec()`, as well as careful manipulation of threadproc file descriptors, signals, etc. It is implemented entirely in unprivileged user space code: <https://github.com/jer-irl/threadprocs/blob/main/docs/02-imp...>.

There is a simple demo demonstrating “cross-threadproc” memory dereferencing at <https://github.com/jer-irl/threadprocs/tree/main?tab=readme-...>, including a high-level diagram.

This is relevant to systems of multiple processes with shared memory (often ring buffers or flat tables). These designs often require serialization or copying, and tend away from idiomatic C++ or Rust data structures. Pointer-based data structures cannot be passed directly.

There are significant limitations and edge cases, and it’s not clear this is a practical model, but the project explores a way to relax traditional process memory boundaries while still structuring a system as independently launched components.

Comments

tombert•1h ago
Interesting.

I gotta admit that my smell test tells me that this is a step backwards; at least naively (I haven't looked through the code thoroughly yet), this just kind of feels like we're going back to pre-protected-memory operating systems like AmigaOS; there are reasons that we have the process boundary and the overhead associated with it.

If there are zero copies being shared across threadprocs, what's to stop Threadproc B from screwing up Threadproc A's memory? And if the answer to this is "nothing", then what does this buy over a vanilla pthread?

I'm not trying to come off as negative, I'm actually curious about the answer to this.

furyofantares•1h ago
I think it is possible for B to screw up A when you share a pointer but otherwise unlikely. B doing stuff with B's memory is unlikely to screw up A.
tombert•56m ago
Sure but that's true of threads as well. The advantage of having these threadprocs is that there can be zero-copy sharing, which isn't necessarily bad but if they aren't copied then B could screw up A's stuff.

If you're ok with threads keeping their own memory and not sharing then pthreads already do that competently without any additional library. The problem with threads is that there's a shared address space and so thread B can screw up thread A's memory and juggling concurrency is hard and you need mutexes. Processes give isolation but at the cost of some overhead and IPC generally requiring copying.
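The point above — that a shared address space forces explicit synchronization — can be sketched with Python's `threading` module standing in for pthreads (a minimal illustration, not code from the project):

```python
import threading

counter = 0                      # state shared by all threads
lock = threading.Lock()          # the mutex the comment mentions

def bump(n):
    global counter
    for _ in range(n):
        with lock:               # serialize access to the shared counter
            counter += 1

threads = [threading.Thread(target=bump, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock held around each increment, the total is deterministic;
# without it, updates from different threads could be lost.
```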

I'm just not sure what this actually provides over vanilla pthreads. If I'm in charge of ensuring that the threadprocs don't screw with each other then I'm not sure this buys me anything.

jer-irl•1h ago
Not negative at all, thanks for commenting. You're right that the answer is "nothing," and that this is a major trade-off inherent in the model. From a safety perspective, you'd need to be extremely confident that all threadprocs are well-behaved, though a memory-safe language would help some. The benefit is that you get process-like composition as separate binaries launched at separate times, with thread-like single-address space IPC.

After building this, I don't think this is necessarily a desirable trade-off, and decades of OS development certainly suggest process-isolated memory is desirable. I see this more as an experiment to see how bending those boundaries works in modern environments, rather than a practical way forward.

tombert•48m ago
It's certainly an interesting idea and I'm not wholly opposed to it, though I certainly wouldn't use it as a default process scheduler for an OS (not that you were suggesting that). I would be very concerned about security stuff. If there's no process boundary saying that threadproc A can't grab threadproc B's memory, there could pretty easily be unintended leakages of potentially sensitive data.

Still, I actually do think there could be an advantage to this if you know you can trust the executables, and if the executables don't share any memory; if you know that you're not sharing memory, and you're only grabbing memory with malloc and the like, then there is an argument to be made that there's no need for the extra process overhead.

lstodd•57m ago
Judging by the description, it's exactly like AmigaOS, Windows 3.x or early Apple. Or MS-DOS things like DESQview.

I fail to see the point - if you control the code and need performance so much that an occasional copy bites, you may as well just link it all into a single address space without those hoops. And if it's been modified to rely on passing pointers around, it won't function as separate processes anyway.

And if you don't, good luck chasing memory corruption issues.

Besides, what's wrong with shared memory?

whalesalad•1h ago
I feel like this could unlock some huge performance gains with Python. If you want to truly "defeat the GIL" you must use processes instead of threads. This could be a middle ground.
short_sells_poo•58m ago
How would this really help python though? This doesn't solve the difficult problem, which is that python objects don't support parallel access by multiple threads/processes, no? Concurrent threads, yes, but only one thread can be operating on a python object at a time (I'm simplifying here for brevity).

There are already means of passing around bulk data with zero copy characteristics in python, but there's a lot of bureaucracy around it. A true solution must work with the GIL (or remove it altogether), no?
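One of those existing zero-copy mechanisms (and its bureaucracy) can be sketched with the stdlib's `multiprocessing.shared_memory`; a minimal single-process example:

```python
from multiprocessing import shared_memory

# Create a 1 KiB shared segment; other processes could attach by name.
shm = shared_memory.SharedMemory(create=True, size=1024)
shm.buf[:5] = b"hello"        # write through the zero-copy buffer view
data = bytes(shm.buf[:5])     # copy out before tearing the mapping down
# The bureaucracy: every attachment must close(), and exactly one
# owner must unlink() the segment, or it outlives the process.
shm.close()
shm.unlink()
```

Note that only raw buffers travel this way; ordinary Python objects still need serialization, which is the GIL-adjacent problem the comment points at.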

jer-irl•53m ago
I'm not familiar with CPython GC internals, but I think there are mechanisms for Python objects to be safely handed to C/C++ libraries and used there in parallel? Perhaps one could implement a handoff mechanism built on those same primitives? Interesting idea!
hun3•43m ago
This is exactly what subinterpreters are for! Basically isolated copies of Python in the same process.

https://docs.python.org/3/library/concurrent.interpreters.ht...

If you want a higher-level interface, there is InterpreterPoolExecutor:

https://docs.python.org/3/library/concurrent.futures.html#co...

L8D•1h ago
It is lovely to see experimentation like this going on. I think this has a lot of potential with something like Haskell's green threading model, basically taking it up a notch and doing threading across pthreads instead of being restricted to the VM threads. Surely, if this can be well fleshed-out, it could be implemented at the compiler-level or library-level so existing multi-threaded Haskell software can switch over to something like this to squeeze out even more performance. I'm not an expert here, though, so ¯\_(ツ)_/¯ take my words with a grain of salt.
fwsgonzo•1h ago
I actually just published a paper about something like this, which I implemented in both libriscv and TinyKVM called "Inter-Process Remote Execution (IPRE): Low-latency IPC/RPC using merged address spaces".

Here is the abstract: This paper introduces Inter-Process Remote Execution (IPRE), whose primary function is enabling gated persistence for per-request isolation architectures with microsecond-latency access to persistent services. IPRE eliminates scheduler dependency for descheduled processes by allowing a virtual machine to directly and safely call and execute functions in a remote virtual machine's address space. Unlike prior approaches requiring hardware modifications (dIPC) or kernel changes (XPC), IPRE works with standard virtualization primitives, making it immediately deployable on commodity systems. We present two implementations: libriscv (12-14ns overhead, emulated execution) and TinyKVM (2-4us overhead, native execution). Both eliminate data serialization through address-space merging. Under realistic scheduler contention from schbench workloads (50-100% CPU utilization), IPRE maintains stable tail latency (p99 < 5us), while a state-of-the-art lock-free IPC framework shows 1,463× p99 degradation (4.1us to 6ms) when all CPU cores are saturated. IPRE thus enables architectural patterns (per-request isolation, fine-grained microservices) that would otherwise incur millisecond-scale tail latency in busy multi-tenant systems using traditional IPC.

Bottom line: If you're doing synchronous calls to a remote party, IPRE wouldn't require any scheduler mediation. The same applies to your repo. Passing allocator-less structures to the remote is probably a landmine waiting to happen. If you structure both parties to use custom allocators, at least for the remote calls, you can track and even steal allocations (using a shared memory area). With IPRE there is extra risk of stale pointers, because the remote part is removed from the caller's memory after it completes. The paper will explain all the details, but for example, since we control the VMM, we can close the remote session if anything bad happens. (This paper is not out yet, but it should be very soon.)

The best part about this kind of architecture, which you immediately mention, is the ability to completely avoid serialization. Passing a complex struct by reference and being able to use the data as-is is a big benefit. It breaks down when you try to do this with something like Deno, unfortunately. But you could do Deno <-> C++, for example.

For libriscv the implementation is simpler: just loan remote-looking pages temporarily so that read/write/execute works, and then let exception handling deal with abnormal disconnection. With libriscv it's also possible for the host to take over the guest's global heap allocator, which makes it possible to free something that was remotely allocated. You can divide the address space into the number of possible callers, plus one or more remotes; then, if you give the remote a std::string larger than SSO, the address will reveal the source, and the source tracks its own allocations, so we know if something didn't go right. Note that this is only a personal interest for me: even though (for example) libriscv is used in large codebases, the remote RPC feature is not used at all and hasn't been attempted. It's a Cool Idea that kinda works out, but not ready for anything high-stakes.

jeffbee•58m ago
Looks like you forgot the URL. Interested.
fwsgonzo•49m ago
Best I can do is reply to an e-mail if someone asks for the paper, since it's not out yet. The e-mail ends with hotmail.
kjok•17m ago
> I actually just published a paper...

This gives me an impression that the paper has already been published and is available publicly for us to read.

fwsgonzo•9m ago
Sorry about that, the conference was on Feb 2, and it's supposed to be out any day/week now. I don't have a date.
philipwhiuk•33m ago
This is basically 'De-ASLR' is it not?
jer-irl•29m ago
Could you clarify what you mean by that? This does heavily rely on loaded code being position-independent, because the memory used will go into whatever regions `mmap(..., ~MAP_FIXED)` returns.
PaulDavisThe1st•29m ago
Are you familiar with the Opal OS from UW CS&E during the 90s ? All tasks in a single system-wide address space, with h/w access control for memory regions.

> The Opal project is exploring a new operating system structure, tuned to the needs of complex applications, such as CAD/CAM, where a number of cooperating programs manipulate a large shared persistent database of objects. In Opal, all code and data exists within a single, huge, shared address space. The single address space enhances sharing and cooperation, because addresses have a unique (for all time) interpretation. Thus, pointer-based data structures can be directly communicated and shared between programs at any time, and can be stored directly on secondary storage without the need for translation. This structure is simplified by the availability of a large address space, such as those provided by the DEC Alpha, MIPS, HP/PA-RISC, IBM RS6000, and future Intel processors.

> Protection in Opal is independent of the single address space; each Opal thread executes within a protection domain that defines which virtual pages it has the right to access. The rights to access a page can be easily transmitted from one process to another. The result is a much more flexible protection structure, permitting different (and dynamically changing) protection options depending on the trust relationship between cooperating parties. We believe that this organization can improve both the structure and performance of complex, cooperating applications.

> An Opal prototype has been built for the DEC Alpha platform on top of the Mach operating system.

https://homes.cs.washington.edu/~levy/opal/opal.html

jer-irl•18m ago
I wasn't but I'll have to read more! Some good relevant discussion here too https://news.ycombinator.com/item?id=7554921 . I wanted to keep this project in user-space, but there's a lot of interesting ground on the OS side of things too. Something like Theseus <https://github.com/theseus-os/Theseus> is also interesting, providing similar protections (in theory) by enforced compiler invariants rather than hardware features.
lifis•26m ago
"Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should"

It's very cool, but would only be useful in some marginal cases, specifically if you don't want to modify the programs significantly and the reliability reduction is worth either the limited performance upside of avoiding mm switches or the ability to do somewhat easier shared memory.

Generally this problem would be better solved in one of two ways:

1. Recompile the modules as shared libraries (or statically link them together) and run them with a custom host program. This has less memory waste and faster startup.

2. Have processes share memory via explicit shared memory mechanisms. This is more reliable.

jer-irl•12m ago
Thanks! The idea of launching additional components nearly "natively" from the shell was compelling to me early on, but I agree that shared libraries with a more opinionated "host program" is probably a more practical approach.

Explicit shared memory regions are definitely the standard for this sort of problem if you want isolated address spaces. One area I want to explore further is allocators that are aware of explicit shared memory regions, and perhaps ensuring that the regions get mmap'd to the same virtual address in all participants.

kgeist•11m ago
From what I remember, in the Linux kernel there's already barely any distinction between processes and threads: a thread is just a process that shares virtual memory with another process. You specify whether memory should be shared when calling clone().
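That distinction can be sketched in Python: a thread's write lands in the spawner's memory, while a forked child (roughly, clone() without CLONE_VM) only mutates its own copy. A minimal sketch for POSIX systems:

```python
import os
import threading

state = {"value": 0}

def set_from_thread():
    state["value"] = 1            # same address space as the main thread

t = threading.Thread(target=set_from_thread)
t.start()
t.join()
thread_sees = state["value"]      # the thread's write is visible here

pid = os.fork()                   # new process: separate (copy-on-write) memory
if pid == 0:
    state["value"] = 99           # touches only the child's copy
    os._exit(0)
os.waitpid(pid, 0)
parent_sees = state["value"]      # still 1: the child's write stayed in the child
```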

So we already have threads that do exactly what you're trying to do? Isn't it somewhat easier and less risky to just compile several programs into one binary? If you have no control over the programs you're trying to "fuse" (no source), then you probably don't want to fuse them, because it's very unsafe.

Maybe I don't understand something. I think it can work if you want processes with different lib versions or even different languages, but it sounds somewhat risky to pass data just like that (possible data corruption)

otterley•7m ago
The code uses Linux's clone3() syscall under the hood: see https://github.com/jer-irl/threadprocs/blob/main/docs/02-imp...

The interesting thing is that it's loading existing binaries and executing them in a different way than usual. I think it's pretty clever.

wswin•8m ago
Cool project. I think most real-life problems would be solved with shared memory objects, cf. shm_open [1].

Python has a wrapper in the standard library [2]; not sure about other languages.

1. https://www.man7.org/linux/man-pages/man3/shm_open.3.html

2. https://docs.python.org/3/library/multiprocessing.shared_mem...
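A hedged sketch of what the comment describes, using the stdlib wrapper [2] over POSIX shared memory: a parent creates a named segment, and a forked child attaches to it by name and writes into it (fork is used here only for brevity; any unrelated process could attach the same way):

```python
import os
from multiprocessing import shared_memory

shm = shared_memory.SharedMemory(create=True, size=64)
pid = os.fork()
if pid == 0:
    # Child: attach to the same segment purely by name, as an
    # unrelated process would, and write through the shared buffer.
    peer = shared_memory.SharedMemory(name=shm.name)
    peer.buf[:2] = b"ok"
    peer.close()
    os._exit(0)
os.waitpid(pid, 0)
result = bytes(shm.buf[:2])       # the child's write, seen without copying
shm.close()
shm.unlink()
```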
