I've built aethalloc (https://github.com/shift/aethalloc). It's a high-performance, drop-in memory allocator I wrote in Rust.
To be honest, standard allocators were absolutely choking on my NixOS router/firewall. Packets get allocated on an RX thread and freed on a worker thread, so the freed blocks pile up in the worker's thread-local cache where the RX thread can never reuse them. That basically knackers standard thread-local caching, and the allocator ends up hoarding memory like mad. The hoarding was also causing some serious RSS bloat on my NixOS laptop. Pure nightmare.
The Fix: O(1) Anti-Hoarding
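aethalloc keeps 14 size classes in each thread-local cache. When an async pipeline starts hoarding memory (like a firewall worker dropping a packet that the NIC's RX thread allocated), aethalloc punts the excess back to a global pool once the cache crosses the anti-hoarding threshold. The whole batch goes back at once with a single atomic compare-and-swap (CAS), so the cost is O(1) however many blocks are returned. Sorted.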
┌─────────────────────────────────────────────────────────────────┐
│ Thread N Cache ──► heads[14] ──► Anti-Hoarding Threshold (4096) │
│                                                │                │
│                                                ▼                │
│ Global Pool ──► Lock-free Treiber Stack (O(1) batch push)       │
└─────────────────────────────────────────────────────────────────┘
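To make the batch push concrete, here's a minimal Rust sketch of the idea. It is not aethalloc's actual code. `Node`, `GlobalPool`, `LocalCache`, `push_batch` and `take_all` are hypothetical names, the threshold is assumed to count blocks (the post doesn't say whether 4096 is blocks or bytes), and the sketch flushes the whole local list rather than only the excess to keep things short.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Assumption: the threshold counts blocks, not bytes.
pub const THRESHOLD: usize = 4096;

/// Intrusive free-list link stored inside each freed block (hypothetical layout).
pub struct Node {
    pub next: *mut Node,
}

/// Global pool for one size class: a lock-free Treiber stack.
pub struct GlobalPool {
    head: AtomicPtr<Node>,
}

impl GlobalPool {
    pub const fn new() -> Self {
        GlobalPool { head: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Splice an already-linked chain `first ..= last` onto the stack.
    /// One successful CAS publishes the whole chain, whatever its length.
    ///
    /// # Safety
    /// The caller must exclusively own every node in the chain.
    pub unsafe fn push_batch(&self, first: *mut Node, last: *mut Node) {
        let mut old = self.head.load(Ordering::Relaxed);
        loop {
            // Point the chain's tail at the current top of the stack.
            unsafe { (*last).next = old };
            match self.head.compare_exchange_weak(
                old,
                first,
                Ordering::Release,
                Ordering::Relaxed,
            ) {
                Ok(_) => return,
                // Lost a race with another thread: retry against the new top.
                Err(current) => old = current,
            }
        }
    }

    /// Detach the entire stack with one atomic swap.
    pub fn take_all(&self) -> *mut Node {
        self.head.swap(ptr::null_mut(), Ordering::Acquire)
    }
}

/// Per-thread free list for one size class.
pub struct LocalCache {
    head: *mut Node,
    tail: *mut Node,
    len: usize,
}

impl LocalCache {
    pub const fn new() -> Self {
        LocalCache { head: ptr::null_mut(), tail: ptr::null_mut(), len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Cache a freed block; hand the whole list to `pool` at the threshold.
    ///
    /// # Safety
    /// `block` must be valid, writable, and no longer used by anyone else.
    pub unsafe fn free(&mut self, block: *mut Node, pool: &GlobalPool) {
        unsafe { (*block).next = self.head };
        if self.tail.is_null() {
            self.tail = block; // the first block cached is the chain's tail
        }
        self.head = block;
        self.len += 1;
        if self.len >= THRESHOLD {
            // The tail is tracked, so no list walk is needed before the CAS.
            unsafe { pool.push_batch(self.head, self.tail) };
            *self = LocalCache::new();
        }
    }
}
```

Tracking the tail is what keeps the flush O(1): the chain is already linked, so returning it costs one pointer write plus one CAS per attempt. Under contention the CAS can retry, but each retry is still constant work.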
It also guarantees 16-byte alignment, so aligned SSE loads and stores stay safe (aligned 256-bit AVX accesses need 32-byte alignment, which 16 doesn't cover), and it integrates hardware-enforced spatial safety (ARM MTE, CHERI, x86_64 LAM/UAI). Pretty chuffed with how that turned out.
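One way a 16-byte guarantee can fall out of the size classes themselves is sketched below. This is an assumption, not aethalloc's real layout: the post doesn't list the class sizes, so the sketch simply uses 14 power-of-two classes from 16 bytes up to 128 KiB, and `size_class` and `class_size` are hypothetical helpers.

```rust
/// Number of size classes, matching `heads[14]` in the diagram.
pub const NUM_CLASSES: usize = 14;
/// Minimum block size and alignment in bytes.
pub const MIN_ALIGN: usize = 16;

/// Block size of a class. Assumed layout: 16, 32, 64, ... 131072 bytes.
pub fn class_size(index: usize) -> usize {
    MIN_ALIGN << index
}

/// Map a request to the smallest class that fits it.
/// Returns `None` when the request is too large for any class.
pub fn size_class(size: usize) -> Option<usize> {
    // Round up to a power of two, never going below the 16-byte minimum.
    let rounded = size.max(MIN_ALIGN).checked_next_power_of_two()?;
    let index = (rounded.trailing_zeros() - MIN_ALIGN.trailing_zeros()) as usize;
    if index < NUM_CLASSES {
        Some(index)
    } else {
        None
    }
}
```

Every class size here is a multiple of 16, so as long as each class carves its blocks out of a 16-byte-aligned region, every block it hands out is 16-byte aligned too. Requests that return `None` would have to take some other path, such as going straight to the OS.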
Usage
Just compile it to a C ABI shared library (libaethalloc.so) and chuck it into your unmodified binaries with a quick LD_PRELOAD.
I'd love to hear your thoughts on the architecture and project in general.
Cheers!