This is nicely moving down the stack from some other nearby work. ByteDance just released code for Parker, a Linux multi-kernel approach where each core gets its own copy of Linux (with one core acting as coordinator). There's another multi-kernel-on-one-system effort that has also been quite active recently and is more general (not strictly one kernel per core). https://www.phoronix.com/news/Linux-Parker-Proposal https://www.phoronix.com/news/Multi-Kernel-Linux-v2
(Obviously we can and already do a lot of single-thread-per-core work: these emerging multi-kernel ideas are trying to push into new territory, with new isolation and the elimination of yet more contention.)
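For readers unfamiliar with the existing single-thread-per-core pattern, here's a minimal sketch of what's already routine today, not anything from the linked projects: one worker thread pinned to each online CPU via glibc's pthread_attr_setaffinity_np. The assumption that CPUs are numbered 0..N-1 with no holes is mine.

```c
// Minimal sketch: one worker thread pinned to each online CPU.
// Assumptions (mine, not from the post): glibc with _GNU_SOURCE,
// and CPUs numbered 0..N-1 with no holes in the numbering.
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void *worker(void *arg) {
    long cpu = (long)arg;
    printf("worker pinned to CPU %ld, running on CPU %d\n",
           cpu, sched_getcpu());
    /* per-core work goes here, ideally touching only core-local data */
    return NULL;
}

int main(void) {
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    pthread_t *threads = calloc(ncpus, sizeof(*threads));

    for (long cpu = 0; cpu < ncpus; cpu++) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);

        pthread_attr_t attr;
        pthread_attr_init(&attr);
        // Pin the thread to a single CPU before it even starts.
        pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
        pthread_create(&threads[cpu], &attr, worker, (void *)cpu);
        pthread_attr_destroy(&attr);
    }

    for (long cpu = 0; cpu < ncpus; cpu++)
        pthread_join(threads[cpu], NULL);

    free(threads);
    return 0;
}
```

Even with this kind of pinning, all the threads still share one kernel, one scheduler, and one coherence domain, which is exactly the contention the multi-kernel work is trying to remove.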
Personally, I would never agree to give up SMP CPU coherency. Multiprocessor systems are hard enough to debug with hardware cache coherency; adding entirely new unpredictable, non-deterministic behaviour would just make more developers lose the rest of their hair prematurely. It would also likely introduce an entirely new class of security issues that nobody has imagined yet, requiring even worse performance-draining software workarounds.
Some things are best done in hardware.
> Personally, I would never agree to give up SMP CPU coherency. Multiprocessor systems are hard enough to debug with hardware cache coherency [...]
What alternative hardware (or is it software?) are you envisioning, and why? I assume this is referring to some mechanism for multikernel support that doesn't rely on cache coherence. It seems plausible that there are alternatives to full cache coherence that, once people gained experience with them, would turn out to be neutral or even better. You didn't provide substantive evidence either way; on the other hand, multikernels on unmodified, coherent hardware do at least seem promising.
(And did Maddog (DEC), with a different set of experiences, agree?)
Linux would benefit from a scheduler-per-CCD (in AMD parlance) approach being a first-class option. CCD pinning is one way to push in this direction today (see the sketch below), but partitioning the kernel scheduler(s) along hardware boundaries would reduce complexity and overhead for a lot of use cases.
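For concreteness, here's a minimal sketch of the CCD pinning that's possible today: restrict a process (and whatever it execs) to one CCD's cores with sched_setaffinity. The CPU range 0-7 for "CCD0" is purely an assumption for illustration; on a real machine you'd read the topology from lscpu or /sys/devices/system/cpu/cpuN/cache.

```c
// Minimal sketch (assumptions: Linux + glibc, and that CCD0's cores are
// CPUs 0-7; check the real topology, e.g. with lscpu or
// /sys/devices/system/cpu/cpuN/cache/index3/shared_cpu_list).
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    cpu_set_t ccd0;
    CPU_ZERO(&ccd0);
    for (int cpu = 0; cpu <= 7; cpu++)   // assumed CCD0 core range
        CPU_SET(cpu, &ccd0);

    // Restrict this process, and anything it execs, to CCD0's cores.
    if (sched_setaffinity(0, sizeof(ccd0), &ccd0) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    if (argc > 1)
        execvp(argv[1], &argv[1]);   // e.g. ./pin-ccd0 ./my_workload

    return 0;
}
```

The kernel scheduler still sees and balances across the whole machine, though; the point above is that a per-CCD scheduler as a first-class option would let the kernel itself skip cross-CCD balancing and migration, instead of relying on every workload doing this kind of pinning.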
fanf2•3mo ago
https://phk.freebsd.dk/sagas/phkmalloc/
https://cgit.freebsd.org/src/tree/lib/libc/stdlib/malloc.c?h...