But if you're looking at a hosted Erlang VM for a capex of one dollar, then these folks are onto something.
Cores really are the only way to escape the breakdown of Moore's law, and this does look like a real step in the right direction. Fewer LLMs, more tiny cores.
Erlang, or at least its programming model, lends itself well to this: each process has its own local heap. If that heap can stay resident in one subsection of the CPU, the model might map well onto a reasonably priced many-core architecture.
That loosely describes plenty of multithreaded workloads, perhaps even most of them. A thread that doesn't keep its memory writes "local" to itself as much as possible will run into heavy contention with other threads, and performance will suffer a lot. It's usual to write multithreaded workloads in a way that minimizes the chance of contention, even if that doesn't involve a literal "one local heap per core".
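To make the "keep writes local, communicate explicitly" discipline concrete, here's a minimal Python sketch (not Erlang/BEAM, just an illustration of the same idea): each worker accumulates into a variable only it touches, and the only cross-thread communication is an explicit, one-shot message on a queue.

```python
import threading
import queue

def sum_chunk(chunk, results):
    local_total = 0           # local to this thread: no contention
    for x in chunk:
        local_total += x
    results.put(local_total)  # explicit, one-shot communication

def parallel_sum(data, workers=4):
    results = queue.Queue()
    size = (len(data) + workers - 1) // workers
    threads = [
        threading.Thread(target=sum_chunk,
                         args=(data[i * size:(i + 1) * size], results))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results.get() for _ in range(workers))

print(parallel_sum(list(range(1000))))  # 499500
```

The contended alternative, all threads incrementing one shared counter under a lock, computes the same answer while serializing every write.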
Manycore hasn't succeeded because, frankly, the programming model of essentially every other language is stuck in the 1950s: I, the program, am the entire and sole thing running on this computer, and must manually manage resources to match its capabilities. Hence async/await, mutable memory, race checkers, function coloring, all that nonsense. If half the effort spent straining to keep the ghost PDP-11 ruling all the programming languages had been spent on cleaning up the (several) warts in the actor model and its few implementations, we'd all be driving Waymos on Jupiter by now.
[The obvious candidates from my point of view are (1) it's an abstract mathematical model with scattered applications/implementations, most of which introduce additional constraints (in other words, there is no central theory of the actor-model implementation space), and (2) the message transport semantics are fixed: the model assumes eventual out-of-order delivery of an unbounded stream of messages. I think they should have enumerated the space of transport capabilities, including ordered/unordered and reliable/unreliable, within the core model. Treatment of bounded queuing in the core model would also be nice, but you can model that as an unreliable intermediate actor that drops messages, or one that implements a backpressure handshake, when the queue is full.]
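The "unreliable intermediate actor" trick for bounded queuing can be sketched in a few lines; this is a hypothetical Python stand-in (names like `LossyMailbox` are mine, not from any actor library), showing the drop-on-full variant rather than the backpressure handshake:

```python
import queue

class LossyMailbox:
    """Bounded mailbox modeled as an unreliable intermediary."""

    def __init__(self, capacity):
        self.inbox = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def send(self, msg):
        try:
            self.inbox.put_nowait(msg)  # accept if there is room
        except queue.Full:
            self.dropped += 1           # silently drop: unreliable link

    def receive(self):
        return self.inbox.get_nowait()

box = LossyMailbox(capacity=3)
for i in range(5):
    box.send(i)

print([box.receive() for _ in range(3)], box.dropped)  # [0, 1, 2] 2
```

A backpressure variant would instead reply to the sender with a "retry later" message when `put_nowait` fails, keeping both behaviors expressible inside the core model.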
With respect to your point (1), you might be interested in Pony, which has been discussed here from time to time, most recently: https://news.ycombinator.com/item?id=44719413 Of course there are other actor-based systems in wide use such as Akka.
Epyc has a mode where it does 4 numa nodes per socket, IIRC. It seems like that should be good if your software is NUMA aware or NUMA friendly.
But most of the desktop class hardware has all the cores sharing a single memory controller anyway, so if you had separate NUMA nodes, it wouldn't reflect reality.
Reducing cross core communication (NUMA or not) is the key to getting high performance parallelism. Erlang helps because any cross process communication is explicit, so there's no hidden communication as can sometimes happen in languages with shared memory between threads. (Yes, ets is shared, but it's also explicit communication in my book)
I tend to agree.
Where it gets -really- interesting to think about are concepts like 'core parking' actors of a given type on specific cores; e.g. 'somebusinessprocess' actor code all happens on a specific fixed set of cores and 'account' actors run on a different fixed set of cores, versus having all the cores going back and forth between both.
Could theoretically get a benefit from the instruction cache staying very consistent per core, a kind of mechanical sympathy (I think Disruptors also take advantage of this).
On the other hand, it may not be as big a benefit, in the sense that cross-process writes become cross-core writes, and those tend to lead to their own issues...
fun to think about.
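The 'core parking' idea above can be toyed with from userspace. A minimal sketch, assuming Linux (Python's `os.sched_setaffinity` is Linux-only, hence the guard), pinning the current process, standing in for a scheduler thread that serves one actor type, to a fixed set of cores:

```python
import os

def park_on_cores(cores):
    """Pin this process to `cores`; return the resulting affinity set."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # unsupported platform (e.g. macOS, Windows)
    # Only request cores we are actually allowed to run on.
    cores = set(cores) & os.sched_getaffinity(0)
    if cores:
        os.sched_setaffinity(0, cores)
    return os.sched_getaffinity(0)

print(park_on_cores({0}))
```

A real runtime would do this per scheduler thread rather than per process, but the cache-consistency argument is the same either way.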
(And that also includes hosting, egress, power, etc).
in practice you can't though
> (And that also includes hosting, egress, power, etc).
But I will certainly try to leverage my telco-connection to get to play with more of their kit if I can.
Does 1 Animat convert to metric nitpicks?
You know you're successful once you're added to: https://www.theregister.com/Design/page/reg-standards-conver...
"The origin of queueing theory dates back to 1909, when Agner Krarup Erlang (1878–1929) published his fundamental paper on congestion in telephone traffic [for a brief account, see Saaty (1957), and for details on his life and work, see Brockmeyer et al. (1948)]." -- https://www.sciencedirect.com/topics/engineering/queueing-th...
(It's amazing how little logging went on in the phone system before computerized switching. But that's another subject.)
Just being able to start that many instances is not that exciting until we know what they can do.
However, BEAM is not the only factor in this; the entire hardware platform matters as well.
This is, after all, largely about that nice, huge CPU.
I mean, when you have all 5000 started, why not let them do some work? Stress-test it with a few real-life scenarios for 48 hours and show us some numbers.
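As a toy version of that point, here's what "spawn thousands and let them do some work" looks like with Python coroutines standing in for BEAM processes (an illustration of the shape of such a test, not a benchmark of anything):

```python
import asyncio

async def worker(n):
    await asyncio.sleep(0)  # yield to the scheduler, like a reduction
    return n * n            # trivial unit of "real" work

async def main(count=5000):
    # Spawn all workers concurrently and aggregate their results.
    results = await asyncio.gather(*(worker(i) for i in range(count)))
    return sum(results)

print(asyncio.run(main()))
```

A real stress test would give each worker an actual scenario (socket traffic, state updates) and run for hours, which is exactly the number the parent comment is asking for.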
In other words, nepobaby fault tolerance