An analysis of recent multithreading improvements for a smoother game

https://dev.arma3.com/post/oprep-performance-optimizations-in-220

41•diggan•3d ago

Comments

hinkley•1h ago

> The design of the job system decides how we are able to utilize it, and the old design (in use since Arma 2) worked, but it was quite primitive. It only allowed submitting jobs as one block, and there could only be one block active at a time. This means all jobs had to be submitted at once, after which we needed to wait for the jobs to be finished, before we could continue doing other things.

> This is also called "Fork-Join".

I removed one of these from a batch processing system and got about 3x the throughput in the process.

As it was written, the system was trying not to be a noisy neighbor for online (consumer-facing) traffic and was not succeeding. It could still cause brownouts if too many other things were running at the same time. So the authors thought to have the users run Splunk queries, looking for traffic from these other jobs, to see if it was safe to run the CI/CD pipeline. That's ridiculous. It's the sort of thing someone Ops-facing forgets that nobody else on the team gives a shit about most days.

If we wanted people to be able to run it without consequence, it needed to go quite a bit slower, but it was already more than half of the wall clock time being spent in a runbook as it was. So I replaced it with a queue instead, that would have n tasks running all the time, instead of up to n and as few as 1. I ended up being able to tune it to about 75-80% of the original number with little problem. Well, one problem. There was one group of customers whose assets generated about 10x the load of any other customer, and the way we queued the work clustered by group. Once I sorted by customer instead of group we stopped getting a clustering of excess workload, and it also made it a hell of a lot easier to eyeball the status messages and estimate how far along you were.

Spiky processes cause problems for both load-shedding and autoscaling schemes. And starting too many tasks at once causes memory pressure on your end, since most data processing tasks take more memory in the middle than at the beginning or the end. You are better off self-throttling your traffic so you make the memory load more sequential and you allow compensatory systems the time they need to adapt appropriately to your sudden peak of traffic.

o11c•1h ago

Unrelated to the specific game:

Note that the "can't parallelize AI/scripting" is a consequence for a design choice that many people make without thinking - namely, that all actors should, internally, have perfectly accurate and up-to-date knowledge of the world, using the same in-game global objects.

If each actor makes a copy of the world for what they know, there's nothing preventing parallelism. This does imply quadratic memory, but you can just cap this - if there's a lot going on, it makes sense for an actor to lose track of some of it. And once you're working with imperfect knowledge, you can just ... throttle the AI's think time if it's doing too much.

Another thing you can do, assuming you already have suspendable scripts, is "merge" the thread affinity of two scriptable objects when you know they need to interact (only really needed for transaction-like things; often you can just emit some async state to be resolved later). Actually, you don't need to suspend if you have enough static analysis, but suspending is probably actually the easier thing to do.

Related to this, IMO it's a mistake to expose functions to the scripting API that access state implicitly. It's a better design to expose objects (that can only be accessed/mutated if passed as an argument) - and to expose a different set of objects depending on what context the script is called in. A type-checker is really useful here.

dustbunny•32m ago

Another model may be that all actors read from frame N-1 and write to frame N. Then there's not many copies, just one.

AbsenceBench: Language models can't tell what's missing

Phoenix.new – Remote AI Runtime for Phoenix

Wiki Radio: The thrilling sound of random Wikipedia

Show HN: Inspect and extract files from MSI installers directly in your browser

Visualizing environmental costs of war in Hayao Miyazaki's Nausicaä

Show HN: Nxtscape – an open-source agentic browser

Harper – an open-source alternative to Grammarly

Verified dynamic programming with Σ-types in Lean

Dancing Naked on the Head of a Pin: The Early History of Microphotography

Cracovians: The Twisted Twins of Matrices

The JAWS shark is public domain

Tuxracer.js play Tux Racer in the browser

Libraries are under-used. LLMs make this problem worse

Jürgen Schmidhuber：the Father of Generative AI Without Turing Award

Alpha Centauri

Oklo, the Earth's Two-billion-year-old only Known Natural Nuclear Reactor (2018)

Smartphones: Parts of Our Minds? Or Parasites?

A Python-first data lakehouse

BYD begins testing solid-state EV batteries in the Seal

Ancient termite poo reveals 120M-year-old secrets of Australia's forests

AMD's Freshly-Baked MI350: An Interview with the Chief Architect

An analysis of recent multithreading improvements for a smoother game

Show HN: SnapQL – Desktop app to query Postgres with AI

Klong: A Simple Array Language

Minimal auto-differentiation engine in Rust

Finding Peter Putnam

Career advice, or something like it

A Brief, Incomplete, and Mostly Wrong History of Robotics

How to Design Programs 2nd Ed (2024)

Show HN: I Built a Site That Curates Weird YouTube Rabbit Holes Daily

An analysis of recent multithreading improvements for a smoother game

Comments

AbsenceBench: Language models can't tell what's missing

Phoenix.new – Remote AI Runtime for Phoenix

Wiki Radio: The thrilling sound of random Wikipedia

Show HN: Inspect and extract files from MSI installers directly in your browser

Visualizing environmental costs of war in Hayao Miyazaki's Nausicaä

Show HN: Nxtscape – an open-source agentic browser

Harper – an open-source alternative to Grammarly

Verified dynamic programming with Σ-types in Lean

Dancing Naked on the Head of a Pin: The Early History of Microphotography

Cracovians: The Twisted Twins of Matrices

The JAWS shark is public domain

Tuxracer.js play Tux Racer in the browser

Libraries are under-used. LLMs make this problem worse

Jürgen Schmidhuber：the Father of Generative AI Without Turing Award

Alpha Centauri

Oklo, the Earth's Two-billion-year-old only Known Natural Nuclear Reactor (2018)

Smartphones: Parts of Our Minds? Or Parasites?

A Python-first data lakehouse

BYD begins testing solid-state EV batteries in the Seal

Ancient termite poo reveals 120M-year-old secrets of Australia's forests

AMD's Freshly-Baked MI350: An Interview with the Chief Architect

An analysis of recent multithreading improvements for a smoother game

Show HN: SnapQL – Desktop app to query Postgres with AI

Klong: A Simple Array Language

Minimal auto-differentiation engine in Rust

Finding Peter Putnam

Career advice, or something like it

A Brief, Incomplete, and Mostly Wrong History of Robotics

How to Design Programs 2nd Ed (2024)

Show HN: I Built a Site That Curates Weird YouTube Rabbit Holes Daily