An analysis of recent multithreading improvements for a smoother game

https://dev.arma3.com/post/oprep-performance-optimizations-in-220

67•diggan•7mo ago

Comments

hinkley•7mo ago

> The design of the job system decides how we are able to utilize it, and the old design (in use since Arma 2) worked, but it was quite primitive. It only allowed submitting jobs as one block, and there could only be one block active at a time. This means all jobs had to be submitted at once, after which we needed to wait for the jobs to be finished, before we could continue doing other things.

> This is also called "Fork-Join".

I removed one of these from a batch processing system and got about 3x the throughput in the process.

As it was written, the system was trying not to be a noisy neighbor for online (consumer-facing) traffic and was not succeeding. It could still cause brownouts if too many other things were running at the same time. So the authors thought to have the users run Splunk queries, looking for traffic from these other jobs, to see if it was safe to run the CI/CD pipeline. That's ridiculous. It's the sort of thing someone Ops-facing forgets that nobody else on the team gives a shit about most days.

If we wanted people to be able to run it without consequence, it needed to go quite a bit slower, but it was already more than half of the wall clock time being spent in a runbook as it was. So I replaced it with a queue instead, that would have n tasks running all the time, instead of up to n and as few as 1. I ended up being able to tune it to about 75-80% of the original number with little problem. Well, one problem. There was one group of customers whose assets generated about 10x the load of any other customer, and the way we queued the work clustered by group. Once I sorted by customer instead of group we stopped getting a clustering of excess workload, and it also made it a hell of a lot easier to eyeball the status messages and estimate how far along you were.

Spiky processes cause problems for both load-shedding and autoscaling schemes. And starting too many tasks at once causes memory pressure on your end, since most data processing tasks take more memory in the middle than at the beginning or the end. You are better off self-throttling your traffic so you make the memory load more sequential and you allow compensatory systems the time they need to adapt appropriately to your sudden peak of traffic.

o11c•7mo ago

Unrelated to the specific game:

Note that the "can't parallelize AI/scripting" is a consequence for a design choice that many people make without thinking - namely, that all actors should, internally, have perfectly accurate and up-to-date knowledge of the world, using the same in-game global objects.

If each actor makes a copy of the world for what they know, there's nothing preventing parallelism. This does imply quadratic memory, but you can just cap this - if there's a lot going on, it makes sense for an actor to lose track of some of it. And once you're working with imperfect knowledge, you can just ... throttle the AI's think time if it's doing too much.

Another thing you can do, assuming you already have suspendable scripts, is "merge" the thread affinity of two scriptable objects when you know they need to interact (only really needed for transaction-like things; often you can just emit some async state to be resolved later). Actually, you don't need to suspend if you have enough static analysis, but suspending is probably actually the easier thing to do.

Related to this, IMO it's a mistake to expose functions to the scripting API that access state implicitly. It's a better design to expose objects (that can only be accessed/mutated if passed as an argument) - and to expose a different set of objects depending on what context the script is called in. A type-checker is really useful here.

dustbunny•7mo ago

Another model may be that all actors read from frame N-1 and write to frame N. Then there's not many copies, just one.

akrotkov•7mo ago

Two, surely? The previous one still being used and the new one being written.

(Note that this is how most rendering artifacts were fixed long ago - the on screen and the off screen buffers were swapped, so nobody would "see" in progress scenes)

Retric•7mo ago

1 copy + the original = 2 instances in memory.

o11c•7mo ago

That's fine if actors are not doing much (e.g. just pathfinding and shooting), but is likely to fall apart long before scripting becomes a thing, since there are a lot of tasks where different actors do want to mutate the same object.

Retric•7mo ago

This is AI, having multiple actors plan to gather the same resource etc isn’t necessarily a flaw. It does however result in meaningful game design decisions getting tided up in how the game engine is designed.

dustbunny•7mo ago

This is a great example. It's probably pretty insignificant from a players perspective in most contexts. And you could most certainly design some acceleration structures to specifically handle AI convergence specific to your use case.

dustbunny•7mo ago

Can you share an example where this model falls apart?

Aurornis•7mo ago

> If each actor makes a copy of the world for what they know, there's nothing preventing parallelism

Unfortunately not that easy.

If two scripts consume an item, decrement a variable (hit points, credits, count) then it’s not as simple as giving each script a copy of the world. You get double spending, duplicated items, etc.

o11c•7mo ago

The point of the copy is that they can't actually change anything outside of their local world. They can only send out an async request for global state to change, or send a request to be scheduled on the same executor.

marijnz•7mo ago

And this specifically introduces transactional boundaries, which can introduce a lot of complexity. Especially in games where the domains (thus boundaries) may change a lot during development as the game design is being figured out.

db48x•7mo ago

This! One of the fun things about writing programs in Rust is that it really makes you aware of ownership of data inside the program. If you try to write a game where the game loop owns the game state, then loops over the actors letting each actor also modify the game state, then the compiler will complain. The exact error messages may vary, since there are a number of variations that the compiler might notice first. For example, an actor that spawns another actor will trigger an error when it tries to mutate the list of actors, and so on.

After going around and around for a while trying to satisfy the compiler, you’ll eventually conclude that either the game loop must hold no mutable state at all or that the actors must not get mutable access to the game state. The easiest solution is to give actors a read–only view of the world and have them produce a list of changes that they would like to make. Then the whole thing becomes trivially parallelizable. It doesn’t matter how many threads you scatter the actors out over because none of them are modifying the world. You just gather up the list of changes and merge them. Even merging them can be incremental and spread over multiple threads if your merge is associative.

Of course for some game designs you may still need or want the actors to be more serialized than that. For example in Factorio there are dependencies between inserters and belts and power networks and circuit conditions and so on. The inserters cannot be updated before the power networks or circuit conditions have been computed, and so on. But that just means that you have multiple steps where you scatter the computation for some set of actors across all available cpus, gather the results into a single list of changes to the world, then update the world based on those changes.

egypturnash•7mo ago

"From the feedback on Profiling branch we have seen several times that players had issues caused by setting wrong command-line parameters. That was typically because an old optimization guide told them it would make their game run better. But now that the game is more multithreaded than it used to be, bad settings have a more noticeable and potentially negative impact."

I wonder if they also added a popup that detects some of these pessimal command-line parameters being used and says "hey we did a lot of optimizing work that's made these settings iffy, please go check out this blog post for more details".

absurdo•7mo ago

Ah good ol’ Operation Flashpoint: Cold War Crisis. What an absolute time to be alive.

abrookewood•7mo ago

FYI this 12 year old game is still incredible if you are in to mil sims. Really worth checking out and amazing that they are still releasing updates for it.

Will Future Generations Think We're Gross?

Kernel Key Retention Service

State Department will delete Xitter posts from before Trump returned to office

Show HN: Verifiable server roundtrip demo for a decision interruption system

Impl Rust – Avro IDL Tool in Rust via Antlr

Stories from 25 Years of Software Development

minikeyvalue

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

How I grow my X presence?

What's the cost of the most expensive Super Bowl ad slot?

What if you just did a startup instead?

Hacking up your own shell completion (2020)

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

GLM-OCR: Accurate × Fast × Comprehensive

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

Show HN: AboutMyProject – A public log for developer proof-of-work

Expertise, AI and Work of Future [video]

So Long to Cheap Books You Could Fit in Your Pocket

PID Controller

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

Kubernetes MCP Server

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

What were the first animals? The fierce sponge–jelly battle that just won't end

Sidestepping Evaluation Awareness and Anticipating Misalignment

OldMapsOnline

What It's Like to Be a Worm

Don't go to physics grad school and other cautionary tales

Lawyer sets new standard for abuse of AI; judge tosses case

AI anxiety batters software execs, costing them combined $62B: report

Will Future Generations Think We're Gross?

Kernel Key Retention Service

State Department will delete Xitter posts from before Trump returned to office

Show HN: Verifiable server roundtrip demo for a decision interruption system

Impl Rust – Avro IDL Tool in Rust via Antlr

Stories from 25 Years of Software Development

minikeyvalue

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

How I grow my X presence?

What's the cost of the most expensive Super Bowl ad slot?

What if you just did a startup instead?

Hacking up your own shell completion (2020)

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

GLM-OCR: Accurate × Fast × Comprehensive

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

Show HN: AboutMyProject – A public log for developer proof-of-work

Expertise, AI and Work of Future [video]

So Long to Cheap Books You Could Fit in Your Pocket

PID Controller

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

Kubernetes MCP Server

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

What were the first animals? The fierce sponge–jelly battle that just won't end

Sidestepping Evaluation Awareness and Anticipating Misalignment

OldMapsOnline

What It's Like to Be a Worm

Don't go to physics grad school and other cautionary tales

Lawyer sets new standard for abuse of AI; judge tosses case

AI anxiety batters software execs, costing them combined $62B: report

An analysis of recent multithreading improvements for a smoother game

Comments