You are the Compute Officer aboard a generation ship. Systems are failing, a signal arrives from deep space, and every mission is a real distributed ML problem — fix OOM errors, configure tensor parallelism, scale training across clusters, optimise inference throughput.
The game runs on a first-principles physics engine that models FLOPs, memory bandwidth, collective communication, and pipeline bubbles. It is calibrated against published runs from Meta, DeepSeek, and NVIDIA, and matches their reported MFU (model FLOPs utilisation) to within 1-2%.
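To give a flavour of what "first-principles" means for these quantities, here is a minimal Python sketch of textbook estimates for MFU, pipeline bubbles, and all-reduce time. It is not the simulator's code: the function names and example numbers are hypothetical, and it assumes the usual approximations (about 6 FLOPs per parameter per token for dense training, a bubble fraction of (p-1)/(m+p-1) for a simple non-interleaved schedule, and only the bandwidth term of a ring all-reduce).

```python
def training_flops_per_token(n_params: float) -> float:
    """Approximate FLOPs per token for dense training (forward + backward).

    Assumes the common ~6 * N rule of thumb and ignores attention FLOPs.
    """
    return 6.0 * n_params


def mfu(n_params: float, tokens_per_second: float,
        n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilisation: useful model FLOPs/s over aggregate peak FLOPs/s."""
    achieved = training_flops_per_token(n_params) * tokens_per_second
    return achieved / (n_gpus * peak_flops_per_gpu)


def pipeline_bubble_fraction(pp_stages: int, microbatches: int) -> float:
    """Idle fraction of a simple non-interleaved pipeline schedule.

    With p stages and m microbatches, the bubble is (p - 1) / (m + p - 1).
    """
    return (pp_stages - 1) / (microbatches + pp_stages - 1)


def ring_allreduce_seconds(message_bytes: float, n_ranks: int,
                           bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-only estimate for a ring all-reduce (latency ignored).

    Each rank sends and receives 2 * (n - 1) / n of the message.
    """
    return 2 * (n_ranks - 1) / n_ranks * message_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical setup: 7B-parameter model, 64 GPUs at 1 PFLOP/s peak each.
    print(f"MFU: {mfu(7e9, 6e5, 64, 1e15):.3f}")
    print(f"Bubble: {pipeline_bubble_fraction(4, 32):.3f}")
    print(f"All-reduce: {ring_allreduce_seconds(1e9, 8, 100e9) * 1e3:.1f} ms")
```

Raising the microbatch count shrinks the bubble, which is the kind of trade-off the missions ask you to reason about.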
There's also a Learn mode with 60 tasks, from beginner to advanced, covering both training and inference, plus a full simulator for exploration and planning if you'd rather skip the story. Everything runs client-side, with no backend.
zhebrak•1h ago
GitHub: https://github.com/zhebrak/llm-cluster-simulator