It runs matrix ops, FFTs, and particle sims concurrently and draws live utilization bars in the terminal — one for CPU, one for GPU.
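For anyone picturing the output, here is a minimal sketch of how bars like that could be rendered. It is not the tool's actual code: `render_bar`, `render_frame`, the bar width, and the assumption that utilization arrives as a fraction between 0 and 1 are all made up for illustration.

```python
def render_bar(label: str, fraction: float, width: int = 20) -> str:
    """Return one text utilization bar (hypothetical helper, not from the tool)."""
    # Clamp so an out-of-range sample never overflows the bar.
    fraction = max(0.0, min(1.0, fraction))
    filled = round(fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"{label:<3} [{bar}] {fraction * 100:3.0f}%"


def render_frame(cpu: float, gpu: float, width: int = 20) -> str:
    """Return both bars as one frame: CPU on the first line, GPU on the second."""
    return render_bar("CPU", cpu, width) + "\n" + render_bar("GPU", gpu, width)


if __name__ == "__main__":
    # Made-up sample values, only to show the layout.
    print(render_frame(0.5, 1.0, width=10))
```

A live version would reprint the frame on each sample after moving the cursor back up two lines; that part is left out to keep the sketch short.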
Curious if the workload mix is actually representative or if I'm missing something obvious. Also open to feedback on the Metal implementation if anyone's gone down that path.