So I wrote the whole training pipeline from scratch as Metal shaders: projection, tile-based rasterization, SSIM loss, the backward pass, Adam, and densification. Everything runs on the GPU.
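For readers who want a concrete picture of the optimizer stage: here's a minimal, framework-free sketch of a standard Adam update, the textbook algorithm rather than msplat's actual Metal kernel (which runs the equivalent per-parameter on the GPU). All names here are illustrative, not msplat's API.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam update (Kingma & Ba); illustrative only."""
    m = b1 * m + (1 - b1) * grad        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - b1 ** t)           # bias correction for warm-up
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy example: minimize ||x||^2 with Adam.
x = np.array([1.0, -2.0])
m = np.zeros_like(x)
v = np.zeros_like(x)
for t in range(1, 201):
    grad = 2.0 * x                      # gradient of ||x||^2
    x, m, v = adam_step(x, grad, m, v, t, lr=0.1)
```

In the real pipeline the same update would apply element-wise to each Gaussian's position, scale, rotation, opacity, and SH coefficients, with the moment buffers living in GPU memory alongside the parameters.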
msplat trains 7k iterations on full-resolution Mip-NeRF 360 scenes in ~90s on my M4 Max. In the README I compare against gsplat's published numbers, which were measured on a TITAN RTX. These are different hardware classes, of course, so take the wall-time comparisons with a grain of salt.
Python bindings are on PyPI (pip install msplat), and there are Swift bindings if you want to embed this in a native app. Happy to answer questions about any of the internals.
Repo: https://github.com/rayanht/msplat (Apache 2.0)