So... given that dumb-guy (or, more charitably to myself, humanities-guy-who-happens-to-work-in-tech) understanding of these phenomena, my ears perk up when they say they've trained a model on random numbers but still get it to do something semi-useful. Is this as big a deal as it seems? Have we now worked out a way to make the gigawatts' worth of video cards "smart" without human language?
Can anyone help shed light on why the MPS backend for PyTorch produces different numbers than the CUDA and CPU devices do? I don't mean unsupported ops and CPU fallback; I mean fast, garbage numbers coming out of MPS. This PR references numerous other PyTorch issues related to MPS inaccuracy: https://github.com/Stability-AI/stable-audio-tools/pull/225
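For anyone who wants to poke at this, here's a minimal sketch of the kind of cross-device comparison I mean, assuming PyTorch 2.x on an Apple Silicon Mac. The matmul is just an illustrative op, not our actual model:

```python
import torch

torch.manual_seed(0)
x = torch.randn(256, 256)  # create on CPU so both devices see identical inputs
w = torch.randn(256, 256)

cpu_out = x @ w  # CPU reference result

if torch.backends.mps.is_available():
    mps_out = (x.to("mps") @ w.to("mps")).cpu()
    # Small float32 differences across backends are expected;
    # large ones are the "garbage numbers" I'm asking about.
    print("max abs diff:", (cpu_out - mps_out).abs().max().item())
    print("allclose:", torch.allclose(cpu_out, mps_out, atol=1e-4))
```

On a healthy backend you'd expect the max absolute difference to sit within a few float32 ulps of the CPU result; what we see from MPS in the full model is far outside that.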
The story:
This tutorial post was a lesson I wrote for my undergrad "Deep Learning & AI Ethics" class (https://github.com/drscotthawley/DLAIE).
The plan for the semester was to abandon the standard lesson+assignment format (since LLMs make coding assignments moot) in favor of a project-based learning approach: We would, as a class, build a text-conditioned latent flow matching generative model from scratch, because in so doing we'd cover essentially all the key topics of a "normal" course.
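For context, the training objective we were aiming at is simple to state. Here's a toy-scale sketch of the (unconditional) flow-matching loss with a linear interpolation path; the tiny network and 2-D "data" are illustrative only, not the course code:

```python
import torch
import torch.nn as nn

# Toy velocity network: input is (x_t, t), output is a predicted velocity.
model = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))

def flow_matching_loss(x1: torch.Tensor) -> torch.Tensor:
    """Conditional flow matching with a straight-line path from noise to data."""
    x0 = torch.randn_like(x1)                   # noise sample
    t = torch.rand(x1.shape[0], 1)              # uniform time in [0, 1]
    xt = (1 - t) * x0 + t * x1                  # point on the linear path
    target_v = x1 - x0                          # constant target velocity
    pred_v = model(torch.cat([xt, t], dim=-1))  # predicted velocity at (xt, t)
    return ((pred_v - target_v) ** 2).mean()

loss = flow_matching_loss(torch.randn(32, 2))   # toy 2-D "data" batch
loss.backward()
```

Text conditioning and the latent autoencoder layer on top of this, but the core loss really is that short, which is why it made a nice semester-long target.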
For logistical reasons we pivoted to adding guidance to pretrained models, specifically Stable Audio Open Small, but we hit a snag regarding our MPS outputs, and I wonder if any readers here can help.
(Students are overwhelmingly Mac users; my small college doesn't provide GPUs; CPU execution is too slow; Colab takes too long to set up and then kicks us off. We're waiting on an NSF NAIRR Pilot education allocation for some remote GPU access.)
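Since "guidance" is carrying a lot of weight above: a minimal classifier-free-guidance sketch, assuming that's roughly the flavor involved. Everything here is a toy stand-in; none of these names come from Stable Audio Open's actual API:

```python
import torch

def toy_model(xt, t, emb):
    """Illustrative conditional predictor (velocity or noise estimate)."""
    return 0.1 * xt + t * emb.mean()

def guided_prediction(xt, t, cond_emb, uncond_emb, scale=4.0):
    """Standard CFG: push the conditional prediction away from the unconditional one."""
    pred_cond = toy_model(xt, t, cond_emb)      # text-conditioned pass
    pred_uncond = toy_model(xt, t, uncond_emb)  # empty-prompt pass
    return pred_uncond + scale * (pred_cond - pred_uncond)

xt = torch.randn(1, 8)
print(guided_prediction(xt, 0.5, torch.ones(4), torch.zeros(4)))
```

The appeal for a class is that this lives entirely at sampling time, so students can experiment with a frozen pretrained model instead of training one.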