Same architecture, same dataset, same loss, same seed. The only variable is the geometry of the target space. That alone is enough to change the convergence behavior: FM keeps reinforcing a slightly wrong trajectory until late in training, while AToM never commits to a wrong trajectory in the first place. The point isn't a large final FID gap; it's that the failure mode disappears entirely.
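For what it's worth, that kind of controlled swap is easy to set up. Here's a minimal numpy sketch (my own toy construction, not from the experiment above): seed, data, model init, and loss are all fixed, and the only thing that varies is the target construction passed in as `target_fn`. The flow-matching velocity target for the linear interpolant is the standard `x1 - x0`; an AToM-style target would be swapped in the same way, which I'm not reproducing here.

```python
import numpy as np

def train(target_fn, seed=0, steps=300, lr=0.02):
    """Train a tiny linear velocity model; everything except target_fn is fixed."""
    rng = np.random.default_rng(seed)          # same seed -> same data, same draws
    x0 = rng.standard_normal((256, 2))         # source samples
    x1 = rng.standard_normal((256, 2)) + 3.0   # shifted target samples
    W = np.zeros((3, 2))                       # weights on [x_t, t] features
    losses = []
    for _ in range(steps):
        t = rng.uniform(size=(256, 1))
        xt = (1 - t) * x0 + t * x1             # linear interpolant x_t
        feats = np.concatenate([xt, t], axis=1)
        v_hat = feats @ W
        v = target_fn(x0, x1, t)               # the ONLY thing that varies
        grad = feats.T @ (v_hat - v) / len(x0)
        W -= lr * grad
        losses.append(float(np.mean((v_hat - v) ** 2)))
    return losses

# Standard FM velocity target for the linear interpolant; a different target
# geometry would be passed as another lambda against the identical setup.
fm = train(lambda x0, x1, t: x1 - x0)
```

Comparing the two loss curves (and intermediate trajectories) run-for-run under an identical seed is what isolates the target geometry as the cause, rather than noise in initialization or data order.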
drbt•36m ago