I found the disagreement striking. Kaiser argues that Transformers will keep winning unless someone demonstrates a better scaling curve, while the other researchers argue that the field is overfitting to current hardware and missing better architectures.
There was a back-and-forth on scaling, hardware constraints, continual learning and latent reasoning.