Let's assume this is a paradigm shift on the scale of Transformers / `Attention Is All You Need`. Companies build out new models and pump another $100 billion through it. And then a year from now, another innovation comes out. Same circus. And again.
No one wants to be left behind, but trying to keep up will sink smaller companies.
Yes, the more recent generations of GPUs are optimized for attention math, but they are still fairly "general-purpose" accelerators. So when I see papers like this (interesting idea, btw!), my mental model for costs suggests that the GPUs and data centers bought with that CapEx would get re-used for this and hundreds of other ideas and experiments.
And then the hope is that the best ideas will occupy more of the available capacity...
hzia•13h ago
The downside is that this is going to be extremely expensive, so the dataset used to conduct RL will need to be curated.
watsonmusic•7h ago
nsagent•5h ago