Are there any public resources or datasets tracking training costs for open-weight models? (I'm guessing this data is hard to get for closed models, but I'm happy to be proved wrong.)
I'm especially interested in understanding which architectural changes (e.g., attention variants, parameter sharing, mixture-of-experts) have led to major cost optimizations, and NOT just from the companies behind these models, but from anyone who has trained or replicated them.