Optimizing Time, Cost, and Generalization in Distributed Large-Batch Training
https://arxiv.org/abs/2603.18112
2 • PaulHoule • 1h ago