(2021), still very interesting. The "post-overfitting" training strategy is especially unexpected.
luckystarr•7mo ago
I vaguely remember that this was observed when training GPT-3 (probably?) as well. They just kept training on and on, and the error went up and then down again, like a phase transition in the model.
xg15•7mo ago
luckystarr•7mo ago
dev_hugepages•7mo ago