If you train a model for too long, it may overfit its training data. Not surprising; this has been known forever. But did you know you can detect the signatures of overfitting directly in the layer weight matrices, without needing access to any data (train or test)?
In our recent paper (with Hari Kishan Prakash), we show this explicitly in two different classic grokking experiments. And the overfitting we see is very different from what has been seen before!
charleshmartin•2h ago
paper: https://arxiv.org/abs/2602.02859