The BERT models are easy to probability-calibrate too!
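For concreteness, one standard way to do that calibration is temperature scaling: fit a single scalar T on held-out logits so that softmax(logits / T) matches empirical accuracy. A minimal sketch, assuming you already have validation logits and labels from the fine-tuned model (all names here are placeholders):

    import torch
    import torch.nn.functional as F

    def fit_temperature(val_logits, val_labels):
        # Optimize log(T) so T stays positive; minimize NLL on held-out data.
        log_t = torch.zeros(1, requires_grad=True)
        opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

        def closure():
            opt.zero_grad()
            loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
            loss.backward()
            return loss

        opt.step(closure)
        return log_t.exp().item()

    # At test time, divide logits by T before softmax:
    # T = fit_temperature(val_logits, val_labels)
    # probs = F.softmax(test_logits / T, dim=-1)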
BERT + pooling + SVM works pretty well for some problems and is maybe 20x faster to train than a fine-tuned BERT.
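That recipe is short enough to sketch end to end with off-the-shelf parts (sentence-transformers for mean-pooled embeddings, scikit-learn for the SVM; the model name and toy data are illustrative defaults, not what I actually shipped):

    from sentence_transformers import SentenceTransformer
    from sklearn.svm import LinearSVC
    from sklearn.calibration import CalibratedClassifierCV

    # Frozen BERT-family encoder with mean pooling baked in.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    train_texts = ["great product", "would buy again",
                   "terrible support", "broke in a day"]   # toy data
    train_labels = [1, 1, 0, 0]

    X = encoder.encode(train_texts)   # (n_samples, embedding_dim)

    # LinearSVC has no predict_proba; wrapping it in CalibratedClassifierCV
    # both fixes that and calibrates, tying into the point above.
    clf = CalibratedClassifierCV(LinearSVC(), cv=2)
    clf.fit(X, train_labels)

    probs = clf.predict_proba(encoder.encode(["arrived broken"]))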
My take, as an academic-adjacent [1] developer of boring and reliable applications, is that I don't like the training recipes people use for fine-tuned BERT [2], and I think BERT + biLSTM + probability calibration should equal or exceed those fine-tuned BERTs, particularly because I can add early stopping and do model selection with a parameter scan (sketched in code after the footnotes).
[1] reads arXiv papers where run-of-the-mill researchers solve run-of-the-mill problems
[2] particularly as the number of samples is >> 500, which is easy to get in many cases; e.g., for most tasks you can make 1-2k judgements a day, though with visual tasks, when I've done 5k-a-day sprints for a few days, I start to hallucinate and compulsively classify the scenes in front of me
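The sketch promised above: frozen BERT token states feed a small bidirectional LSTM head, trained with early stopping on validation loss, with the head's hyperparameters scanned in an outer loop. Architecture and numbers are illustrative, not a tuned recipe, and the dataloaders are assumed to yield (token_states, labels) batches from a frozen BERT:

    import torch
    import torch.nn as nn

    class BiLSTMHead(nn.Module):
        def __init__(self, bert_dim=768, hidden=256, n_classes=2):
            super().__init__()
            self.lstm = nn.LSTM(bert_dim, hidden, batch_first=True,
                                bidirectional=True)
            self.out = nn.Linear(2 * hidden, n_classes)

        def forward(self, token_states):          # (batch, seq_len, bert_dim)
            states, _ = self.lstm(token_states)
            return self.out(states.mean(dim=1))   # mean-pool over tokens

    def train_with_early_stopping(model, train_dl, val_dl,
                                  patience=3, max_epochs=50):
        opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()
        best_loss, best_state, bad_epochs = float("inf"), None, 0
        for epoch in range(max_epochs):
            model.train()
            for x, y in train_dl:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
            model.eval()
            with torch.no_grad():
                val_loss = sum(loss_fn(model(x), y).item() for x, y in val_dl)
            if val_loss < best_loss:
                best_loss, bad_epochs = val_loss, 0
                best_state = {k: v.clone() for k, v in model.state_dict().items()}
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    break
        model.load_state_dict(best_state)
        return model

    # Model selection: scan a small grid of head sizes, keep the best on
    # validation loss, then temperature-scale it (see the snippet up top).
    # for hidden in (128, 256, 512):
    #     model = train_with_early_stopping(BiLSTMHead(hidden=hidden),
    #                                       train_dl, val_dl)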