> in the time that Python can perform a single FLOP, an A100 could have chewed through 9.75 million FLOPS
wild
xyzsparetimexyz•9m ago
Single core vs multi core accounts for much of this
noosphr•16m ago
>For example, getting good performance on a dataset with deep learning also involves a lot of guesswork. But, if your training loss is way lower than your test loss, you're in the "overfitting" regime, and you're wasting your time if you try to increase the capacity of your model.
Generally, posting a link-only reply without further elaboration comes across as a bit rude. Are you providing support for the above point? Refuting it? You felt compelled to comment, a few words to indicate what you’re actually trying to say would go a long way.
noosphr•2m ago
>We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better.
tosh•18m ago
wild
xyzsparetimexyz•9m ago