Never used GPU spot instances before, but I would have to imagine getting interrupted is pretty annoying.
hhh•29m ago
It depends. Our workloads can finish up in under two minutes and shut down without much effort, so we haven’t really noticed it, apart from one time when we had no spot capacity.
janalsncm•13m ago
I guess if checkpointing is set up correctly and your runtime is saved to a Docker image, it’s feasible. I would assume you’re probably not going to get a 3-hour continuous chunk of time, though.
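For what "checkpointing set up correctly" might look like, here is a minimal sketch, not anyone's actual setup. The names `save_checkpoint`, `load_checkpoint`, and `run` are hypothetical, the "work" is a toy running sum, and `stop_after` only simulates an interruption. It assumes the checkpoint path sits on storage that outlives the instance (a network volume or something synced to object storage).

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it into place atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic on the same filesystem, so a kill mid-write
    # never leaves a half-written checkpoint at `path`.
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"step": 0, "total": 0}


def run(path, n_steps, stop_after=None):
    """Resume from the checkpoint and do up to n_steps of toy work.

    stop_after is a hypothetical knob that simulates the instance
    being reclaimed after that many steps in this run.
    """
    state = load_checkpoint(path)
    done = 0
    while state["step"] < n_steps:
        if stop_after is not None and done >= stop_after:
            return state  # simulated interruption
        state["total"] += state["step"]  # stand-in for real work
        state["step"] += 1
        save_checkpoint(path, state)
        done += 1
    return state
```

Running `run(path, 10, stop_after=4)` and then `run(path, 10)` on a replacement instance picks up at step 4 rather than starting over. Checkpointing after every step is only sensible for a toy like this; a real job would pick an interval that loses less than two minutes of work.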
joeig•13m ago
In addition to the two-minute interruption notice, rebalance recommendations[0] allow you to handle interruptions even more gracefully.
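A rough sketch of checking for both signals from inside the instance, assuming IMDSv2 and the standard metadata paths `spot/instance-action` and `events/recommendations/rebalance`, both of which return 404 until a notice is issued. The helper names `imds_get` and `check_spot_signals` are made up for illustration, and what you do on each signal is up to your workload.

```python
import json
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"


def imds_get(path, timeout=1.0):
    """Fetch an instance metadata path via IMDSv2; None if it returns 404."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no notice has been issued yet
        raise


def check_spot_signals(fetch=imds_get):
    """Return ("interrupt" | "rebalance" | "ok", parsed notice or None).

    `fetch` is injectable so the logic can be exercised off-instance.
    """
    action = fetch("spot/instance-action")
    if action is not None:
        # Two-minute warning: stop taking work and flush state now.
        return "interrupt", json.loads(action)
    rebalance = fetch("events/recommendations/rebalance")
    if rebalance is not None:
        # Earlier, softer signal: a good moment to checkpoint or drain.
        return "rebalance", json.loads(rebalance)
    return "ok", None
```

Polling this every few seconds from a sidecar loop is one option; subscribing to the corresponding EventBridge events is another, if you would rather react from outside the instance.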
janalsncm•32m ago
[0] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/rebalanc...