What you can do with this:
* Save (and retrieve) model checkpoints (optionally with a content-addressable naming scheme) on blob storage
* Load datasets incrementally from blob storage into PyTorch, using a local disk cache
* Store your training metrics in SQLite
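To make the checkpoint and metrics items concrete, here is a minimal sketch using only the Python standard library. It is not this library's API: the function names (`content_addressed_name`, `save_checkpoint`, `log_metric`), the SHA-256 naming, the `.ckpt` suffix, and the `metrics` table layout are all assumptions for illustration, and a local directory stands in for blob storage.

```python
import hashlib
import sqlite3
from pathlib import Path


def content_addressed_name(payload: bytes, suffix: str = ".ckpt") -> str:
    """Derive a file name from the SHA-256 of the checkpoint bytes (assumed scheme)."""
    return hashlib.sha256(payload).hexdigest() + suffix


def save_checkpoint(payload: bytes, store: Path) -> Path:
    """Write the payload under its content-derived name.

    `store` is a local directory standing in for a blob storage bucket.
    Identical bytes always map to the same name, so re-saving is a no-op.
    """
    store.mkdir(parents=True, exist_ok=True)
    target = store / content_addressed_name(payload)
    if not target.exists():
        target.write_bytes(payload)
    return target


def log_metric(conn: sqlite3.Connection, run_id: str, step: int,
               name: str, value: float) -> None:
    """Append one metric row to a hypothetical `metrics` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics "
        "(run_id TEXT, step INTEGER, name TEXT, value REAL)"
    )
    conn.execute(
        "INSERT INTO metrics VALUES (?, ?, ?, ?)",
        (run_id, step, name, value),
    )
    conn.commit()
```

Naming a checkpoint after its content means the same weights never get stored twice and a name can be verified against the bytes it points to. Keeping metrics in a single SQLite file means they can be queried with plain SQL and copied around like any other artifact.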
Design principles:
* "Dumb cloud and smart software": I prefer commodity services like object storage and container runtimes to framework-like abstractions (e.g. managed MLflow or similar)
* Extend Lightning in the most straightforward way
* Let the user assemble a lightweight MLOps process with minimal changes to preexisting model code
Happy to field any questions and receive feedback!
The library was refined using Sonnet, but thoroughly checked by eye and hand.