AI/ML is a vertically integrated field: frameworks and methods that integrate deeply down the stack win. Despite this, most open-source ML libraries and frameworks don't integrate with infrastructure at all, leaving abundant scaling, fault-tolerance, resource-optimization, and reproducibility opportunities on the table. With so many infrastructure layers to support (Kubernetes, Slurm, SageMaker, and more), skipping integration was a practical necessity for resource-constrained OSS maintainers.
Over the last year, as the ML world converged on Kubernetes, we saw a massive opportunity: give OSS ML libraries and frameworks a native way to integrate deeply with infrastructure, via programmatic, "serverless" interfaces into Kubernetes that meet them where they are. Their code stays portable across clouds, OSS, and industry, and zero-cost abstractions expose all of Kubernetes' richness and control when they want it.
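Concretely, a maintainer writes ordinary Python and declares the compute they need. A minimal sketch of the idea (names like kt.Compute and kt.fn are illustrative placeholders here, not necessarily the exact API):

    import kubetorch as kt  # illustrative import; treat the names below as placeholders

    def train_step(lr: float) -> float:
        # Ordinary Python; no Kubernetes YAML or kubectl in sight.
        return lr * 0.5  # stand-in for real training logic

    if __name__ == "__main__":
        # Declare the compute you need; pods are provisioned on demand
        # and torn down when idle.
        gpu = kt.Compute(gpus=1, memory="32Gi")  # hypothetical constructor
        remote_step = kt.fn(train_step).to(gpu)  # hypothetical dispatch
        loss = remote_step(lr=3e-4)              # executes on the cluster, result returns locally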
Today, we're open-sourcing Kubetorch to fill this gap. Others have tried to bring ML to Kubernetes; we're bringing Kubernetes to ML devs. One fun use case: OSS ML libraries can use custom compute (e.g. GPUs, distributed workers) in CI, cost-effectively and portably.
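For the CI case, picture a pytest suite running on a plain CPU runner that dispatches only its GPU-dependent tests to the cluster. Again a sketch with illustrative names, not a verbatim API:

    import kubetorch as kt  # illustrative

    def _matmul_on_gpu() -> bool:
        import torch
        x = torch.randn(512, 512, device="cuda")
        eye = torch.eye(512, device="cuda")
        return bool(torch.allclose(x @ eye, x, atol=1e-4))

    def test_gpu_matmul():
        # The CI runner itself has no GPU; the function runs in a
        # short-lived GPU pod and only the boolean result comes back.
        gpu = kt.Compute(gpus=1)                # hypothetical
        assert kt.fn(_matmul_on_gpu).to(gpu)()  # hypothetical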
If you have feedback, papercuts, or interest in collaborating, we'd love to hear from you!