This project explores GPU sharing in Kubernetes without relying on MIG or vGPU.
The motivation came from inference-heavy workloads where GPUs sit underutilized, yet MIG/vGPU is unavailable, too rigid, or operationally heavy. In many cases, simple time-sharing is “good enough”.
What the project does:
- Introduces a SharedDeviceGroup CRD
- Lets multiple pods reference the same group via an annotation (see the sketch after this list)
- Uses a scheduler plugin to select GPUs and inject NVIDIA_VISIBLE_DEVICES into the containers
- Includes an optional device plugin for kubelet awareness / coexistence
- Supports spread and binpack placement strategies
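Roughly, usage looks like the following minimal sketch. The API group, field names, and annotation key here are illustrative placeholders, not the project's exact API:

```yaml
# Hypothetical SharedDeviceGroup: the group/version, spec fields, and
# annotation key below are assumptions for illustration only.
apiVersion: sharing.example.io/v1alpha1
kind: SharedDeviceGroup
metadata:
  name: inference-pool
spec:
  # how many GPUs on a node this group may hand out to member pods
  deviceCount: 4
  # placement strategy: "spread" fans pods across the group's GPUs,
  # "binpack" packs them onto as few GPUs as possible
  strategy: binpack
---
apiVersion: v1
kind: Pod
metadata:
  name: llm-server-a
  annotations:
    # opt this pod into the shared group (annotation key is illustrative)
    sharing.example.io/device-group: inference-pool
spec:
  containers:
    - name: server
      image: my-inference-image:latest
      # no nvidia.com/gpu resource request here; the scheduler plugin picks
      # GPUs from the group and injects NVIDIA_VISIBLE_DEVICES (e.g. "0,1")
      # so the container only sees the devices it was assigned
```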
The design intentionally stays at the scheduler + control plane level, avoiding deep GPU virtualization while still improving utilization.
This has been working well for inference workloads where strict isolation isn’t required. I’m very interested in feedback from folks who’ve tried MIG, GPU time-slicing, NVIDIA KAI, or custom schedulers in production.