GPU utilization in Kubernetes is still surprisingly poor for many inference and interactive workloads.
Most clusters either:
allocate exclusive GPUs per pod, or
rely on MIG / vGPU, which introduces rigidity and operational complexity.
I’m experimenting with a different approach: scheduler-level GPU sharing.
Shared Device Group is a Kubernetes extension that lets multiple pods share one or more GPUs, with GPU selection handled by the scheduler instead of hardware partitioning.
High-level idea:
A SharedDeviceGroup CRD defines a logical GPU group (a rough sketch follows this list)
Pods reference the group via an annotation (second sketch below)
The scheduler plugin selects a node + GPU set
Selected GPUs are exposed to the container via NVIDIA_VISIBLE_DEVICES
Optional device-plugin integration for kubelet accounting
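
For concreteness, here's roughly what a group definition could look like. This is a hand-written sketch, not the repo's actual schema: the apiVersion and every field name below are illustrative assumptions.

```yaml
# Illustrative SharedDeviceGroup sketch — apiVersion and field
# names are assumptions, not the project's actual CRD schema.
apiVersion: shareddevicegroup.example.com/v1alpha1
kind: SharedDeviceGroup
metadata:
  name: inference-gpus
spec:
  deviceType: nvidia.com/gpu
  devicesPerPod: 1      # how many GPUs each member pod should see
  maxPodsPerDevice: 4   # sharing fan-out per physical GPU
  nodeSelector:
    gpu-sharing: "true" # confine sharing to dedicated nodes
```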
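And the pod side, again with an assumed annotation key and scheduler name (check the repo for the real ones). The key point is that the pod does not request nvidia.com/gpu in its resources, so kubelet never grants it an exclusive device; the scheduler plugin picks the GPU set and surfaces it through the env var the NVIDIA container runtime already understands.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference
  annotations:
    shared-device-group.io/group: inference-gpus  # assumed annotation key
spec:
  schedulerName: shared-device-scheduler          # assumed scheduler name
  containers:
    - name: server
      image: example.com/inference-server:latest  # placeholder image
      # No nvidia.com/gpu in resources — the plugin selects the GPUs
      # and injects something like:
      #   NVIDIA_VISIBLE_DEVICES=GPU-8c12...   (UUIDs) or "0,1" (indices)
      # which the NVIDIA container runtime uses to mount only those devices.
```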
This works best for:
inference workloads
bursty / short-lived GPU tasks
scenarios where strict GPU isolation isn’t required
Trade-offs:
not a replacement for MIG / vGPU
requires deploying and operating a custom scheduler plugin
best suited for dedicated GPU-sharing nodes
Repo: https://github.com/sceneryback/shared-device-group
I’d appreciate feedback, especially from folks running large GPU clusters or inference platforms.