## Scheduling Policies
On time-sliced vGPUs, the scheduler decides how GPU time is shared among VMs. The policy sets how long a VM may run before preemption (the time slice), which trades raw throughput against scheduling latency.
Longer slices mean less context switching and suit compute-heavy CUDA work. Shorter slices give other VMs turns sooner and suit latency-sensitive guests (including graphics).
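To make the tradeoff concrete, here is a back-of-the-envelope model in Python. It is a sketch, not measured vGPU behavior: it assumes a simple round-robin rotation and a hypothetical 0.1 ms context-switch cost.

```python
# Back-of-the-envelope model of the slice-length tradeoff.
# All numbers are illustrative assumptions, not measured vGPU values.

SWITCH_OVERHEAD_MS = 0.1  # assumed cost of one GPU context switch

def tradeoff(slice_ms: float, n_vms: int) -> tuple[float, float]:
    """Return (worst-case wait before a VM runs again, fraction of
    GPU time lost to context switching) under plain round-robin."""
    worst_wait_ms = (n_vms - 1) * (slice_ms + SWITCH_OVERHEAD_MS)
    overhead_frac = SWITCH_OVERHEAD_MS / (slice_ms + SWITCH_OVERHEAD_MS)
    return worst_wait_ms, overhead_frac

for slice_ms in (1.0, 5.0, 30.0):
    wait, lost = tradeoff(slice_ms, n_vms=4)
    print(f"{slice_ms:4.1f} ms slice: up to {wait:5.1f} ms wait, "
          f"{lost:.1%} of GPU time lost to switching")
```

With four VMs, a 1 ms slice keeps the worst-case wait near 3 ms but loses roughly 9% of GPU time to switching in this model, while a 30 ms slice cuts the switching loss below 1% at the cost of a worst-case wait near 90 ms.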
NVIDIA vGPU for Compute exposes three scheduling policies that differ in how GPU time is allocated across VMs:
| Mode | Allocation model | When to use |
|---|---|---|
| Best Effort (default) | Time slices distributed across VMs without reservation; idle VMs cede their slots to active ones. | General-purpose mixed workloads where peak utilization matters more than predictability. |
| Equal Share | Each running VM receives an equal fraction of GPU time regardless of its load. | Multi-tenant fairness; consistent per-VM throughput when several VMs run continuously. |
| Fixed Share | Each VM receives a fixed fraction of GPU time, even when other VMs are idle. | Reservation-style guarantees; SLA workloads requiring predictable minimums. |
For full mode definitions, refer to the vGPU Schedulers documentation. For configuration commands and per-mode tuning, refer to Changing Scheduling Behavior for Time-Sliced vGPUs.
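The following sketch restates the table's allocation models as arithmetic. It is a simplification, not the actual vGPU Manager scheduler; the policy names mirror the table, and the scenario (a GPU sliced into four vGPUs) is hypothetical.

```python
def busy_vm_share(policy: str, max_vgpus: int, powered_on: int, busy: int) -> float:
    """Fraction of total GPU time one busy VM receives under each policy.

    Simplified model: policies behave as described in the table above.
    """
    if policy == "best_effort":
        # Idle VMs cede their slots, so the busy VMs split all GPU time.
        return 1.0 / busy
    if policy == "equal_share":
        # Time is split equally across running VMs, regardless of load.
        return 1.0 / powered_on
    if policy == "fixed_share":
        # Each VM keeps its reserved 1/max_vgpus slot even when other
        # slots sit idle, so the share never changes.
        return 1.0 / max_vgpus
    raise ValueError(f"unknown policy: {policy}")

# A GPU sliced into 4 vGPUs, with 3 VMs powered on and 2 of them busy:
for policy in ("best_effort", "equal_share", "fixed_share"):
    print(f"{policy:12s} -> {busy_vm_share(policy, 4, 3, 2):.0%}")
```

In this scenario a busy VM gets 50% under Best Effort, 33% under Equal Share, and 25% under Fixed Share. Only Fixed Share yields a constant figure; the other two shares change as VMs start and stop.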
### Scheduling Limitations
- Scheduling policies apply only to time-sliced vGPUs. MIG-backed vGPUs have dedicated hardware resources and do not use time-slicing.
- Fixed Share scheduling requires the `sched` plugin parameter to be set at the vGPU Manager level; it cannot be changed per-VM at runtime.
- Scheduling policy changes require a vGPU Manager restart to take effect on existing vGPUs.
- The default scheduling policy (Best Effort) does not guarantee minimum GPU time for any VM. Use Equal Share or Fixed Share when predictable GPU allocation is required.
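The utilization cost of that predictability can be seen in a toy round-robin simulation. Here `vm-a` and `vm-b` are hypothetical guests, and the model is far simpler than the real scheduler; it only follows the table's rule that Best Effort redistributes idle slots while Fixed Share holds reservations open.

```python
def simulate(policy: str, slices: int = 100) -> dict[str, int]:
    """Count time slices used by two VMs when vm-b goes idle halfway.

    Toy model: slots alternate between the two VMs; the policy decides
    what happens to a slot whose owner has no work.
    """
    vms = ("vm-a", "vm-b")                        # two configured vGPUs
    used = {"vm-a": 0, "vm-b": 0, "unused": 0}
    for t in range(slices):
        owner = vms[t % 2]                        # round-robin slot ownership
        busy = owner == "vm-a" or t < slices // 2  # vm-b idles after halftime
        if busy:
            used[owner] += 1
        elif policy == "best_effort":
            used["vm-a"] += 1                     # idle slot ceded to busy VM
        else:                                     # fixed_share: reservation
            used["unused"] += 1                   # goes unused
    return used

for policy in ("fixed_share", "best_effort"):
    print(policy, simulate(policy))
```

Fixed Share leaves vm-b's reserved slots open (25 of 100 slices go unused) while Best Effort hands them to vm-a, which is the throughput-versus-predictability trade described above.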