Scheduling Policies#
On time-sliced vGPUs, the scheduler decides how GPU time is shared among VMs. The policy sets how long a VM may run before preemption (the time slice), which trades raw throughput against scheduling latency.
Longer slices mean less context switching and suit compute-heavy CUDA work. Shorter slices give other VMs turns sooner and suit latency-sensitive guests (including graphics).
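The throughput-versus-latency tradeoff can be sketched with a toy round-robin model. This is not measured vGPU behavior: the function, its parameters, and the example numbers (4 VMs, 100 ms of work each, a 0.1 ms switch cost) are all hypothetical, chosen only to show why longer slices cut switching overhead while making other VMs wait longer for a turn.

```python
import math

def round_robin_cost(num_vms, work_ms, slice_ms, switch_ms):
    """Toy model of a time-sliced scheduler (illustrative only).

    Returns (total_time_ms, first_turn_wait_ms):
      - total_time_ms: time to finish every VM's work, including the
        context-switch cost paid at each preemption
      - first_turn_wait_ms: worst-case wait before a VM gets its first
        slice (every other VM runs one full slice first)
    """
    slices_per_vm = math.ceil(work_ms / slice_ms)
    overhead = slices_per_vm * num_vms * switch_ms
    total = num_vms * work_ms + overhead
    first_turn_wait = (num_vms - 1) * (slice_ms + switch_ms)
    return total, first_turn_wait

# Shorter slices: more switching overhead, but a sooner first turn.
short = round_robin_cost(num_vms=4, work_ms=100, slice_ms=2, switch_ms=0.1)
# Longer slices: less overhead, but other VMs wait longer.
longer = round_robin_cost(num_vms=4, work_ms=100, slice_ms=8, switch_ms=0.1)
print(short)
print(longer)
```

With these numbers the longer slice finishes all work sooner (less total switch overhead) while roughly quadrupling the worst-case wait for a first turn, which is the tradeoff described above.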
NVIDIA vGPU for Compute exposes three modes—Best Effort, Equal Share, and Fixed Share—for different fairness and reservation goals. For details, refer to the vGPU Schedulers documentation.
Refer to the Changing Scheduling Behavior for Time-Sliced vGPUs documentation for how to configure and adjust scheduling policies to meet specific resource distribution needs.
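As a rough sketch of what such a configuration looks like on a Linux hypervisor host: NVIDIA's documentation describes selecting the scheduler through the `RmPVMRL` registry key of the `nvidia` module loaded by the vGPU Manager. The drop-in file name below is hypothetical, and the exact key values and procedure should be taken from the documentation for your vGPU software release.

```shell
# Sketch only: select the Equal Share scheduler via the nvidia module's
# RmPVMRL registry key on the hypervisor host. Per NVIDIA's docs,
# 0x00 = Best Effort (default), 0x01 = Equal Share, 0x11 = Fixed Share.
# The file name nvidia-vgpu-sched.conf is an arbitrary example.
echo 'options nvidia NVreg_RegistryDwords="RmPVMRL=0x01"' \
    > /etc/modprobe.d/nvidia-vgpu-sched.conf

# The setting takes effect only after the nvidia module is reloaded
# or the host is rebooted (consistent with the restart requirement
# noted under Limitations).
```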
Limitations#
Scheduling policies apply only to time-sliced vGPUs. MIG-backed vGPUs have dedicated hardware resources and do not use time-slicing.
Fixed Share scheduling requires the schedplugin parameter to be set at the vGPU Manager level; it cannot be changed per-VM at runtime.
Scheduling policy changes require a vGPU Manager restart to take effect on existing vGPUs.
The default scheduling policy (Best Effort) does not guarantee minimum GPU time for any VM. Use Equal Share or Fixed Share when predictable GPU allocation is required.