Scheduling Policies#

On time-sliced vGPUs, the scheduler decides how GPU time is shared among VMs. The policy determines how time slices are allocated among VMs; the slice length (how long a VM may run before preemption) trades raw throughput against scheduling latency.

Longer slices mean less context switching and suit compute-heavy CUDA work. Shorter slices give other VMs turns sooner and suit latency-sensitive guests (including graphics).
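A quick back-of-the-envelope model makes the trade-off concrete. The sketch below assumes simple round-robin slicing and an illustrative fixed cost per context switch; both numbers are assumptions for illustration, not measured vGPU behavior.

```python
# Toy model of the time-slice trade-off described above. With n active VMs
# sharing one GPU round-robin, a VM that just yielded waits roughly
# (n - 1) * slice before its next turn, and each switch costs some fixed
# overhead (0.1 ms here is an assumed value, not a measurement).

def worst_case_wait_ms(n_vms: int, slice_ms: float) -> float:
    """Longest a VM waits for its next turn under round-robin slicing."""
    return (n_vms - 1) * slice_ms

def switch_overhead_fraction(slice_ms: float, switch_cost_ms: float = 0.1) -> float:
    """Fraction of GPU time lost to context switching."""
    return switch_cost_ms / (slice_ms + switch_cost_ms)

for slice_ms in (1, 2, 4, 8, 16, 32):
    print(f"slice={slice_ms:>2} ms  "
          f"worst-case wait={worst_case_wait_ms(8, slice_ms):>5.0f} ms  "
          f"switch overhead={switch_overhead_fraction(slice_ms):.1%}")
```

Running this for eight VMs shows the tension directly: a 1 ms slice keeps the worst-case wait at 7 ms but loses about 9% of GPU time to switching, while a 32 ms slice drops overhead below 1% at the cost of a 224 ms worst-case wait.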

NVIDIA vGPU for Compute exposes three scheduling policies that differ in how GPU time is allocated across VMs:

Table 37 vGPU Scheduling Policy Comparison#

| Mode | Allocation model | When to use |
|------|------------------|-------------|
| Best Effort (default) | Time slices distributed across VMs without reservation; idle VMs cede their slots to active ones. | General-purpose mixed workloads where peak utilization matters more than predictability. |
| Equal Share | Each running VM receives an equal fraction of GPU time regardless of its load. | Multi-tenant fairness; consistent per-VM throughput when several VMs run continuously. |
| Fixed Share | Each VM receives a fixed fraction of GPU time, even when other VMs are idle. | Reservation-style guarantees; SLA workloads requiring predictable minimums. |
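To make the three allocation models concrete, here is a toy model of the fraction of GPU time one busy VM receives under each policy. It illustrates the semantics in Table 37 only; it is not the vGPU Manager's scheduling algorithm, and the inputs (`powered_on`, `active`, `max_vgpus`) are hypothetical.

```python
# Toy model of the allocation policies in Table 37.

def share_per_active_vm(policy: str, powered_on: int, active: int,
                        max_vgpus: int) -> float:
    """Fraction of GPU time one busy VM receives under each policy."""
    if policy == "best_effort":
        return 1.0 / active        # idle VMs cede their slots to active ones
    if policy == "equal_share":
        return 1.0 / powered_on    # equal split across all running VMs
    if policy == "fixed_share":
        return 1.0 / max_vgpus     # fixed split; idle shares stay unused
    raise ValueError(f"unknown policy: {policy}")

# Example: 8 vGPU slots on the GPU, 4 VMs powered on, only 2 of them busy.
for policy in ("best_effort", "equal_share", "fixed_share"):
    print(f"{policy:12s} -> {share_per_active_vm(policy, 4, 2, 8):.1%}")
# best_effort  -> 50.0%; equal_share -> 25.0%; fixed_share -> 12.5%
```

The example shows why the policies diverge only under partial load: with every VM busy, all three converge toward the same split.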

For full mode definitions, refer to the vGPU Schedulers documentation. For configuration commands and per-mode tuning, refer to Changing Scheduling Behavior for Time-Sliced vGPUs.
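As a sketch of what that configuration looks like: the policy is selected through the RmPVMRL registry key on the hypervisor host. The base values below (0x00, 0x01, 0x11) are the commonly documented ones, but they and the custom time-slice encodings vary by release, so verify them against Changing Scheduling Behavior for Time-Sliced vGPUs. The `modprobe_line` helper and the Linux-KVM-style module option it emits are illustrative assumptions; other hypervisors use their own module-parameter tooling.

```python
# Hypothetical helper: map a policy name to an RmPVMRL base value and emit
# the module-option line that would apply it on a Linux KVM host. Values
# are the commonly documented defaults; confirm against your release's docs.

RMPVMRL = {
    "best_effort": 0x00,  # default policy
    "equal_share": 0x01,  # equal share with the default time slice
    "fixed_share": 0x11,  # fixed share with the default time slice
}

def modprobe_line(policy: str) -> str:
    """Return an /etc/modprobe.d options line selecting the given policy."""
    value = RMPVMRL[policy]
    return f'options nvidia NVreg_RegistryDwords="RmPVMRL={value:#04x}"'

if __name__ == "__main__":
    print(modprobe_line("fixed_share"))
    # -> options nvidia NVreg_RegistryDwords="RmPVMRL=0x11"
```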

Scheduling Limitations#

  • Scheduling policies apply only to time-sliced vGPUs. MIG-backed vGPUs have dedicated hardware resources and do not use time-slicing.

  • Fixed Share scheduling requires the RmPVMRL registry key to be set at the vGPU Manager level; it cannot be changed per VM at runtime.

  • Scheduling policy changes require a vGPU Manager restart to take effect on existing vGPUs.

  • The default scheduling policy (Best Effort) does not guarantee minimum GPU time for any VM. Use Equal Share or Fixed Share when predictable GPU allocation is required.