Application or vGPU VM crashes when multiple application instances are launched
Description
When multiple application instances are launched on a legacy vGPU that is allocated only a fraction of the physical GPU's frame buffer, the application or the VM to which the vGPU is assigned crashes. A legacy NVIDIA vGPU does not support single root I/O virtualization (SR-IOV). This issue does not affect NVIDIA vGPUs that support SR-IOV.
When this issue occurs, the following error message is written to the vmware.log file:
vmiop_log: (0x0): VGPU message 7 failed
This issue occurs when the plugin for legacy NVIDIA vGPUs creates more BAR1 mappings than the hypervisor allows a VM to create. These mappings depend on the number and type of applications running in the VM.
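To confirm that a crash matches this issue, the VM's vmware.log file can be searched for the error message quoted above. The following is a minimal Python sketch; the function name find_vgpu_message_failures is hypothetical, and the code assumes only that the message text appears somewhere on a log line, with any timestamp or prefix before it.

```python
import re

# Matches the error text quoted in the description; any timestamp or
# other prefix that precedes it on the log line is ignored.
VGPU_MESSAGE_FAILED = re.compile(r"vmiop_log: \(0x0\): VGPU message 7 failed")


def find_vgpu_message_failures(log_text):
    """Return every line of log_text that contains the error message."""
    return [
        line
        for line in log_text.splitlines()
        if VGPU_MESSAGE_FAILED.search(line)
    ]
```

The function takes the log contents as a string, so the caller reads vmware.log from wherever the VM's files are stored. An empty result means the message was not logged.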
Workaround
A workaround is available for the following GPUs, all of which have a large physical BAR1 memory size:
- Quadro RTX 6000 Passive
- Quadro RTX 8000 Passive
- Tesla V100 (all variants)
This workaround is not available for other GPUs that are affected by this issue.
To employ this workaround, set the vGPU plugin parameter pciPassthru0.cfg.plugin_managed_bar1_va_override to 1.
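On VMware vSphere, vGPU plugin parameters of this form are normally set as advanced VM attributes, which are stored in the VM's .vmx file as key = "value" lines. The following Python sketch illustrates the resulting entry by applying it to the text of a .vmx file. It rests on several assumptions: the helper name set_vmx_option is hypothetical, the VM is powered off, the vGPU is the first one assigned to the VM (pciPassthru0), and key matching is exact. Setting the attribute through the vSphere Client remains the usual route.

```python
def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with key set to value, replacing any existing entry.

    Hypothetical helper: assumes one 'key = "value"' entry per line and
    matches the key exactly as given.
    """
    entry = f'{key} = "{value}"'
    # Drop any earlier entry for the same key so it is not defined twice.
    kept = [
        line
        for line in vmx_text.splitlines()
        if line.split("=", 1)[0].strip() != key
    ]
    kept.append(entry)
    return "\n".join(kept) + "\n"


# Parameter name and value taken from the workaround above.
BAR1_OVERRIDE_KEY = "pciPassthru0.cfg.plugin_managed_bar1_va_override"
```

Calling set_vmx_option(text, BAR1_OVERRIDE_KEY, "1") returns the updated text, which the caller then writes back to the .vmx file.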
Status
Open
Ref. #
200680865