vGPU Troubleshooting Guide

Installation and Upgrade Issues

If the GPU Operator is installed using Helm, always use the helm uninstall command to ensure proper cleanup of all GPU Operator artifacts. Manually deleting components using kubectl can leave stale entries, leading to installation failures with errors about invalid ownership metadata or namespace mismatches.

Next Steps

  1. Automated Cleanup:

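    1. Run: helm uninstall --wait gpu-operator -n gpu-operator

    2. If your installation uses a different release name or namespace, replace gpu-operator accordingly. helm uninstall takes the release name, not the chart name.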

  2. Manual Cleanup:

    1. Delete the GPU Operator custom resource definitions (CRDs) and deployments.

    2. Remove the GPU Operator namespace, ClusterRoles, and ClusterRoleBindings.
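The cleanup steps above can be scripted. The following Python sketch builds the commands and, by default, only prints them (a dry run). It assumes that the GPU Operator CRDs are the ones whose names end in .nvidia.com. The ClusterRole and ClusterRoleBinding label selector is a hypothetical parameter: check the real labels with kubectl get clusterrole --show-labels before you use it. Because helm uninstall expects a release name, the sketch does not pass a chart reference.

```python
import shlex
import subprocess
from typing import List

# Prefix printed by `kubectl get crd -o name` before each CRD name.
CRD_PREFIX = "customresourcedefinition.apiextensions.k8s.io/"


def helm_uninstall_cmd(release: str, namespace: str) -> List[str]:
    # helm uninstall takes the release name; no chart reference is passed.
    return ["helm", "uninstall", "--wait", release, "-n", namespace]


def operator_crds(kubectl_output: str, group_suffix: str = ".nvidia.com") -> List[str]:
    # Parse `kubectl get crd -o name` output and keep CRDs in the given API group.
    # Assumption: GPU Operator CRDs end in ".nvidia.com"; review the list before deleting.
    crds = []
    for line in kubectl_output.splitlines():
        name = line.strip()
        if name.startswith(CRD_PREFIX):
            name = name[len(CRD_PREFIX):]
        if name.endswith(group_suffix):
            crds.append(name)
    return crds


def manual_cleanup_cmds(crds: List[str], namespace: str, rbac_selector: str) -> List[List[str]]:
    # Same order as the manual steps: CRDs and deployments first,
    # then the namespace, ClusterRoles, and ClusterRoleBindings.
    # rbac_selector is hypothetical; it must match the labels on your cluster.
    cmds = [["kubectl", "delete", "crd", name] for name in crds]
    cmds.append(["kubectl", "delete", "deployment", "--all", "-n", namespace])
    cmds.append(["kubectl", "delete", "namespace", namespace])
    cmds.append(["kubectl", "delete", "clusterrole,clusterrolebinding", "-l", rbac_selector])
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    # Print every command; execute it only when dry_run is False.
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, pass the output of kubectl get crd -o name to operator_crds(), hand the result to manual_cleanup_cmds(), and review what run() prints before you call it again with dry_run=False.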

For more detailed instructions and additional information, visit the full article here.

The GPU Operator installation fails on Kubernetes 1.25 and later because the PodSecurityPolicy (PSP) API was removed in Kubernetes 1.25. If you install a GPU Operator version earlier than 22.09 with PSP enabled, the installation fails with the following error:

    Error: INSTALLATION FAILED: unable to build Kubernetes objects from release manifest: resource mapping not found for name: "gpu-operator-restricted" namespace: "" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"

Next Steps

  1. Disable PSP during installation using the following command: helm install --wait gpu-operator nvaie/gpu-operator-1-3 -n gpu-operator --set driver.image=vgpu-guest-driver-1-3 --set psp.enabled=false

  2. Alternatively, upgrade to the latest GPU Operator version, which supports Kubernetes 1.25 and later.
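The PSP workaround is needed only when the cluster's Kubernetes minor version is 1.25 or later. The following is a minimal Python sketch that adds --set psp.enabled=false only in that case. It assumes the version string is the serverVersion.gitVersion field from kubectl version -o json. The chart and driver image names are copied from the command above.

```python
import re
from typing import List, Tuple


def parse_k8s_version(git_version: str) -> Tuple[int, int]:
    # Accepts strings such as "v1.27.3" or "v1.26.5+k3s1" and returns (major, minor).
    m = re.match(r"v?(\d+)\.(\d+)", git_version.strip())
    if not m:
        raise ValueError(f"unrecognized Kubernetes version: {git_version!r}")
    return int(m.group(1)), int(m.group(2))


def psp_supported(git_version: str) -> bool:
    # The policy/v1beta1 PodSecurityPolicy API was removed in Kubernetes 1.25.
    return parse_k8s_version(git_version) < (1, 25)


def helm_install_cmd(git_version: str) -> List[str]:
    # Base command from the troubleshooting step; PSP is disabled only when the API is gone.
    cmd = [
        "helm", "install", "--wait", "gpu-operator", "nvaie/gpu-operator-1-3",
        "-n", "gpu-operator", "--set", "driver.image=vgpu-guest-driver-1-3",
    ]
    if not psp_supported(git_version):
        cmd += ["--set", "psp.enabled=false"]
    return cmd
```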

For more detailed instructions and additional information, visit the full article here.

The Linux GRID driver fails to install on Ubuntu 22.04.2 LTS with kernel 6.5.0-41 for two reasons:

  1. The installed GCC version differs from the GCC version that was used to build the kernel.

  2. The kernel build flags include -ftrivial-auto-var-init=zero, which GCC 11 does not support.
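One way to confirm the first cause is to compare the GCC version recorded in /proc/version with the version reported by gcc --version. This Python sketch compares only major versions, which is a simplification. Its regular expression assumes the usual "gcc... (vendor string) X.Y.Z" layout of both outputs.

```python
import re
from typing import Optional

# Matches e.g. "x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0"
# inside /proc/version, or "gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0" from gcc --version.
_GCC_RE = re.compile(r"gcc[^(]*\([^)]*\)\s+(\d+)\.\d+")


def gcc_major(text: str) -> Optional[int]:
    # Return the GCC major version found in the text, or None if there is none.
    m = _GCC_RE.search(text)
    return int(m.group(1)) if m else None


def compilers_match(proc_version: str, gcc_version_line: str) -> bool:
    # True when the kernel's compiler and the installed gcc share a major version.
    kernel = gcc_major(proc_version)
    installed = gcc_major(gcc_version_line)
    if kernel is None or installed is None:
        raise ValueError("could not find a GCC version in the input")
    return kernel == installed
```

On the affected host, pass the contents of /proc/version and the first line of gcc --version to compilers_match(). A result of False indicates the mismatch described above.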

Next Steps

  1. Fix the GCC version mismatch:

    1. Install GCC 12 (for example, sudo apt install gcc-12).

    2. Run the NVIDIA installer.

    3. Alternatively, update the gcc symbolic link to point to gcc-12 before running the installer.

  2. Resolve the unsupported-option error: build with the same compiler version that was used to build the kernel, so that options such as -ftrivial-auto-var-init=zero are recognized.
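To check whether a particular compiler accepts the option named above, you can compile an empty C file with it. The following is a minimal Python sketch. It assumes a Linux host where /dev/null is available, and it treats a missing compiler as "not supported".

```python
import subprocess


def compiler_supports(compiler: str, flag: str) -> bool:
    """Return True if `compiler` accepts `flag` when compiling an empty C file from stdin."""
    try:
        result = subprocess.run(
            [compiler, flag, "-x", "c", "-c", "-", "-o", "/dev/null"],
            input=b"",                   # empty translation unit
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        return False  # compiler is not installed
    return result.returncode == 0
```

For example, compiler_supports("gcc-12", "-ftrivial-auto-var-init=zero") should return True on a system with GCC 12 installed, because the option was introduced in GCC 12. The same call with gcc-11 is expected to return False. The executable names gcc-11 and gcc-12 are assumptions about how the packages are named on your system.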

For more detailed instructions and additional information, visit the full article here.

The DLS 3.3.0 in-place upgrade process gets stuck at Step 3 due to a certificate mismatch between the DLS UI and the In-Place Upgrade Service.

Next Steps

  1. Access the upgrade page using the DLS IP address (e.g., https://<DLS_IP>:8443).

  2. Use a browser with lower security settings (e.g., Firefox with Standard settings).

For more detailed instructions and additional information, visit the full article here.

© Copyright 2013-2025, NVIDIA Corporation. Last updated on Apr 29, 2025.