Infrastructure Layer Software#

Infrastructure Layer Components: The infrastructure layer includes GPU and networking drivers optimized for bare-metal and virtualized environments; Kubernetes operators that manage GPU and networking resources and the lifecycle of microservices and AI pipelines; and cluster management software for provisioning and monitoring servers at scale.

AI Infrastructure Management: These components ensure efficient deployment, management, and scaling of AI compute resources across bare-metal, virtualized, and containerized environments. Each infrastructure release provides versioned updates to maintain compatibility and stability, along with feature updates, security patches, and performance improvements. For application layer components, see Application Layer Software.

Release Distribution: Infrastructure software is distributed through the Infrastructure Branch, which provides regular updates with 1-year support windows and can be designated as a Long-Term Support Branch (LTSB) for extended 3-year support. For more information, see Lifecycle Policy.

Software Components#

NVIDIA AI Enterprise includes the following infrastructure software components.

Table 2 Infrastructure Layer Software Components#

| Component | Description | NGC Catalog | Documentation |
| --- | --- | --- | --- |
| NGC Catalog Resources | | | |
| NVIDIA AI Enterprise | Lists all NVIDIA AI Enterprise supported software available on NGC, including AI frameworks, microservices, pre-trained models, SDKs, and tools with enterprise support. | AI Enterprise on NGC | |
| NVIDIA Infra Collections | Lists all NVIDIA AI Enterprise Infrastructure collections available on NGC, including infrastructure software components such as drivers, operators, and management tools for deploying and managing AI workloads. | Infra Collections on NGC | |
| Core Infrastructure Drivers | | | |
| NVIDIA Data Center Driver | Provides hardware support for NVIDIA GPUs. Consult the appropriate NVIDIA AI Enterprise Release Notes to see which GPUs and operating systems are supported. | GPU Driver on NGC | Data Center Driver Documentation |
| NVIDIA DOCA Driver for Networking | Provides hardware support for NVIDIA BlueField DPUs and SuperNICs. Installing DOCA on the host provides all necessary drivers and tools to manage BlueField and ConnectX devices. | DOCA Driver on NGC | DOCA Drivers Documentation |
| NVIDIA Fabric Manager | Manages NVSwitch-based GPU-to-GPU interconnects in multi-GPU systems such as DGX and HGX platforms. | For vGPU: Fabric Manager on NGC | For Passthrough: Fabric Manager User Guide |
| Data Center Services | | | |
| NVIDIA DOCA Microservices | Infrastructure acceleration and offload services for NVIDIA BlueField, enabling accelerated networking, storage, and security workloads. | DOCA Microservices on NGC | DOCA Documentation |
| Virtualization [1] | | | |
| NVIDIA Virtual GPU Manager | GPU driver deployed in the hypervisor for virtualized environments. Enables multi-tenant GPU sharing, live migration, and monitoring. | vGPU Manager on NGC | vGPU for Compute Documentation |
| NVIDIA vGPU for Compute Guest Driver | GPU driver deployed in the VM or on bare metal OS to enable multiple VMs to have simultaneous, direct access to a single physical GPU. | | vGPU for Compute Documentation |
| Container Platform | | | |
| NVIDIA Container Toolkit | Enables GPU-accelerated containers by providing runtime components and utilities for container engines (Docker, containerd, CRI-O). | Container Toolkit on NGC | Container Toolkit Documentation |
| Kubernetes Operators | | | |
| NVIDIA DPU Operator (DPF) | Enables cluster administrators to automate provisioning, orchestration, and lifecycle management of BlueField DPUs and DOCA Microservices to enable DPU-accelerated North-South networking in Kubernetes. | DPU Operator on NGC | DPU Operator Documentation |
| NVIDIA GPU Operator | Simplifies deployment of NVIDIA AI Enterprise by automating management of all NVIDIA software components needed to provision GPUs in Kubernetes. | GPU Operator on NGC | GPU Operator Documentation |
| NVIDIA Network Operator | Simplifies deployment of high-speed networking by automating management of NVIDIA ConnectX NICs and SuperNICs required to optimize East-West traffic and RDMA transfers in Kubernetes. | Network Operator on NGC | Network Operator Documentation |
| NVIDIA NIM Operator | Enables cluster administrators to operate the software components and services required to run LLM, embedding, and other models using NVIDIA NIM microservices in Kubernetes. | NIM Operator on NGC | NIM Operator Documentation |

Supported Infrastructure Software Not on NGC#

Attention

In this context, "Supported" means that NVIDIA accepts bug reports from customers with valid NVIDIA AI Enterprise subscriptions and will work to address reported issues. All supported software is covered by NVIDIA AI Enterprise Support Services. NVIDIA supports the individual software components listed below, but not glue code, third-party applications, or full deployment integration.

The following infrastructure software is supported for use with NVIDIA AI Enterprise but is not distributed through NGC. Some components require enterprise licensing or are tightly coupled with hardware support, while third-party infrastructure software is obtained directly from the respective vendors.

Table 3 Supported Infrastructure Software Not on NGC#

| Component | Description | Access |
| --- | --- | --- |
| NVIDIA Infrastructure Software | | |
| NVIDIA Base Command Manager (BCM) | Cluster management and provisioning tool for NVIDIA DGX systems. | NVIDIA Enterprise Support Portal |
| NVIDIA vGPU Software Licensing Server | Enables GPU virtualization for VDI and AI workloads. | NVIDIA Licensing Portal or partner channels |
| Networking Drivers and SDKs | | |
| NVIDIA DOCA Host Package (with DOCA-OFED Drivers) | DOCA-Host includes all needed host drivers and tools for NVIDIA BlueField and ConnectX devices. | NVIDIA Networking Support Portal |
| NVIDIA BlueField Software Bundle | BlueField software bundle (BF-Bundle) is installed on the BlueField Arm cores to provide a complete DOCA experience on the BlueField networking platform. | NVIDIA Networking Support Portal |
| NVIDIA BlueField Firmware Bundle | BlueField Firmware Bundle (BF-FWBundle) is a minimal software package installed on the BlueField Arm cores. | NVIDIA Networking Support Portal |

Release Branches and Lifecycle#

Infrastructure software is distributed through the Infrastructure Branch, which provides regular updates with 1-year support windows and can be designated as a Long-Term Support Branch (LTSB) for extended 3-year support.
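The support windows described above reduce to simple date arithmetic. The following Python sketch estimates when support would end for a branch, assuming the window starts on the release date; that starting point, the function names, and the example date are illustrative assumptions, and the Lifecycle Policy remains the authoritative source for actual dates.

```python
from datetime import date

# Support window lengths in years, as stated in the text above.
STANDARD_SUPPORT_YEARS = 1
LTSB_SUPPORT_YEARS = 3


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # Raised only when `start` is Feb 29 and the target year is not a leap year.
        return start.replace(year=start.year + years, day=28)


def support_end(release_date: date, is_ltsb: bool = False) -> date:
    """Estimate the end of support, assuming the window starts at release."""
    years = LTSB_SUPPORT_YEARS if is_ltsb else STANDARD_SUPPORT_YEARS
    return add_years(release_date, years)


# Hypothetical release date, used only for illustration.
example_release = date(2025, 1, 15)
print(support_end(example_release))                # standard 1-year window
print(support_end(example_release, is_ltsb=True))  # LTSB 3-year window
```

With the hypothetical release date shown, the standard window would end on 2026-01-15 and the LTSB window on 2028-01-15.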

Infrastructure Support Matrix#

NVIDIA AI Enterprise is certified to run across public cloud, data centers, workstations, the DGX platform, and edge environments. Use the support matrix to verify supported configurations for your deployment.

Important

Interactive Support Matrix

Explore the support matrix with interactive filtering and version comparison across releases 7.0-7.4:

Launch Interactive Tool

Interactive Features:

  • Compare configurations across multiple releases side-by-side

  • Filter by deployment type (bare metal, virtualized, cloud)

  • Progressive filtering guides you through OS, hypervisor, and orchestration options

  • Interactive footnotes with hover tooltips

  • Dynamic version badges show exactly which releases support each configuration
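The filtering and comparison features listed above can be pictured as operations on a table of configurations. The Python sketch below is a rough illustration only: the `SupportEntry` type, the function names, and every row in `MATRIX` are hypothetical placeholders, not real support data and not the tool's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportEntry:
    """One row of a hypothetical support matrix."""
    deployment: str        # "bare metal", "virtualized", or "cloud"
    os: str                # placeholder OS name
    releases: frozenset    # release versions that support this row


# Placeholder rows for illustration only; they are not real support data.
MATRIX = [
    SupportEntry("bare metal", "os-a", frozenset({"7.0", "7.1", "7.2", "7.3", "7.4"})),
    SupportEntry("virtualized", "os-a", frozenset({"7.2", "7.3", "7.4"})),
    SupportEntry("cloud", "os-b", frozenset({"7.4"})),
]


def filter_matrix(entries, deployment=None, os=None, release=None):
    """Keep rows that match every filter that is set; None means 'any'."""
    return [
        e for e in entries
        if (deployment is None or e.deployment == deployment)
        and (os is None or e.os == os)
        and (release is None or release in e.releases)
    ]


def changed_between(entries, old_release, new_release):
    """Return rows supported by exactly one of the two releases."""
    return [
        e for e in entries
        if (old_release in e.releases) != (new_release in e.releases)
    ]


# Progressive filtering: narrow by deployment type first, then by release.
virtualized = filter_matrix(MATRIX, deployment="virtualized")
print([e.os for e in filter_matrix(virtualized, release="7.4")])

# Side-by-side comparison of two releases.
print([e.deployment for e in changed_between(MATRIX, "7.0", "7.4")])
```

Treating an unset filter as "any" lets each filtering step narrow the result of the previous one. For actual supported configurations, use the interactive tool itself.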

Infrastructure Release Notes#