
# NVIDIA Software Components

This section provides detailed information on NVIDIA-provided software
components that address the capabilities described in the [Software Reference Guide](/dsx/part-1-software-reference-guide/ncp-software-reference-guide). Each
component is mapped to the architectural layer it supports. The use of
NVIDIA software is optional and depends on architectural decisions made
by the NCP or ISV. NCPs can work with ecosystem partners to integrate
these components or implement alternative solutions.

This section is organized by functional area, mirroring the structure of
the Software Reference Architecture section:

* Infrastructure Platform:
  * Network Management – Software for managing Ethernet, InfiniBand, and
    NVLink fabrics
  * Compute Management – Software for bare metal lifecycle, GPU
    virtualization, and observability
  * Storage – Software for high-performance GPU-to-storage connectivity
* Container Platform – Software for GPU-accelerated containers and
  Kubernetes
* AI Platforms – Software for training and inference workload management

## Key Software Components

Key software components provided by NVIDIA are listed in the following
table.

**Key NVIDIA Software Components**

| Component                             | Description                                                                                                                                                                                                                                                                                                                                                                         |
| ------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Virtual GPU Software                  | Communicates with platform hardware to allocate GPU resources between the host and guest virtual machines |
| Fabric Manager                        | Programs NVSwitch fabrics to support high-performance multi-GPU workloads |
| NVIDIA DOCA™ software                 | Provides DOCA-OFED drivers and DOCA acceleration libraries and services that enable accelerated networking for AI workloads |
| NVIDIA Data Center GPU Manager (DCGM) | Provides GPU monitoring, diagnostics, and telemetry, enabling automated break-fix and infrastructure observability |
| NVIDIA Infra Controller               | NVIDIA's cloud-native bare-metal provisioning platform that provides hardware lifecycle management, orchestrated by the DPU |
| Base Command Manager                  | Manages AI infrastructure through workload provisioning |
| Container Toolkit                     | Enables container runtimes to access GPU hardware within containers |
| NVIDIA K8s Operators                  | GPU Operator standardizes GPU management in K8s and improves GPU performance, utilization, and telemetry. Network Operator simplifies the provisioning and management of NVIDIA networking resources in a K8s cluster. NIM Operator automates the lifecycle of NVIDIA NIM™ microservices for generative AI applications. The operators also deploy the NVIDIA GPU drivers that allow GPUs to be used in K8s |
| Run:ai                                | Optimizes workload deployment by leveraging K8s orchestration |
| NVIDIA Cloud Functions (NVCF)         | A serverless API that allows users to deploy and manage AI workloads on GPUs, providing scalability, security, and reliability, and is accessible via HTTP polling, streaming, or gRPC. K8s integration can be achieved with the NVIDIA Cluster Agent (NVCA) |
| NVIDIA Inference Microservices (NIM)  | A set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing across clouds, data centers, and workstations |
| NVIDIA NeMo™ microservices            | Provide an end-to-end workflow for model customization, enabling enterprises to adapt large language models to their specific needs efficiently |
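
To illustrate how the Container Toolkit and the GPU Operator surface GPUs to containerized workloads, a pod can request GPUs through the `nvidia.com/gpu` extended resource advertised by the operator's device plugin. The following is a minimal, hypothetical manifest; the pod name, image tag, and GPU count are placeholders:

```yaml
# Hypothetical example: request one GPU via the nvidia.com/gpu
# extended resource exposed by the GPU Operator's device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test                 # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # example image tag
      command: ["nvidia-smi"]           # print visible GPUs, then exit
      resources:
        limits:
          nvidia.com/gpu: 1             # schedule only on a node with a free GPU
```

The scheduler places the pod only on nodes reporting spare `nvidia.com/gpu` capacity, and the Container Toolkit injects the driver libraries and device files into the container at runtime.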

The additional software shown in the following table can be used for full infrastructure management, including the networking components. These components are detailed in [NVIDIA Software for Infrastructure as a Service](/dsx/part-2-software-components/nvidia-software-for-infrastructure-as-a-service).

**Additional NVIDIA Software for Infrastructure Components**

| Component                                                                                                                                                 | Layer                     | Function                                                                                        |
| --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------- | ----------------------------------------------------------------------------------------------- |
| [Unified Fabric Manager (UFM)](https://docs.nvidia.com/networking/display/ufmenterpriseumv6190/installing+ufm+server+software)                            | Network Management        | Manages Quantum InfiniBand switches through MLNX-OS                                             |
| [NVIDIA User Experience (NVUE)](https://docs.nvidia.com/networking-ethernet-software/cumulus-linux-513/System-Configuration/NVIDIA-User-Experience-NVUE/) | Network Management        | Manages Spectrum Ethernet switches through Cumulus Linux                                        |
| [NetQ](https://docs.nvidia.com/networking-ethernet-software/cumulus-netq-414/NetQ-Overview/NetQ-Basics/NetQ-Components/)                                  | Monitoring and visibility | Provides network and host visibility                                                            |
| [NVIDIA Air](https://air.nvidia.com/)                                                                                                                     | Deployment validation     | Simulation environment that provides deployment validation                                      |
| [NMX](https://docs.nvidia.com/networking/software/nvlink-management-software/index.html)                                                                  | Network Management        | Manages NVSwitch-based NVLink interconnects. NMX has three components: NMX-C, NMX-M, and NMX-T. |
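
As a flavor of how NVUE manages Spectrum switches running Cumulus Linux, configuration is staged and then applied as a revision. The commands below are a hedged sketch; the interface name and address are placeholders:

```shell
nv set interface swp1 ip address 10.0.0.1/31   # stage a change (placeholder values)
nv config apply                                # apply the staged revision
nv show interface swp1                         # inspect operational state
```

Staged revisions can be reviewed before they are applied, which keeps switch configuration declarative and auditable.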

The components listed in the following table complement NVIDIA software and complete the stack. These infrastructure components can be provided by NCPs, ISVs, or the open-source ecosystem:

**Infrastructure Software Components**

| Component                      | Layer | Description                                                                                |
| ------------------------------ | ----- | ------------------------------------------------------------------------------------------ |
| Operating System               | IaaS  | Linux distribution for compute hosts                                                       |
| Hypervisor                     | IaaS  | Allocates physical host resources to guest virtual machines                                |
| Cloud Control Plane            | IaaS  | Tenant-facing control plane providing an API/UI to provision compute, networking, and storage |
| SDN Controller                 | IaaS  | Translates network intent into hardware configuration                                      |
| Storage System                 | IaaS  | Provides block, file, and object storage                                                   |
| Identity and Access Management | IaaS  | Tenant authentication and authorization                                                    |
| Kubernetes                     | CaaS  | Container orchestration platform                                                           |
| Slurm                          | CaaS  | HPC workload manager for job scheduling                                                    |
| PyTorch                        | CaaS  | GPU-accelerated tensor computational framework with a Python front end                     |
| AI Platform                    | SaaS  | Tenant-facing platform for training and inference workloads                                |
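
As an example of the Slurm layer in this stack, training jobs typically request GPUs as generic resources (GRES). The batch script below is a minimal, hypothetical sketch; the job name, partition, GPU count, and training script are placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=train          # placeholder job name
#SBATCH --partition=gpu           # placeholder partition name
#SBATCH --nodes=1
#SBATCH --gres=gpu:2              # request two GPUs on the node
#SBATCH --time=01:00:00           # wall-clock limit

srun python train.py              # hypothetical training script
```

Slurm allocates the requested GPUs before launching the job step, so the training process sees only the devices granted to it.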