Glossary#

Key terms and concepts used throughout the NVIDIA AI Enterprise documentation.

AI Enterprise#

Cloud-native suite of AI tools, libraries, and frameworks for production deployments.

Base Command Manager (BCM)#

Cluster management platform for provisioning, workload management, and infrastructure monitoring in data centers.

Cloud License Service (CLS)#

Cloud-hosted NVIDIA License System instance that issues licenses for AI Enterprise products without on-premises license servers.

Compute Instance#

In MIG, a subdivision of a GPU instance with dedicated compute resources and an isolated execution context.

Container Toolkit#

Runtime library and utilities for GPU-accelerated Docker containers (formerly nvidia-docker).
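
For illustration, a GPU request made through the Docker Python SDK (the third-party `docker` package is an assumption here, not part of the toolkit itself); the image tag is only an example:

```python
import docker

client = docker.from_env()
# DeviceRequest(count=-1) asks the NVIDIA runtime for all available GPUs,
# the equivalent of `docker run --gpus all`.
output = client.containers.run(
    "nvidia/cuda:12.4.1-base-ubuntu22.04",  # example image tag
    "nvidia-smi",
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    remove=True,
)
print(output.decode())
```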

CUDA#

Compute Unified Device Architecture - NVIDIA’s parallel computing platform and programming model for GPU acceleration.
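
As a sketch of the programming model, a vector-add kernel written in Python with the third-party Numba package (one of several CUDA bindings):

```python
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)      # global thread index
    if i < out.size:      # guard against out-of-range threads
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.ones(n, dtype=np.float32)
b = np.ones(n, dtype=np.float32)
out = np.zeros(n, dtype=np.float32)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
# Kernel launch; Numba copies the arrays to and from the GPU.
vector_add[blocks, threads_per_block](a, b, out)
```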

Delegated License Service (DLS)#

On-premises NVIDIA License System instance on the local network for license management without requiring ongoing external connectivity.

Device Group#

Abstraction that groups physically connected devices (GPUs, NICs) into one logical unit for topology-aware provisioning.

DPU#

Data Processing Unit - A programmable processor that offloads data center infrastructure tasks, accelerating networking, security, and storage.

Fabric Manager#

Manages NVSwitch memory fabric and NVLink interconnects on HGX platforms for multi-GPU configurations.

GPU Instance#

In MIG mode, a hardware-partitioned section of a physical GPU with dedicated memory, cache, and compute resources.

GPU Operator#

Automates GPU management in Kubernetes: driver installation, runtime configuration, and GPU feature discovery.

GPUDirect RDMA#

GPUDirect Remote Direct Memory Access - Direct data exchange between GPUs and network or storage devices without routing through CPU memory.

GPUDirect Storage (GDS)#

Direct data path between storage and GPU memory without CPU bounce buffers.

Heterogeneous vGPU#

Configuration where one physical GPU runs multiple vGPU profiles with different framebuffer sizes at the same time.

HGX#

Multi-GPU computing platform using NVSwitch for AI training and large-scale compute.

Hypervisor#

Software that creates and manages virtual machines, enabling multiple operating systems to share one hardware host. Examples: VMware vSphere, KVM.

License System#

Issues and tracks licenses for NVIDIA AI Enterprise through CLS (cloud) or DLS (on-premises).

Live Migration#

Moving running VMs that use NVIDIA vGPUs between hosts without shutting them down.

MIG (Multi-Instance GPU)#

Hardware-level partitioning of a GPU into isolated instances, each with dedicated resources.
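
A sketch of inspecting MIG state with the `nvidia-ml-py` (pynvml) bindings, assuming GPU 0 supports MIG:

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Returns (current, pending); mode changes take effect after a GPU reset.
current, pending = pynvml.nvmlDeviceGetMigMode(handle)
print("MIG enabled:", current == pynvml.NVML_DEVICE_MIG_ENABLE)

if current == pynvml.NVML_DEVICE_MIG_ENABLE:
    for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(handle)):
        try:
            mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(handle, i)
            print("MIG device:", pynvml.nvmlDeviceGetUUID(mig))
        except pynvml.NVMLError:
            break  # no more MIG devices

pynvml.nvmlShutdown()
```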

MIG-Backed vGPU#

Virtual GPU built from one or more MIG slices; spatial isolation with dedicated compute for multi-tenant workloads.

Multi-vGPU#

Configuration in which a single VM is assigned multiple vGPUs, combining the compute of several virtual GPU devices.

NCCL#

NVIDIA Collective Communications Library - Multi-GPU and multi-node collective communication for distributed training.
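
NCCL is normally used through a framework; a minimal sketch via PyTorch's NCCL backend, launched with `torchrun`:

```python
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")   # NCCL handles the GPU collectives
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    t = torch.full((4,), float(rank), device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # sum across all ranks/GPUs
    print(f"rank {rank}: {t}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # e.g. torchrun --nproc_per_node=2 this_script.py
```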

NGC (NVIDIA GPU Cloud)#

Catalog and registry of containers, models, and tools for AI and HPC workloads.

NGC CLI#

Command-line tool to access and download resources from the NGC Catalog (drivers, containers, models).

NIM (NVIDIA Inference Microservices)#

Containerized inference services exposing models through standardized APIs.

NVIDIA AI Enterprise Infra Collection#

Curated NGC collection of infrastructure components (GPU Operator, Network Operator, vGPU drivers) for AI on enterprise platforms.

NVIDIA Licensing System#

Centralized license management for NVIDIA AI Enterprise software, including vGPU entitlements. Delivered as CLS (cloud) or DLS (on-premises).

nvidia-smi#

NVIDIA System Management Interface - Command-line utility for GPU monitoring and management (utilization, temperature, configuration).
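
For example, its query mode emits machine-readable CSV; a small sketch of calling it from Python:

```python
import subprocess

# --query-gpu selects fields; --format=csv,noheader keeps the output easy to parse.
result = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=name,utilization.gpu,memory.used,memory.total,temperature.gpu",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.strip().splitlines():
    print(line)  # e.g. "NVIDIA A100-SXM4-40GB, 87 %, 30000 MiB, 40960 MiB, 54"
```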

NVLink#

High-bandwidth GPU-to-GPU interconnect for fast on-box communication.

NVLink Multicast#

One-to-many data distribution across NVLink-connected GPUs, used in distributed training.

NVSwitch#

Interconnect fabric that provides full NVLink bandwidth between all GPUs in an HGX system.

Passthrough#

Assigns an entire physical GPU to a single VM for near-native performance, without sharing the GPU through vGPU.

Peer-to-Peer (P2P)#

Direct GPU-to-GPU memory access without the CPU, typically over NVLink.
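
A minimal PyTorch sketch, assuming a host with at least two GPUs:

```python
import torch

if torch.cuda.device_count() >= 2:
    # True when GPU 0 can address GPU 1's memory directly (NVLink or PCIe P2P).
    print("P2P 0 -> 1:", torch.cuda.can_device_access_peer(0, 1))

    a = torch.randn(1 << 20, device="cuda:0")
    b = a.to("cuda:1")  # uses a direct device-to-device copy when P2P is available
```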

Persistence Mode#

Keeps the NVIDIA kernel driver loaded when no apps use the GPU, reducing later startup latency.
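
A sketch of checking it with the `nvidia-ml-py` (pynvml) bindings; enabling it requires root, and the CLI equivalent is `nvidia-smi -pm 1`:

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

mode = pynvml.nvmlDeviceGetPersistenceMode(handle)
print("persistence enabled:", mode == pynvml.NVML_FEATURE_ENABLED)

# Enabling requires root privileges:
# pynvml.nvmlDeviceSetPersistenceMode(handle, pynvml.NVML_FEATURE_ENABLED)

pynvml.nvmlShutdown()
```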

SR-IOV#

Single Root I/O Virtualization - PCIe standard for one physical device to expose multiple virtual functions to different VMs.

Suspend-Resume#

Suspends a vGPU VM and resumes it later without losing operational state.

TensorRT#

Inference optimizer and runtime for deep learning models.

Time-Sliced MIG-Backed vGPU#

Combines MIG spatial partitions with time-slicing inside each MIG instance for higher density and isolation.

Time-Sliced vGPU#

Multiple vGPUs share one GPU through time-based scheduling (round-robin).

Unified Virtual Memory (UVM)#

Single virtual address space visible to both CPU and GPU.

vGPU (Virtual GPU)#

Virtualized GPU presented to a VM so several VMs can share one physical GPU with isolated allocations.

vGPU for Compute#

vGPU profile for training and inference compute workloads without a full graphics stack.

vGPU Guest Driver#

Driver inside each guest OS that presents the vGPU to applications with behavior close to a passthrough GPU.

vGPU Manager#

Hypervisor-side software that partitions a physical GPU into vGPU devices and assigns them to VMs.

vGPU Profile#

Template for framebuffer size, compute resources, and capabilities of a vGPU instance.

vGPU Type#

Named vGPU configuration that sets memory and resource limits; in a MIG-backed name like A100-4-20C, the segments denote the GPU board, the MIG slice count, the framebuffer size in GB, and the series (C for compute).

Virtual GPU Manager#

Hypervisor component that creates vGPUs and allocates GPU resources to VMs.

VMI (Virtual Machine Image)#

Pre-built VM image with NVIDIA drivers and software for cloud marketplaces.

VMware vSphere#

VMware virtualization stack (ESXi, vCenter) that supports NVIDIA vGPU.

VT-d / IOMMU#

Intel VT-d and AMD-Vi (IOMMU) - Hardware I/O virtualization for direct device assignment and DMA remapping in VMs.

Architecture-Specific Terms#

Ampere Architecture#

GPU architecture for A100, A100X, A40, A30, A30X, A16, A10, A10G, A10M, A2, RTX A6000, RTX A5000, and RTX A4000; MIG is supported on A100 and A30.

Ada Lovelace Architecture#

GPU architecture for L4, L40, L40S, RTX 6000 Ada, RTX 5880 Ada, RTX 5000 Ada, and RTX 4000 SFF Ada with ray tracing and AI acceleration.

Blackwell Architecture#

GPU architecture for B200, DGX B200, HGX B200, RTX Pro 6000 Blackwell Server Edition, and RTX Pro 4500 Blackwell Server Edition. The Blackwell Ultra variant includes B300, DGX B300, and HGX B300. Grace Blackwell covers GB200 NVL4/NVL72 and GB300 NVL72 systems.

Hopper Architecture#

GPU architecture for H100, H200, H800, and H800 NVL with Transformer Engine and features for large-scale AI.

Volta Architecture#

Earlier GPU architecture (V100); the first to ship Tensor Cores for AI.

Turing Architecture#

GPU architecture for T4 and T4G with first-generation ray tracing and Tensor Cores.

Licensing Terms#

CLS Instance#

Cloud License Service instance on the NVIDIA Licensing Portal.

DLS Instance#

On-premises Delegated License Service instance for local license management.

Entitlement Certificate#

Document with activation keys and license details issued after purchase.

Feature Type#

License parameter for the licensed product (for example FeatureType=4 for vComputeServer).

Floating License#

License shared by many clients; checked out when in use and returned to the pool when idle.

License Lease#

Time-bounded license assignment to a client with renewal and release behavior defined by the server.

Node-Locked License#

License bound to one physical machine or VM and not moved to other systems.

Deployment Terms#

Bare Metal Deployment#

Software installed directly on physical servers without a hypervisor; GPUs are used without VM abstraction.

Cloud Deployment#

Workloads on public cloud GPU VM offerings (for example AWS, Azure, GCP).

Kubernetes Deployment#

Containers orchestrated by Kubernetes, often with GPU Operator for GPU setup.

Multi-Node Deployment#

Workloads spread across multiple physical servers for scale-out training or inference.

Virtualized Deployment#

Hypervisor-based VMs (VMware, KVM) sharing GPUs through vGPU.

Performance and Optimization#

AMP (Automatic Mixed Precision)#

PyTorch/TensorFlow training mode that uses FP16 where safe and FP32 where needed for stability.
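
A minimal PyTorch sketch with a placeholder model and random data:

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 10).cuda()                 # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()              # rescales loss to avoid FP16 underflow

for _ in range(10):
    x = torch.randn(32, 512, device="cuda")       # placeholder batch
    y = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):  # FP16 where safe
        loss = nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()                 # backward on the scaled loss
    scaler.step(optimizer)                        # unscales, skips step on inf/nan
    scaler.update()
```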

Batch Size#

Number of samples processed together in one forward or backward pass.

DDP (Distributed Data Parallel)#

PyTorch pattern that replicates the model on each GPU, shards the data across them, and synchronizes gradients at every step.
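
A minimal sketch with a placeholder model; each process drives one GPU:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    model = DDP(nn.Linear(512, 10).cuda(), device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Each rank trains on its own shard; DDP all-reduces gradients in backward().
    x = torch.randn(32, 512, device="cuda")        # placeholder shard
    y = torch.randint(0, 10, (32,), device="cuda")
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # e.g. torchrun --nproc_per_node=<gpus> this_script.py
```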

FP16#

16-bit floating-point format for faster math and lower memory use than FP32.

FP32#

32-bit floating-point format; common default for full-precision training.

Gradient Accumulation#

Runs several mini-batches before each optimizer step to mimic a larger batch size.
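
A sketch with a placeholder model; here four micro-batches of 8 emulate a batch of 32:

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 10).cuda()                  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accum_steps = 4  # effective batch = micro-batch (8) * accum_steps = 32

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(8, 512, device="cuda")         # micro-batch
    y = torch.randint(0, 10, (8,), device="cuda")
    loss = nn.functional.cross_entropy(model(x), y)
    (loss / accum_steps).backward()  # scale so summed gradients average correctly
    if (step + 1) % accum_steps == 0:
        optimizer.step()             # one update per accum_steps micro-batches
        optimizer.zero_grad()
```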

Gradient Checkpointing#

Trades extra forward recomputation for lower activation memory during backpropagation.
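
A PyTorch sketch using `torch.utils.checkpoint` on a placeholder stack of layers:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Placeholder deep stack; only segment boundaries keep activations, the rest
# are recomputed during backward().
model = nn.Sequential(*[nn.Linear(512, 512) for _ in range(16)]).cuda()
x = torch.randn(32, 512, device="cuda", requires_grad=True)

out = checkpoint_sequential(model, 4, x, use_reentrant=False)  # 4 segments
out.sum().backward()  # forward is re-run per segment to rebuild activations
```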

INT8#

8-bit integer format for quantized inference with higher throughput than floating point.

Mixed Precision#

Training that mixes low- and full-precision floats (often FP16 and FP32) to speed training while controlling numerical error.

Tensor Core#

GPU hardware blocks that accelerate mixed-precision matrix multiply-accumulate for AI workloads.
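
For instance, in PyTorch, TF32 and FP16 matrix multiplies are routed to Tensor Cores on supported GPUs; a small sketch:

```python
import torch

# On Ampere-and-later GPUs, TF32 lets Tensor Cores accelerate FP32 matmuls.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b                     # FP32 inputs, TF32 Tensor Core compute

c_half = a.half() @ b.half()  # FP16 matmuls map onto Tensor Cores directly
```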