Glossary#

Key terms and concepts used throughout the NVIDIA AI Enterprise documentation.

AI Enterprise#

NVIDIA AI Enterprise is a cloud-native suite of AI tools, libraries, and frameworks for production AI deployments, providing optimized performance, security, and enterprise-grade support.

Base Command Manager (BCM)#

NVIDIA Base Command Manager is a cluster management platform that streamlines provisioning, workload management, and infrastructure monitoring for data centers.

Cloud License Service (CLS)#

A cloud-hosted NVIDIA License System service instance that manages software licenses for NVIDIA AI Enterprise products without requiring on-premises infrastructure.

Compute Instance#

In MIG (Multi-Instance GPU), a compute instance is a subdivision of a GPU instance that provides dedicated compute resources with isolated execution contexts.

Container Toolkit#

The NVIDIA Container Toolkit (formerly nvidia-docker) enables GPU-accelerated Docker containers by providing a container runtime library and utilities.

CUDA#

Compute Unified Device Architecture - NVIDIA’s parallel computing platform and programming model for GPU acceleration.
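
For illustration, a GPU kernel can be written and launched from Python with Numba's CUDA support; this is a minimal sketch, assuming the numba and numpy packages are installed on a CUDA-capable system.

```python
# Minimal sketch: a CUDA kernel written with Numba. Each GPU thread adds one
# element of the input arrays. Assumes numba with CUDA support is installed.
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)                     # global thread index
    if i < out.size:
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)   # Numba handles host/device copies
assert np.allclose(out, a + b)
```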

Delegated License Service (DLS)#

An on-premises NVIDIA License System service instance hosted on a local network, providing license management without external connectivity requirements.

Device Group#

An abstraction layer that automatically detects and presents sets of physically connected devices (GPUs, NICs) as a single logical unit for optimal topology-aware provisioning.

DPU#

Data Processing Unit - A programmable processor designed to handle data center infrastructure tasks and accelerate networking, security, and storage operations.

Fabric Manager#

NVIDIA Fabric Manager manages NVSwitch memory fabric and NVLink interconnects on NVIDIA HGX platforms, enabling multi-GPU configurations.

GPU Instance#

In MIG mode, a GPU instance is a hardware-partitioned section of a physical GPU with dedicated memory, cache, and compute resources.

GPU Operator#

The NVIDIA GPU Operator automates GPU management in Kubernetes, handling driver installation, runtime configuration, and GPU feature discovery.

GPUDirect RDMA#

GPUDirect Remote Direct Memory Access - Technology enabling direct data exchange between GPUs and network devices or storage, bypassing CPU memory.

GPUDirect Storage (GDS)#

Technology enabling direct data path between storage devices and GPU memory, avoiding CPU bounce buffers for improved bandwidth and latency.

Heterogeneous vGPU#

Configuration allowing a single physical GPU to simultaneously support multiple vGPU profiles with different memory allocations (framebuffer sizes).

HGX#

NVIDIA HGX is a GPU computing platform featuring multiple GPUs connected through NVSwitch, designed for AI training and large-scale computing workloads.

Hypervisor#

Software that creates and manages virtual machines, enabling multiple operating systems to share a single hardware host. Examples: VMware vSphere, KVM.

License System#

The NVIDIA License System manages software licenses for NVIDIA AI Enterprise, providing both cloud-hosted (CLS) and on-premises (DLS) licensing options.

Live Migration#

The capability to transfer running VMs with NVIDIA vGPUs between physical hosts without downtime, enabling maintenance and load balancing.

MIG (Multi-Instance GPU)#

Multi-Instance GPU technology allows hardware-level partitioning of a GPU into multiple isolated instances, each with dedicated resources.
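
As a rough illustration, the NVML Python bindings (assuming the nvidia-ml-py package is installed) can report whether MIG mode is enabled on a GPU and enumerate its MIG devices; availability of these calls depends on driver and package versions.

```python
# Sketch: inspect MIG mode and list MIG devices through NVML (pynvml).
# Requires a MIG-capable GPU and a recent nvidia-ml-py package.
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
current_mode, pending_mode = pynvml.nvmlDeviceGetMigMode(gpu)

if current_mode == pynvml.NVML_DEVICE_MIG_ENABLE:
    for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
        try:
            mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
            print("MIG device:", pynvml.nvmlDeviceGetUUID(mig))
        except pynvml.NVMLError:
            break                        # no more MIG devices configured
else:
    print("MIG mode is disabled on GPU 0")
pynvml.nvmlShutdown()
```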

MIG-Backed vGPU#

A virtual GPU created from one or more MIG slices, providing spatial isolation with dedicated compute resources for multi-tenant workloads.

Multi-vGPU#

Configuration allowing a single virtual machine to use multiple vGPUs simultaneously, aggregating computational power from several vGPU devices.

NCCL#

NVIDIA Collective Communications Library - Optimized library for multi-GPU and multi-node communication in distributed training.
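
As a rough sketch, PyTorch's distributed package uses NCCL for its GPU collectives; the example below assumes a multi-GPU host and a torchrun launch that sets the usual rank environment variables.

```python
# Sketch of a multi-GPU all-reduce using the NCCL backend via torch.distributed.
# Assumes it is launched with torchrun so RANK/WORLD_SIZE/LOCAL_RANK are set.
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")        # NCCL handles the GPU collectives
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

x = torch.ones(4, device="cuda") * dist.get_rank()
dist.all_reduce(x, op=dist.ReduceOp.SUM)       # sum across all ranks over NVLink/network
print(f"rank {dist.get_rank()}: {x}")
dist.destroy_process_group()
```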

NGC (NVIDIA GPU Cloud)#

NVIDIA’s catalog and registry for GPU-optimized software, including containers, models, and tools for AI and HPC applications.

NGC CLI#

Command-line interface for accessing and downloading resources from the NVIDIA NGC Catalog, including drivers, containers, and models.

NIM (NVIDIA Inference Microservices)#

Containerized inference services providing optimized deployment of AI models with standardized APIs for production environments.
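
NIM containers typically expose an OpenAI-compatible HTTP API; the minimal sketch below assumes a NIM is already running locally, and the base URL, port, and model name are placeholders to adapt to the deployed service.

```python
# Minimal sketch of calling a locally running NIM through its OpenAI-compatible
# endpoint. The base_url, port, and model name are placeholders for this example.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",      # placeholder model name
    messages=[{"role": "user", "content": "Summarize what a NIM is in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```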

nvidia-smi#

NVIDIA System Management Interface - Command-line utility for monitoring and managing NVIDIA GPUs, displaying utilization, temperature, and configuration.
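
nvidia-smi is built on the NVML library; as a sketch, the same information can be queried programmatically through the pynvml bindings (assuming the nvidia-ml-py package is installed).

```python
# Minimal sketch: query GPU utilization and memory via NVML, the library
# behind nvidia-smi. Assumes the nvidia-ml-py package provides pynvml.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)          # first GPU
name = pynvml.nvmlDeviceGetName(handle)
util = pynvml.nvmlDeviceGetUtilizationRates(handle)    # .gpu / .memory in percent
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)           # .total / .used in bytes
print(f"{name}: {util.gpu}% util, {mem.used / 2**20:.0f} MiB used")
pynvml.nvmlShutdown()
```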

NVLink#

High-bandwidth, direct GPU-to-GPU interconnect technology enabling fast communication between GPUs in the same system.

NVLink SHARP#

Technology enabling efficient one-to-many data distribution across multiple GPUs connected through NVLink, optimizing distributed training.

NVSwitch#

High-speed interconnect fabric providing full NVLink bandwidth between all GPUs in HGX systems, enabling optimal multi-GPU communication.

Passthrough#

GPU passthrough assigns an entire physical GPU directly to a virtual machine, providing native performance without virtualization overhead.

Peer-to-Peer (P2P)#

Capability allowing direct memory access between GPUs without CPU involvement, enabling fast inter-GPU communication over NVLink.
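
As a minimal sketch, PyTorch exposes a peer-access check; the copy below goes directly GPU-to-GPU when P2P is available (this assumes a host with at least two GPUs).

```python
# Sketch: check whether two GPUs can use direct peer-to-peer (P2P) access and
# perform a device-to-device copy, which uses NVLink when P2P is available.
import torch

if torch.cuda.device_count() >= 2 and torch.cuda.can_device_access_peer(0, 1):
    src = torch.randn(1024, 1024, device="cuda:0")
    dst = src.to("cuda:1")          # direct GPU-to-GPU copy when P2P is enabled
    print("P2P copy completed:", dst.device)
else:
    print("Peer access between GPU 0 and GPU 1 is not available")
```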

Persistence Mode#

GPU configuration mode that keeps the NVIDIA kernel driver loaded even when no applications are running, reducing startup latency.

SR-IOV#

Single Root I/O Virtualization - PCIe standard enabling a single physical device to present multiple virtual functions to different virtual machines.

Suspend-Resume#

Feature allowing vGPU-configured VMs to be temporarily suspended and later resumed without losing operational state, optimizing resource utilization.

TensorRT#

NVIDIA’s deep learning inference optimizer and runtime, providing high-performance inference acceleration with automatic optimization.
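
A minimal sketch of building an FP16 engine with the TensorRT 8.x-era Python API is shown below; the ONNX and engine file names are placeholders.

```python
# Sketch: parse an ONNX model and build a serialized FP16 TensorRT engine.
# Assumes the tensorrt Python package (8.x-era API) and a placeholder model.onnx.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:            # placeholder file name
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)          # enable FP16 kernels where profitable
engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:          # placeholder file name
    f.write(engine_bytes)
```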

Time-Sliced MIG-Backed vGPU#

Advanced vGPU mode combining MIG’s spatial partitioning with time-slicing within each MIG instance, maximizing density while maintaining isolation.

Time-Sliced vGPU#

Virtual GPU configuration where multiple vGPUs share access to GPU resources through temporal scheduling, with round-robin time slicing.

Unified Virtual Memory (UVM)#

Feature providing a single, cohesive memory address space accessible by both CPUs and GPUs, simplifying programming and improving performance.

vGPU (Virtual GPU)#

A virtualized GPU instance that allows multiple virtual machines to share a single physical GPU while providing dedicated GPU capabilities.

vGPU for Compute#

NVIDIA vGPU profile optimized for compute workloads like AI training and inference, providing full compute capabilities without graphics features.

vGPU Guest Driver#

Driver installed in virtual machines that enables them to use virtualized GPU resources, providing the interface to access GPU capabilities.

vGPU Profile#

A configuration template defining the framebuffer size, compute resources, and capabilities allocated to a vGPU instance.

vGPU Type#

Specific vGPU configuration identified by name (e.g., A100-4-20C), defining memory allocation and resource limits for a virtual GPU.

Virtual GPU Manager#

Software component installed on the hypervisor that enables GPU virtualization, managing vGPU creation and resource allocation.

VMI (Virtual Machine Instance/Image)#

Pre-configured virtual machine image with NVIDIA drivers and software pre-installed, available on cloud marketplaces for rapid deployment.

VMware vSphere#

Enterprise virtualization platform from VMware, including ESXi hypervisor and vCenter management, supporting NVIDIA vGPU technology.

VT-d / IOMMU#

Intel VT-d and AMD-Vi (IOMMU) - Hardware virtualization technologies enabling direct device assignment and DMA remapping for improved VM I/O performance.

Architecture-Specific Terms#

Ampere Architecture#

NVIDIA GPU architecture featuring the A100, A30, A40, and RTX A-series GPUs, with MIG support on the A100 and A30 and improved AI performance.

Ada Lovelace Architecture#

NVIDIA GPU architecture featuring L4, L40, and RTX 6000 Ada GPUs with advanced ray tracing and AI capabilities.

Blackwell Architecture#

NVIDIA GPU architecture featuring B200 GPUs with enhanced AI capabilities and universal MIG technology supporting both compute and graphics.

Hopper Architecture#

NVIDIA GPU architecture with H100, H200, H800 GPUs featuring Transformer Engine and advanced features for large-scale AI workloads.

Volta Architecture#

Earlier NVIDIA GPU architecture featuring V100 GPUs, the first to support Tensor Cores for AI acceleration.

Turing Architecture#

NVIDIA GPU architecture featuring T4 and Quadro RTX GPUs with first-generation ray tracing and AI capabilities.

Licensing Terms#

CLS Instance#

Cloud License Service instance hosted on NVIDIA Licensing Portal, providing license management through the cloud.

DLS Instance#

Delegated License Service instance hosted on-premises, providing local license management without cloud connectivity.

Entitlement Certificate#

Document containing product activation keys and license information, provided after purchasing NVIDIA AI Enterprise.

Feature Type#

License configuration parameter specifying the type of licensed software (e.g., FeatureType=4 for vComputeServer).

Floating License#

License that can be shared among multiple clients, checked out when needed and returned to the pool when no longer required.

License Lease#

Temporary license assignment to a client, with automatic renewal and release mechanisms.

Node-Locked License#

License tied to a specific physical machine or VM, not transferable to other systems.

Deployment Terms#

Bare Metal Deployment#

Installation directly on physical servers without virtualization, providing maximum performance and direct GPU access.

Cloud Deployment#

Deployment on public cloud platforms (AWS, Azure, GCP) using GPU-enabled virtual machine instances.

Kubernetes Deployment#

Container orchestration deployment using Kubernetes with GPU Operator for automated GPU management.

Multi-Node Deployment#

Distributed deployment across multiple physical servers for large-scale training and inference workloads.

Virtualized Deployment#

Installation using hypervisors (VMware, KVM) with vGPU technology for GPU sharing across virtual machines.

Performance and Optimization#

AMP (Automatic Mixed Precision)#

PyTorch/TensorFlow feature automatically using FP16 precision where appropriate while maintaining FP32 for stability.
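
A minimal PyTorch sketch of AMP is shown below: autocast selects FP16 for eligible operations while GradScaler protects against gradient underflow (the model and data are placeholders).

```python
# Sketch of automatic mixed precision in PyTorch: autocast runs eligible ops in
# FP16 while GradScaler keeps the FP32 optimizer step numerically stable.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

data = torch.randn(32, 1024, device="cuda")
target = torch.randn(32, 1024, device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(data), target)
    scaler.scale(loss).backward()   # scale loss to avoid FP16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```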

Batch Size#

Number of samples processed together in a single forward/backward pass during training or inference.

DDP (Distributed Data Parallel)#

PyTorch’s multi-GPU training approach distributing data across GPUs with synchronized gradient updates.
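
A minimal sketch of DDP is shown below; it assumes a torchrun launch so that each process owns one GPU, with gradients synchronized automatically during backward().

```python
# Sketch of Distributed Data Parallel: one process per GPU, gradients are
# all-reduced automatically during backward(). Launch with torchrun.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(128, 10).cuda()
ddp_model = DDP(model, device_ids=[local_rank])
optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

inputs = torch.randn(16, 128, device="cuda")
labels = torch.randint(0, 10, (16,), device="cuda")
loss = torch.nn.functional.cross_entropy(ddp_model(inputs), labels)
loss.backward()                     # gradient all-reduce happens here
optimizer.step()
dist.destroy_process_group()
```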

FP16#

16-bit floating-point precision, providing faster computation and lower memory usage with minimal accuracy loss.

FP32#

32-bit floating-point precision, standard precision for neural network training providing high numerical accuracy.

Gradient Accumulation#

Technique simulating larger batch sizes by accumulating gradients over multiple forward/backward passes before updating weights.
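
A minimal PyTorch sketch is shown below; the accumulation step count and synthetic data are placeholders chosen for illustration.

```python
# Sketch of gradient accumulation: an effective batch of
# accumulation_steps * micro_batch is simulated on limited GPU memory.
import torch

model = torch.nn.Linear(256, 10).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
accumulation_steps = 4              # placeholder value

# Synthetic micro-batches standing in for a real DataLoader.
loader = [(torch.randn(8, 256), torch.randint(0, 10, (8,))) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    x, y = x.cuda(), y.cuda()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    (loss / accumulation_steps).backward()      # accumulate scaled gradients
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                        # update once per effective batch
        optimizer.zero_grad()
```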

Gradient Checkpointing#

Memory optimization technique trading computation for memory by recomputing activations during backward pass.
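
A minimal PyTorch sketch using torch.utils.checkpoint is shown below; activations inside the checkpointed block are recomputed during the backward pass instead of being stored.

```python
# Sketch of gradient checkpointing: activations inside `block` are not kept
# after the forward pass and are recomputed in backward, trading compute for memory.
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
).cuda()

x = torch.randn(32, 1024, device="cuda", requires_grad=True)
out = checkpoint(block, x, use_reentrant=False)   # recompute activations in backward
out.sum().backward()
```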

INT8#

8-bit integer precision for inference, providing maximum throughput with minimal accuracy loss through quantization.

Mixed Precision#

Training technique using both FP16 and FP32 precision to accelerate training while maintaining model accuracy.

Tensor Core#

Specialized hardware units in NVIDIA GPUs designed for accelerating mixed-precision matrix operations in AI workloads.
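
As a rough illustration in PyTorch, FP16 (and, on Ampere and later, TF32) matrix multiplications are dispatched to Tensor Cores by the cuBLAS backend; the matrix sizes here are arbitrary.

```python
# Sketch: FP16 matmuls are executed on Tensor Cores; allow_tf32 additionally
# routes FP32 matmuls through TF32 Tensor Cores on Ampere and newer GPUs.
import torch

torch.backends.cuda.matmul.allow_tf32 = True       # TF32 Tensor Cores for FP32 matmul
a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
c = a @ b                                           # FP16 matmul on Tensor Cores
print(c.dtype, c.shape)
```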