Glossary#

Key terms and concepts used throughout the NVIDIA AI Enterprise documentation.

AI Enterprise#

NVIDIA AI Enterprise is a cloud-native suite of AI tools, libraries, and frameworks for production AI deployments, providing optimized performance, security, and enterprise-grade support.

Base Command Manager (BCM)#

NVIDIA Base Command Manager is a cluster management platform that streamlines provisioning, workload management, and infrastructure monitoring for data centers.

Cloud License Service (CLS)#

A cloud-hosted NVIDIA License System service instance that manages software licenses for NVIDIA AI Enterprise products without requiring on-premises infrastructure.

Compute Instance#

In MIG (Multi-Instance GPU), a compute instance is a subdivision of a GPU instance that provides dedicated compute resources with isolated execution contexts.

Container Toolkit#

The NVIDIA Container Toolkit (formerly nvidia-docker) enables GPU-accelerated Docker containers by providing a container runtime library and utilities.

CUDA#

Compute Unified Device Architecture - NVIDIA’s parallel computing platform and programming model for GPU acceleration.
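
For illustration, a GPU kernel can be written and launched from Python with Numba's CUDA support; this is a minimal sketch, assuming the numba and numpy packages are installed on a CUDA-capable system.

```python
# Minimal sketch: a CUDA kernel written with Numba. Each GPU thread adds one
# element of the input arrays. Assumes numba with CUDA support is installed.
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)                     # global thread index
    if i < out.size:
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)   # Numba handles host/device copies
assert np.allclose(out, a + b)
```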

Delegated License Service (DLS)#

An on-premises NVIDIA License System service instance hosted on a local network, providing license management without external connectivity requirements.

Device Group#

An abstraction layer that automatically detects and presents sets of physically connected devices (GPUs, NICs) as a single logical unit for optimal topology-aware provisioning.

DPU#

Data Processing Unit - A programmable processor designed to handle data center infrastructure tasks and accelerate networking, security, and storage operations.

Fabric Manager#

NVIDIA Fabric Manager manages NVSwitch memory fabric and NVLink interconnects on NVIDIA HGX platforms, enabling multi-GPU configurations.

GPU Instance#

In MIG mode, a GPU instance is a hardware-partitioned section of a physical GPU with dedicated memory, cache, and compute resources.

GPU Operator#

The NVIDIA GPU Operator automates GPU management in Kubernetes, handling driver installation, runtime configuration, and GPU feature discovery.

GPUDirect RDMA#

GPUDirect Remote Direct Memory Access - Technology enabling direct data exchange between GPUs and network devices or storage, bypassing CPU memory.

GPUDirect Storage (GDS)#

Technology enabling direct data path between storage devices and GPU memory, avoiding CPU bounce buffers for improved bandwidth and latency.

Heterogeneous vGPU#

Configuration allowing a single physical GPU to simultaneously support multiple vGPU profiles with different memory allocations (framebuffer sizes).

HGX#

NVIDIA HGX is a GPU computing platform featuring multiple GPUs connected through NVSwitch, designed for AI training and large-scale computing workloads.

Hypervisor#

Software that creates and manages virtual machines, enabling multiple operating systems to share a single hardware host. Examples: VMware vSphere, KVM.

License System#

The NVIDIA License System manages software licenses for NVIDIA AI Enterprise, providing both cloud-hosted (CLS) and on-premises (DLS) licensing options.

Live Migration#

The capability to transfer running VMs with NVIDIA vGPUs between physical hosts without downtime, enabling maintenance and load balancing.

MIG (Multi-Instance GPU)#

Multi-Instance GPU technology allows hardware-level partitioning of a GPU into multiple isolated instances, each with dedicated resources.
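
As a rough illustration, the NVML Python bindings (assuming the nvidia-ml-py package is installed) can report whether MIG mode is enabled on a GPU and enumerate its MIG devices; availability of these calls depends on driver and package versions.

```python
# Sketch: inspect MIG mode and list MIG devices through NVML (pynvml).
# Requires a MIG-capable GPU and a recent nvidia-ml-py package.
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
current_mode, pending_mode = pynvml.nvmlDeviceGetMigMode(gpu)

if current_mode == pynvml.NVML_DEVICE_MIG_ENABLE:
    for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
        try:
            mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
            print("MIG device:", pynvml.nvmlDeviceGetUUID(mig))
        except pynvml.NVMLError:
            break                        # no more MIG devices configured
else:
    print("MIG mode is disabled on GPU 0")
pynvml.nvmlShutdown()
```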

MIG-Backed vGPU#

A virtual GPU created from one or more MIG slices, providing spatial isolation with dedicated compute resources for multi-tenant workloads.

Multi-vGPU#

Configuration allowing a single virtual machine to use multiple vGPUs simultaneously, aggregating computational power from several vGPU devices.

NCCL#

NVIDIA Collective Communications Library - Optimized library for multi-GPU and multi-node communication in distributed training.
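
As a rough sketch, PyTorch's distributed package uses NCCL for its GPU collectives; the example below assumes a multi-GPU host and a torchrun launch that sets the usual rank environment variables.

```python
# Sketch of a multi-GPU all-reduce using the NCCL backend via torch.distributed.
# Assumes it is launched with torchrun so RANK/WORLD_SIZE/LOCAL_RANK are set.
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")        # NCCL handles the GPU collectives
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

x = torch.ones(4, device="cuda") * dist.get_rank()
dist.all_reduce(x, op=dist.ReduceOp.SUM)       # sum across all ranks over NVLink/network
print(f"rank {dist.get_rank()}: {x}")
dist.destroy_process_group()
```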

NGC (NVIDIA GPU Cloud)#

NVIDIA’s catalog and registry for GPU-optimized software, including containers, models, and tools for AI and HPC applications.

NGC CLI#

Command-line interface for accessing and downloading resources from the NVIDIA NGC Catalog, including drivers, containers, and models.

NIM (NVIDIA Inference Microservices)#

Containerized inference services providing optimized deployment of AI models with standardized APIs for production environments.
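
NIM containers typically expose an OpenAI-compatible HTTP API; the minimal sketch below assumes a NIM is already running locally, and the base URL, port, and model name are placeholders to adapt to the deployed service.

```python
# Minimal sketch of calling a locally running NIM through its OpenAI-compatible
# endpoint. The base_url, port, and model name are placeholders for this example.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",      # placeholder model name
    messages=[{"role": "user", "content": "Summarize what a NIM is in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```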

nvidia-smi#

NVIDIA System Management Interface - Command-line utility for monitoring and managing NVIDIA GPUs, displaying utilization, temperature, and configuration.
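
nvidia-smi is built on the NVML library; as a sketch, the same information can be queried programmatically through the pynvml bindings (assuming the nvidia-ml-py package is installed).

```python
# Minimal sketch: query GPU utilization and memory via NVML, the library
# behind nvidia-smi. Assumes the nvidia-ml-py package provides pynvml.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)          # first GPU
name = pynvml.nvmlDeviceGetName(handle)
util = pynvml.nvmlDeviceGetUtilizationRates(handle)    # .gpu / .memory in percent
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)           # .total / .used in bytes
print(f"{name}: {util.gpu}% util, {mem.used / 2**20:.0f} MiB used")
pynvml.nvmlShutdown()
```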

NVLink#

High-bandwidth, direct GPU-to-GPU interconnect technology enabling fast communication between GPUs in the same system.

NVLink SHARP#

Technology enabling efficient one-to-many data distribution across multiple GPUs connected through NVLink, optimizing distributed training.

NVSwitch#

High-speed interconnect fabric providing full NVLink bandwidth between all GPUs in HGX systems, enabling optimal multi-GPU communication.

Passthrough#

GPU passthrough assigns an entire physical GPU directly to a virtual machine, providing native performance without virtualization overhead.

Peer-to-Peer (P2P)#

Capability allowing direct memory access between GPUs without CPU involvement, enabling fast inter-GPU communication over NVLink.
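
As a minimal sketch, PyTorch exposes a peer-access check; the copy below goes directly GPU-to-GPU when P2P is available (this assumes a host with at least two GPUs).

```python
# Sketch: check whether two GPUs can use direct peer-to-peer (P2P) access and
# perform a device-to-device copy, which uses NVLink when P2P is available.
import torch

if torch.cuda.device_count() >= 2 and torch.cuda.can_device_access_peer(0, 1):
    src = torch.randn(1024, 1024, device="cuda:0")
    dst = src.to("cuda:1")          # direct GPU-to-GPU copy when P2P is enabled
    print("P2P copy completed:", dst.device)
else:
    print("Peer access between GPU 0 and GPU 1 is not available")
```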

Persistence Mode#

GPU configuration mode that keeps the NVIDIA kernel driver loaded even when no applications are running, reducing startup latency.

SR-IOV#

Single Root I/O Virtualization - PCIe standard enabling a single physical device to present multiple virtual functions to different virtual machines.

Suspend-Resume#

Feature allowing vGPU-configured VMs to be temporarily suspended and later resumed without losing operational state, optimizing resource utilization.

TensorRT#

NVIDIA’s deep learning inference optimizer and runtime, providing high-performance inference acceleration with automatic optimization.
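
A minimal sketch of building an FP16 engine with the TensorRT 8.x-era Python API is shown below; the ONNX and engine file names are placeholders.

```python
# Sketch: parse an ONNX model and build a serialized FP16 TensorRT engine.
# Assumes the tensorrt Python package (8.x-era API) and a placeholder model.onnx.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:            # placeholder file name
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)          # enable FP16 kernels where profitable
engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:          # placeholder file name
    f.write(engine_bytes)
```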

Time-Sliced MIG-Backed vGPU#

Advanced vGPU mode combining MIG’s spatial partitioning with time-slicing within each MIG instance, maximizing density while maintaining isolation.

Time-Sliced vGPU#

Virtual GPU configuration where multiple vGPUs share access to GPU resources through temporal scheduling, with round-robin time slicing.

Unified Virtual Memory (UVM)#

Feature providing a single, cohesive memory address space accessible by both CPUs and GPUs, simplifying programming and improving performance.

vGPU (Virtual GPU)#

A virtualized GPU instance that allows multiple virtual machines to share a single physical GPU while providing dedicated GPU capabilities.

vGPU for Compute#

NVIDIA vGPU profile optimized for compute workloads like AI training and inference, providing full compute capabilities without graphics features.

vGPU Guest Driver#

Driver installed in virtual machines that enables them to use virtualized GPU resources, providing the interface to access GPU capabilities.

vGPU Profile#

A configuration template defining the framebuffer size, compute resources, and capabilities allocated to a vGPU instance.

vGPU Type#

Specific vGPU configuration identified by name (e.g., A100-4-20C), defining memory allocation and resource limits for a virtual GPU.

Virtual GPU Manager#

Software component installed on the hypervisor that enables GPU virtualization, managing vGPU creation and resource allocation.

VMI (Virtual Machine Instance/Image)#

Pre-configured virtual machine image with NVIDIA drivers and software pre-installed, available on cloud marketplaces for rapid deployment.

VMware vSphere#

Enterprise virtualization platform from VMware, including ESXi hypervisor and vCenter management, supporting NVIDIA vGPU technology.

VT-d / IOMMU#

Intel VT-d and AMD-Vi (IOMMU) - Hardware virtualization technologies enabling direct device assignment and DMA remapping for improved VM I/O performance.

Architecture-Specific Terms#

Ampere Architecture#

NVIDIA GPU architecture featuring the A100, A30, A40, and RTX A-series GPUs, with MIG support on the A100 and A30 and improved AI performance.

Ada Lovelace Architecture#

NVIDIA GPU architecture featuring L4, L40, and RTX 6000 Ada GPUs with advanced ray tracing and AI capabilities.

Blackwell Architecture#

NVIDIA GPU architecture featuring B200 GPUs with enhanced AI capabilities and universal MIG technology supporting both compute and graphics.

Hopper Architecture#

NVIDIA GPU architecture with H100, H200, H800 GPUs featuring Transformer Engine and advanced features for large-scale AI workloads.

Volta Architecture#

Earlier NVIDIA GPU architecture featuring V100 GPUs, the first to support Tensor Cores for AI acceleration.

Turing Architecture#

NVIDIA GPU architecture featuring T4 and Quadro RTX GPUs with first-generation ray tracing and AI capabilities.

Licensing Terms#

CLS Instance#

Cloud License Service instance hosted on NVIDIA Licensing Portal, providing license management through the cloud.

DLS Instance#

Delegated License Service instance hosted on-premises, providing local license management without cloud connectivity.

Entitlement Certificate#

Document containing product activation keys and license information, provided after purchasing NVIDIA AI Enterprise.

Feature Type#

License configuration parameter specifying the type of licensed software (e.g., FeatureType=4 for vComputeServer).

Floating License#

License that can be shared among multiple clients, checked out when needed and returned to the pool when no longer required.

License Lease#

Temporary license assignment to a client, with automatic renewal and release mechanisms.

Node-Locked License#

License tied to a specific physical machine or VM, not transferable to other systems.

Deployment Terms#

Bare Metal Deployment#

Installation directly on physical servers without virtualization, providing maximum performance and direct GPU access.

Cloud Deployment#

Deployment on public cloud platforms (AWS, Azure, GCP) using GPU-enabled virtual machine instances.

Kubernetes Deployment#

Container orchestration deployment using Kubernetes with GPU Operator for automated GPU management.

Multi-Node Deployment#

Distributed deployment across multiple physical servers for large-scale training and inference workloads.

Virtualized Deployment#

Installation using hypervisors (VMware, KVM) with vGPU technology for GPU sharing across virtual machines.

Performance and Optimization#

AMP (Automatic Mixed Precision)#

PyTorch/TensorFlow feature automatically using FP16 precision where appropriate while maintaining FP32 for stability.
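
A minimal PyTorch sketch of AMP is shown below: autocast selects FP16 for eligible operations while GradScaler protects against gradient underflow (the model and data are placeholders).

```python
# Sketch of automatic mixed precision in PyTorch: autocast runs eligible ops in
# FP16 while GradScaler keeps the FP32 optimizer step numerically stable.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

data = torch.randn(32, 1024, device="cuda")
target = torch.randn(32, 1024, device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(data), target)
    scaler.scale(loss).backward()   # scale loss to avoid FP16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```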

Batch Size#

Number of samples processed together in a single forward/backward pass during training or inference.

DDP (Distributed Data Parallel)#

PyTorch’s multi-GPU training approach distributing data across GPUs with synchronized gradient updates.
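
A minimal sketch of DDP is shown below; it assumes a torchrun launch so that each process owns one GPU, with gradients synchronized automatically during backward().

```python
# Sketch of Distributed Data Parallel: one process per GPU, gradients are
# all-reduced automatically during backward(). Launch with torchrun.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(128, 10).cuda()
ddp_model = DDP(model, device_ids=[local_rank])
optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

inputs = torch.randn(16, 128, device="cuda")
labels = torch.randint(0, 10, (16,), device="cuda")
loss = torch.nn.functional.cross_entropy(ddp_model(inputs), labels)
loss.backward()                     # gradient all-reduce happens here
optimizer.step()
dist.destroy_process_group()
```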

FP16#

16-bit floating-point precision, providing faster computation and lower memory usage with minimal accuracy loss.

FP32#

32-bit floating-point precision, standard precision for neural network training providing high numerical accuracy.

Gradient Accumulation#

Technique simulating larger batch sizes by accumulating gradients over multiple forward/backward passes before updating weights.
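
A minimal PyTorch sketch is shown below; the accumulation step count and synthetic data are placeholders chosen for illustration.

```python
# Sketch of gradient accumulation: an effective batch of
# accumulation_steps * micro_batch is simulated on limited GPU memory.
import torch

model = torch.nn.Linear(256, 10).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
accumulation_steps = 4              # placeholder value

# Synthetic micro-batches standing in for a real DataLoader.
loader = [(torch.randn(8, 256), torch.randint(0, 10, (8,))) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    x, y = x.cuda(), y.cuda()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    (loss / accumulation_steps).backward()      # accumulate scaled gradients
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                        # update once per effective batch
        optimizer.zero_grad()
```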

Gradient Checkpointing#

Memory optimization technique trading computation for memory by recomputing activations during backward pass.
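
A minimal PyTorch sketch using torch.utils.checkpoint is shown below; activations inside the checkpointed block are recomputed during the backward pass instead of being stored.

```python
# Sketch of gradient checkpointing: activations inside `block` are not kept
# after the forward pass and are recomputed in backward, trading compute for memory.
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
).cuda()

x = torch.randn(32, 1024, device="cuda", requires_grad=True)
out = checkpoint(block, x, use_reentrant=False)   # recompute activations in backward
out.sum().backward()
```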

INT8#

8-bit integer precision for inference, providing maximum throughput with minimal accuracy loss through quantization.

Mixed Precision#

Training technique using both FP16 and FP32 precision to accelerate training while maintaining model accuracy.

Tensor Core#

Specialized hardware units in NVIDIA GPUs designed for accelerating mixed-precision matrix operations in AI workloads.
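
As a rough illustration in PyTorch, FP16 (and, on Ampere and later, TF32) matrix multiplications are dispatched to Tensor Cores by the cuBLAS backend; the matrix sizes here are arbitrary.

```python
# Sketch: FP16 matmuls are executed on Tensor Cores; allow_tf32 additionally
# routes FP32 matmuls through TF32 Tensor Cores on Ampere and newer GPUs.
import torch

torch.backends.cuda.matmul.allow_tf32 = True       # TF32 Tensor Cores for FP32 matmul
a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
c = a @ b                                           # FP16 matmul on Tensor Cores
print(c.dtype, c.shape)
```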