Glossary#
Key terms and concepts used throughout the NVIDIA AI Enterprise documentation.
- AI Enterprise#
NVIDIA AI Enterprise is a cloud-native suite of AI tools, libraries, and frameworks for production AI deployments, providing optimized performance, security, and enterprise-grade support.
- Base Command Manager (BCM)#
NVIDIA Base Command Manager is a cluster management platform that streamlines provisioning, workload management, and infrastructure monitoring for data centers.
- Cloud License Service (CLS)#
A cloud-hosted NVIDIA License System service instance that manages software licenses for NVIDIA AI Enterprise products without requiring on-premises infrastructure.
- Compute Instance#
In MIG (Multi-Instance GPU), a compute instance is a subdivision of a GPU instance that provides dedicated compute resources with isolated execution contexts.
- Container Toolkit#
The NVIDIA Container Toolkit (formerly nvidia-docker) enables GPU-accelerated Docker containers by providing a container runtime library and utilities.
- CUDA#
Compute Unified Device Architecture - NVIDIA’s parallel computing platform and programming model for GPU acceleration.
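For illustration, a minimal PyTorch sketch (not part of the glossary definition) showing CUDA acceleration from Python; it assumes a CUDA-enabled PyTorch build:

```python
# Minimal sketch: run a matrix multiply on a CUDA GPU when one is available.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 1024, device=device)
y = x @ x.T  # executed on the GPU when device == "cuda"
print(device, y.shape)
```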
- Delegated License Service (DLS)#
An on-premises NVIDIA License System service instance hosted on a local network, providing license management without external connectivity requirements.
- Device Group#
An abstraction layer that automatically detects and presents sets of physically connected devices (GPUs, NICs) as a single logical unit for optimal topology-aware provisioning.
- DPU#
Data Processing Unit - A programmable processor designed to handle data center infrastructure tasks and accelerate networking, security, and storage operations.
- Fabric Manager#
NVIDIA Fabric Manager manages NVSwitch memory fabric and NVLink interconnects on NVIDIA HGX platforms, enabling multi-GPU configurations.
- GPU Instance#
In MIG mode, a GPU instance is a hardware-partitioned section of a physical GPU with dedicated memory, cache, and compute resources.
- GPU Operator#
The NVIDIA GPU Operator automates GPU management in Kubernetes, handling driver installation, runtime configuration, and GPU feature discovery.
- GPUDirect RDMA#
GPUDirect Remote Direct Memory Access - Technology enabling direct data exchange between GPUs and network devices or storage, bypassing host system memory and the CPU.
- GPUDirect Storage (GDS)#
Technology enabling direct data path between storage devices and GPU memory, avoiding CPU bounce buffers for improved bandwidth and latency.
- Heterogeneous vGPU#
Configuration allowing a single physical GPU to simultaneously support multiple vGPU profiles with different memory allocations (framebuffer sizes).
- HGX#
NVIDIA HGX is a GPU computing platform featuring multiple GPUs connected through NVSwitch, designed for AI training and large-scale computing workloads.
- Hypervisor#
Software that creates and manages virtual machines, enabling multiple operating systems to share a single hardware host. Examples: VMware vSphere, KVM.
- License System#
The NVIDIA License System manages software licenses for NVIDIA AI Enterprise, providing both cloud-hosted (CLS) and on-premises (DLS) licensing options.
- Live Migration#
The capability to transfer running VMs with NVIDIA vGPUs between physical hosts without downtime, enabling maintenance and load balancing.
- MIG (Multi-Instance GPU)#
Multi-Instance GPU technology allows hardware-level partitioning of a GPU into multiple isolated instances, each with dedicated resources.
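As a hedged sketch of how MIG partitioning is typically driven, the commands below shell out to nvidia-smi from Python; they require root and a MIG-capable GPU, and the profile name 1g.5gb is only an example (list valid profiles with nvidia-smi mig -lgip):

```python
# Hedged sketch: enable MIG mode on GPU 0 and create one instance via nvidia-smi.
import subprocess

def run(cmd):
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["nvidia-smi", "-i", "0", "-mig", "1"])         # enable MIG mode on GPU 0
run(["nvidia-smi", "mig", "-cgi", "1g.5gb", "-C"])  # create a GPU instance plus compute instance
run(["nvidia-smi", "mig", "-lgi"])                  # list the resulting GPU instances
```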
- MIG-Backed vGPU#
A virtual GPU created from one or more MIG slices, providing spatial isolation with dedicated compute resources for multi-tenant workloads.
- Multi-vGPU#
Configuration allowing a single virtual machine to use multiple vGPUs simultaneously, aggregating computational power from several vGPU devices.
- NCCL#
NVIDIA Collective Communications Library - Optimized library for multi-GPU and multi-node communication in distributed training.
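A minimal sketch of NCCL in use, assuming PyTorch with the NCCL backend and a launch via torchrun (for example, torchrun --nproc_per_node=2 script.py):

```python
# Hedged sketch: an all-reduce across GPUs; PyTorch routes the collective through NCCL.
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

t = torch.ones(4, device="cuda") * rank
dist.all_reduce(t, op=dist.ReduceOp.SUM)  # summed across all participating GPUs
print(rank, t)
dist.destroy_process_group()
```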
- NGC (NVIDIA GPU Cloud)#
NVIDIA’s catalog and registry for GPU-optimized software, including containers, models, and tools for AI and HPC applications.
- NGC CLI#
Command-line interface for accessing and downloading resources from the NVIDIA NGC Catalog, including drivers, containers, and models.
- NIM (NVIDIA Inference Microservices)#
Containerized inference services providing optimized deployment of AI models with standardized APIs for production environments.
- nvidia-smi#
NVIDIA System Management Interface - Command-line utility for monitoring and managing NVIDIA GPUs, displaying utilization, temperature, and configuration.
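For illustration, a small Python sketch that calls nvidia-smi's scriptable query interface to read per-GPU utilization and memory:

```python
# Minimal sketch: query GPU index, name, utilization, and memory via nvidia-smi.
import subprocess

out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,utilization.gpu,memory.used,memory.total",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
for line in out.stdout.strip().splitlines():
    print(line)
```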
- NVLink#
High-bandwidth, direct GPU-to-GPU interconnect technology enabling fast communication between GPUs in the same system.
- NVLink Multicast#
Technology enabling efficient one-to-many data distribution across multiple GPUs connected through NVLink, optimizing distributed training.
- NVSwitch#
High-speed interconnect fabric providing full NVLink bandwidth between all GPUs in HGX systems, enabling optimal multi-GPU communication.
- Passthrough#
GPU passthrough assigns an entire physical GPU directly to a virtual machine, providing native performance without virtualization overhead.
- Peer-to-Peer (P2P)#
Capability allowing direct memory access between GPUs without CPU involvement, enabling fast inter-GPU communication over NVLink.
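A hedged PyTorch sketch that checks whether two GPUs can reach each other peer-to-peer and performs a direct device-to-device copy (requires at least two GPUs):

```python
# Hedged sketch: check P2P access and copy a tensor directly between GPUs.
import torch

if torch.cuda.device_count() >= 2:
    print("P2P 0 -> 1:", torch.cuda.can_device_access_peer(0, 1))
    a = torch.randn(1024, device="cuda:0")
    b = a.to("cuda:1")  # device-to-device copy; uses P2P/NVLink when available
    print(b.device)
```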
- Persistence Mode#
GPU configuration mode that keeps the NVIDIA kernel driver loaded even when no applications are running, reducing startup latency.
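A hedged sketch assuming the pynvml package (Python bindings for NVML) that reads a GPU's persistence mode; enabling it normally requires root, for example via nvidia-smi -pm 1:

```python
# Hedged sketch: read persistence mode through NVML.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
mode = pynvml.nvmlDeviceGetPersistenceMode(handle)
print("persistence mode enabled:", mode == pynvml.NVML_FEATURE_ENABLED)
pynvml.nvmlShutdown()
```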
- SR-IOV#
Single Root I/O Virtualization - PCIe standard enabling a single physical device to present multiple virtual functions to different virtual machines.
- Suspend-Resume#
Feature allowing vGPU-configured VMs to be temporarily suspended and later resumed without losing operational state, optimizing resource utilization.
- TensorRT#
NVIDIA’s deep learning inference optimizer and runtime, providing high-performance inference acceleration with automatic optimization.
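A hedged sketch of the TensorRT Python API (details vary across TensorRT versions) that builds an FP16 engine from an ONNX file; model.onnx and model.engine are placeholder names:

```python
# Hedged sketch: parse an ONNX model and build an FP16 TensorRT engine.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels
engine = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine)
```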
- Time-Sliced MIG-Backed vGPU#
Advanced vGPU mode combining MIG’s spatial partitioning with time-slicing within each MIG instance, maximizing density while maintaining isolation.
- Time-Sliced vGPU#
Virtual GPU configuration where multiple vGPUs share access to GPU resources through temporal scheduling, with round-robin time slicing.
- Unified Virtual Memory (UVM)#
Feature providing a single, cohesive memory address space accessible by both CPUs and GPUs, simplifying programming and improving performance.
- vGPU (Virtual GPU)#
A virtualized GPU instance that allows multiple virtual machines to share a single physical GPU while providing dedicated GPU capabilities.
- vGPU for Compute#
NVIDIA vGPU profile optimized for compute workloads like AI training and inference, providing full compute capabilities without graphics features.
- vGPU Guest Driver#
Driver installed in virtual machines that enables them to use virtualized GPU resources, providing the interface to access GPU capabilities.
- vGPU Profile#
A configuration template defining the framebuffer size, compute resources, and capabilities allocated to a vGPU instance.
- vGPU Type#
Specific vGPU configuration identified by name (for example, A100-4-20C), defining memory allocation and resource limits for a virtual GPU.
- Virtual GPU Manager#
Software component installed on the hypervisor that enables GPU virtualization, managing vGPU creation and resource allocation.
- VMI (Virtual Machine Instance/Image)#
Pre-configured virtual machine image with NVIDIA drivers and software pre-installed, available on cloud marketplaces for rapid deployment.
- VMware vSphere#
Enterprise virtualization platform from VMware, including ESXi hypervisor and vCenter management, supporting NVIDIA vGPU technology.
- VT-d / IOMMU#
Intel VT-d and AMD-Vi - Hardware I/O virtualization (IOMMU) technologies enabling direct device assignment and DMA remapping for improved VM I/O performance.
—
Architecture-Specific Terms#
- Ampere Architecture#
NVIDIA GPU architecture featuring the A100, A30, A40, and RTX A-series GPUs, with MIG support on the A100 and A30 and improved AI performance.
- Ada Lovelace Architecture#
NVIDIA GPU architecture featuring L4, L40, and RTX 6000 Ada GPUs with advanced ray tracing and AI capabilities.
- Blackwell Architecture#
The latest NVIDIA GPU architecture, featuring B200 GPUs with enhanced AI capabilities and universal MIG technology that supports both compute and graphics workloads.
- Hopper Architecture#
NVIDIA GPU architecture with H100, H200, and H800 GPUs, featuring the Transformer Engine and advanced capabilities for large-scale AI workloads.
- Volta Architecture#
Earlier NVIDIA GPU architecture featuring V100 GPUs, the first to support Tensor Cores for AI acceleration.
- Turing Architecture#
NVIDIA GPU architecture featuring T4 and Quadro RTX GPUs with first-generation ray tracing and AI capabilities.
—
Licensing Terms#
- CLS Instance#
Cloud License Service instance hosted on NVIDIA Licensing Portal, providing license management through the cloud.
- DLS Instance#
Delegated License Service instance hosted on-premises, providing local license management without cloud connectivity.
- Entitlement Certificate#
Document containing product activation keys and license information, provided after purchasing NVIDIA AI Enterprise.
- Feature Type#
License configuration parameter specifying the type of licensed software (for example, FeatureType=4 for vComputeServer).
- Floating License#
License that can be shared among multiple clients, checked out when needed and returned to the pool when no longer required.
- License Lease#
Temporary license assignment to a client, with automatic renewal and release mechanisms.
- Node-Locked License#
License tied to a specific physical machine or VM, not transferable to other systems.
—
Deployment Terms#
- Bare Metal Deployment#
Installation directly on physical servers without virtualization, providing maximum performance and direct GPU access.
- Cloud Deployment#
Deployment on public cloud platforms (AWS, Azure, GCP) using GPU-enabled virtual machine instances.
- Kubernetes Deployment#
Container orchestration deployment using Kubernetes with GPU Operator for automated GPU management.
- Multi-Node Deployment#
Distributed deployment across multiple physical servers for large-scale training and inference workloads.
- Virtualized Deployment#
Installation using hypervisors (VMware, KVM) with vGPU technology for GPU sharing across virtual machines.
—
Performance and Optimization#
- AMP (Automatic Mixed Precision)#
PyTorch/TensorFlow feature automatically using FP16 precision where appropriate while maintaining FP32 for stability.
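A minimal PyTorch sketch of automatic mixed precision: autocast runs eligible operations in FP16 while GradScaler keeps gradients numerically stable:

```python
# Hedged sketch: AMP training steps with autocast and gradient scaling.
import torch

model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(64, 512, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # FP16 where safe, FP32 elsewhere
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()        # scale loss to avoid FP16 underflow
    scaler.step(optimizer)
    scaler.update()
```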
- Batch Size#
Number of samples processed together in a single forward/backward pass during training or inference.
- DDP (Distributed Data Parallel)#
PyTorch’s multi-GPU training approach distributing data across GPUs with synchronized gradient updates.
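A hedged sketch of DistributedDataParallel, assuming one process per GPU launched with torchrun (for example, torchrun --nproc_per_node=4 train.py):

```python
# Hedged sketch: wrap a model in DDP; gradients are all-reduced across GPUs.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(512, 10).cuda(), device_ids=[local_rank])
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 512, device="cuda")
loss = model(x).sum()
loss.backward()   # gradient synchronization happens here
optimizer.step()
dist.destroy_process_group()
```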
- FP16#
16-bit floating-point precision, providing faster computation and lower memory usage with minimal accuracy loss.
- FP32#
32-bit floating-point precision, standard precision for neural network training providing high numerical accuracy.
- Gradient Accumulation#
Technique simulating larger batch sizes by accumulating gradients over multiple forward/backward passes before updating weights.
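A minimal PyTorch sketch: gradients from four micro-batches are accumulated before each optimizer step, emulating a batch four times larger:

```python
# Hedged sketch: gradient accumulation over 4 micro-batches.
import torch

model = torch.nn.Linear(512, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accum_steps = 4

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(16, 512)             # micro-batch
    loss = model(x).sum() / accum_steps  # scale so the update matches one large batch
    loss.backward()                      # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```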
- Gradient Checkpointing#
Memory optimization technique trading computation for memory by recomputing activations during backward pass.
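A minimal PyTorch sketch using torch.utils.checkpoint: activations inside the wrapped block are not stored and are recomputed during the backward pass:

```python
# Hedged sketch: checkpoint a block to trade compute for activation memory.
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
)
x = torch.randn(64, 1024, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # recomputes activations on backward
y.sum().backward()
```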
- INT8#
8-bit integer precision for inference, providing maximum throughput with minimal accuracy loss through quantization.
- Mixed Precision#
Training technique using both FP16 and FP32 precision to accelerate training while maintaining model accuracy.
- Tensor Core#
Specialized hardware units in NVIDIA GPUs designed for accelerating mixed-precision matrix operations in AI workloads.