Appendix

Agentic AI : Systems capable of autonomous decision-making and action without continuous human intervention.

Ansible : An open-source automation tool for configuration management, application deployment, and task automation.

CRI-O : A lightweight container runtime specifically designed for Kubernetes.

CSI (Container Storage Interface) : A standard for exposing storage systems to containerized workloads on Kubernetes.

DCGM (Data Center GPU Manager) : A suite of tools for managing and monitoring GPU resources in data centers.

Helm : A package manager for Kubernetes that simplifies deploying, managing, and upgrading applications. It uses Helm charts, which are packages containing all the resources needed to deploy an application.

IaC (Infrastructure as Code) : Managing and provisioning computing infrastructure through machine-readable definition files rather than manual configuration.

MLOps (Machine Learning Operations) : Practices for streamlining the deployment, monitoring, and governance of machine learning models.

NetQ : A network operations tool for monitoring and troubleshooting modern data centers.

NVIDIA GPU Operator : A Kubernetes operator that automates the management of NVIDIA GPU drivers and software components.

NVIDIA Network Operator : A Kubernetes operator that manages networking components such as SR-IOV and Multus in Kubernetes clusters.

NVIDIA AI Enterprise : A suite of software tools optimized for AI development and deployment on NVIDIA infrastructure.

NVIDIA AI Factory : NVIDIA’s vision for building complete AI infrastructure systems, integrating compute, storage, and networking.

NVIDIA Blueprints : A set of validated reference designs by NVIDIA to accelerate building and deploying AI factories.

NVIDIA Blackwell : The NVIDIA GPU architecture that succeeds Hopper, aimed at enabling trillion-parameter AI models.

NVIDIA NeMo Retriever : A tool in the NeMo framework for implementing Retrieval-Augmented Generation (RAG) pipelines.

NVIDIA NIM (NVIDIA Inference Microservices) : Prebuilt, optimized containers for serving AI models efficiently.

NVIDIA Spectrum-X : A networking platform for building AI clouds and supercomputers with Ethernet networking.

RAG (Retrieval-Augmented Generation) : A method of enhancing language models by retrieving external knowledge during response generation.

Vector Database : A specialized database optimized for storing and searching high-dimensional vector data, essential for AI and recommendation systems.
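
The RAG and Vector Database entries above both rest on one operation: finding the stored vectors closest to a query vector. The following is a minimal sketch of that lookup, not how any particular vector database implements it. The document IDs, the three-dimensional embeddings, and the helper names `cosine_similarity` and `top_k` are hypothetical, chosen only for illustration; production systems typically use approximate nearest-neighbor indexes rather than the brute-force scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Brute-force search: score every stored vector, keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real embeddings come from a model and
# usually have hundreds or thousands of dimensions.
index = {
    "gpu-operator": [0.9, 0.1, 0.0],
    "helm": [0.1, 0.9, 0.1],
    "spectrum-x": [0.0, 0.2, 0.9],
}

query = [0.8, 0.2, 0.1]
results = top_k(query, index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In a RAG pipeline, the documents returned by a search like this would be passed to the language model as additional context for generating its response.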