Glossary
List of commonly used acronyms and terms:
| Term | Definition | 
|---|---|
| Audio2Face-2D | Audio2Face-2D (A2F-2D) is a generative model that converts audio input into life-like 2D mouth movement animations of a provided 2D portrait photo. | 
| Audio2Face-3D | Audio2Face-3D (A2F-3D) is a generative AI technology that converts audio input into life-like 3D facial animations, including emotional expressions. | 
| Animation Graph | NVIDIA Animation Graph is a runtime framework within NVIDIA’s Omniverse platform for skeletal animation blending, playback, and control. | 
| ASR | Automatic speech recognition (ASR), or speech-to-text, is the combination of processes and software that decode human speech and convert it to digitized text. | 
| Audio2Emotion | NVIDIA Audio2Emotion is a component of NVIDIA’s Audio2Face-3D technology that uses AI to detect emotional state from voice and adjust the facial animation of a 3D character accordingly. | 
| AWS | Amazon Web Services (AWS) is a comprehensive cloud computing platform offering IaaS, PaaS, and SaaS solutions on a pay-as-you-go basis. It provides a wide range of services including compute power, database storage, and content delivery. | 
| Azure | Microsoft Azure is Microsoft’s public cloud computing platform offering IaaS, PaaS, SaaS, and serverless functions. It supports various technologies and operates on a pay-as-you-go basis. | 
| Bare-metal | A physical server that is dedicated entirely to a single user or tenant, providing direct access to the server’s hardware without any virtualization layer. | 
| CSP | A Cloud Service Provider (CSP) is a company that offers cloud computing services, enabling businesses and individuals to access and use computing resources over the internet. Example CSPs defined in this glossary: AWS, Azure, and GCP. |
| Digital Humans | Digital Humans are AI-powered virtual representations of humans that combine computer graphics, computer vision, and artificial intelligence to create highly realistic and interactive virtual characters. | 
| GCP | Google Cloud Platform (GCP) is a suite of cloud computing services providing tools for building, deploying, and managing applications. It includes services like computing power, storage, databases, and machine learning, running on Google’s global infrastructure. | 
| Guardrails | Guardrails, in the context of Large Language Models (LLMs), are safety measures designed to monitor and control these AI systems so that they operate safely and responsibly. |
| Helm | A package manager for Kubernetes that simplifies the process of installing and managing Kubernetes applications by using pre-configured packages called Helm charts. | 
| Kubernetes | A widely used container orchestration platform designed to automate the deployment, scaling and management of containerized applications. | 
| LLM | Large Language Models (LLMs) are deep learning algorithms that can recognize, summarize, translate, predict, and generate content using very large datasets. | 
| NAT | Network Address Translation (NAT) is a technique that maps private IP addresses to public ones, used to manage and conserve IP addresses within a network given the limited number of available IPv4 addresses. |
| NeMo | NVIDIA NeMo is an end-to-end platform for developing custom generative AI—including large language models (LLMs), vision language models (VLMs), video models, and speech AI—anywhere. | 
| NGC Catalog | NGC (NVIDIA GPU Cloud) Catalog is a curated collection of GPU-optimized software, including containers, pre-trained models, Helm charts for Kubernetes deployments and industry-specific AI toolkits with software development kits (SDKs). | 
| NLP | Natural language processing (NLP) is the application of AI to process and analyze text or voice data in order to understand, interpret, categorize, and/or derive insights from the content. | 
| NVIDIA ACE | NVIDIA ACE (Avatar Cloud Engine) is a comprehensive suite of technologies and tools designed to bring digital humans to life using generative AI. | 
| NVIDIA Blueprint | NVIDIA Blueprints are predefined, pretrained AI workflows designed to simplify and accelerate the development of various AI applications. | 
| NVIDIA GPU | NVIDIA Graphics Processing Unit (GPU) is a specialized electronic circuit designed for graphics rendering and high-speed mathematical calculations, used in gaming, professional graphics, AI, and high-performance computing. | 
| NVIDIA Maxine | NVIDIA Maxine is a suite of GPU-accelerated SDKs and NIM Microservices that enhance audio, video, and augmented reality effects for real-time communications, including features like noise reduction, video upscaling, and eye-gaze correction. | 
| NVIDIA NIM | NVIDIA Inference Microservices (NIM) is a set of easy-to-use microservices designed to accelerate the deployment of generative AI models across cloud, data center, and workstation environments. NIM provides pre-optimized inference engines, such as TensorRT and TensorRT-LLM, to run AI models on NVIDIA GPUs, ensuring low-latency and high-throughput performance. | 
| NVIDIA Omniverse | NVIDIA Omniverse is a platform of APIs, SDKs, and services that enable developers to integrate OpenUSD, NVIDIA RTX™ rendering technologies, and generative physical AI into existing software tools and simulation workflows for industrial and robotic use cases. |
| NVIDIA UCS | NVIDIA UCS (Unified Cloud Services) is a low-code framework designed for developing cloud-native, real-time, and multimodal AI applications. It adopts a microservices architecture, allowing developers to combine microservices into cloud-native applications or services. |
| RAG | Retrieval-Augmented Generation (RAG) is a generative AI architecture that combines Large Language Models (LLMs) with a data retrieval component to generate accurate and up-to-date responses. It retrieves relevant information from external knowledge bases and uses this data to inform the generated output, which improves the reliability and accuracy of LLM responses. |
| Riva | NVIDIA Riva is an AI speech SDK that provides a set of tools for building conversational AI applications, including speech recognition, text-to-speech, and natural language processing capabilities, optimized for NVIDIA GPUs. | 
| RTSP | Real-Time Streaming Protocol (RTSP) is a network protocol that controls how media is streamed between a server and a client. |
| SDR | Stream Distribution & Routing (SDR) distributes media streams to the individual pods and is responsible for routing and stream state management. |
| STUN | A STUN (Session Traversal Utilities for NAT) server is used in VoIP (Voice over Internet Protocol) and other real-time communication systems to help clients behind firewalls or NAT (Network Address Translation) devices connect with other clients by letting each client discover its public-facing address. |
| TTS | Text-to-speech (TTS) is a form of speech synthesis that converts any string of text characters into spoken output. |
| TURN | Traversal Using Relays around NAT (TURN) is a network protocol and server technology that relays traffic between devices behind Network Address Translation (NAT) systems or firewalls when direct peer-to-peer connections are not possible. |
| Voice Font | Refers to the NVIDIA Voice Font microservice, which converts the speaker’s timbre in the input audio to that of the reference audio while retaining the linguistic content and prosody of the input. |
| UMIM | Unified Multimodal Interaction Management (UMIM) provides an interaction-level interface between the interaction manager (IM), a decision-making unit, and an interactive system that executes the commands from the IM. |
| VST | NVIDIA Video Storage Toolkit (VST), also referred to as VMS (Video Management System), manages audio and video streams and provides on-demand access to offline streams from storage. It accepts a WebRTC stream from the front-end UI application and outputs RTSP streams for further processing. |