Overview of NeMo Microservices
NVIDIA NeMo is a modular, enterprise-ready software suite for managing the AI agent lifecycle, enabling enterprises to build, deploy, and optimize agentic systems.
NVIDIA NeMo microservices, part of the NVIDIA NeMo software suite, are an API-first, modular set of tools for customizing, evaluating, and securing large language models (LLMs) and embedding models. They run on on-premises or cloud-based Kubernetes clusters and help you optimize AI applications.
The NeMo microservices fall into two categories: functional microservices and infrastructure microservices.
Functional Microservices
The following are the functional microservices for LLMs and embedding models.
NVIDIA NeMo Customizer: Facilitates fine-tuning of large language models (LLMs) and embedding models using both full supervised fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT) techniques.
NVIDIA NeMo Evaluator: Provides comprehensive evaluation capabilities for LLMs and embedding models, supporting academic benchmarks, custom automated evaluations, and LLM-as-a-Judge approaches.
NVIDIA NeMo Guardrails: Adds safety checks and content moderation to LLM endpoints, protecting against hallucinations, harmful content, and security vulnerabilities.
NVIDIA NeMo Data Designer: Designs synthetic datasets from scratch or from seed data using AI models, statistical sampling, and configurable data schemas.
NVIDIA NeMo Safe Synthesizer (Early Access): Generates private versions of sensitive, tabular datasets for privacy compliance and data protection.
NVIDIA NeMo Auditor (Early Access): Audits models and agentic applications for security vulnerabilities and harmful content.
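To give a concrete flavor of the API-first design, a functional microservice such as NeMo Customizer is driven by posting a JSON job description to its REST API. The sketch below assembles such a body; the endpoint path, field names, and model identifier are illustrative assumptions, not the exact Customizer schema, so consult the Customizer API reference before use.

```python
import json

# Hypothetical sketch of a NeMo Customizer fine-tuning job request.
# Field names and the endpoint path are assumptions for illustration only.
CUSTOMIZER_JOBS_URL = "http://nemo.example.com/v1/customization/jobs"  # placeholder host

def build_lora_job(base_model: str, dataset: str, namespace: str = "default") -> dict:
    """Assemble a parameter-efficient (LoRA) fine-tuning job body."""
    return {
        "config": base_model,  # which base model configuration to fine-tune
        "dataset": {"name": dataset, "namespace": namespace},
        "hyperparameters": {
            "training_type": "sft",     # supervised fine-tuning
            "finetuning_type": "lora",  # parameter-efficient technique
            "epochs": 3,
        },
    }

job = build_lora_job("meta/llama-3.1-8b-instruct", "my-dataset")
# Submit with any HTTP client, e.g.:
#   requests.post(CUSTOMIZER_JOBS_URL, json=job, timeout=30)
print(json.dumps(job, indent=2))
```

The same pattern of small, declarative JSON bodies applies to the other functional microservices, such as Evaluator and Guardrails.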
Infrastructure Microservices
The following microservices provide the infrastructure that supports the functional microservices.
NVIDIA NeMo Data Store: Serves as the default file storage solution for the NeMo microservices platform, exposing APIs compatible with the Hugging Face Hub client (HfApi).
NVIDIA NeMo Entity Store: Provides tools to manage and organize general entities such as namespaces, projects, datasets, and models.
NVIDIA NeMo Deployment Management: Provides an API to deploy NIM microservices on a Kubernetes cluster and manage them through the NIM Operator microservice.
NVIDIA NeMo NIM Proxy: Provides a unified endpoint that you can use to access all deployed NIM microservices for inference tasks.
NVIDIA NeMo Operator: Manages custom resource definitions (CRDs) for NeMo Customizer fine-tuning jobs.
NVIDIA NeMo Core: Manages backend infrastructure for the functional microservices, such as job scheduling.
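Because NeMo Data Store exposes HfApi-compatible routes, the standard `huggingface_hub` client can target it by overriding its endpoint. The host URL, token, and namespace conventions below are placeholders, a minimal sketch rather than the exact configuration of any given deployment:

```python
# Sketch of pointing the Hugging Face Hub client at NeMo Data Store.
# The endpoint and token are placeholders; check your deployment's Data Store URL.
DATA_STORE_ENDPOINT = "http://nemo.example.com/v1/hf"  # placeholder

def datastore_repo_id(namespace: str, dataset: str) -> str:
    """NeMo entities are namespaced; map namespace/dataset onto a Hub-style repo id."""
    return f"{namespace}/{dataset}"

repo_id = datastore_repo_id("default", "my-dataset")

try:
    from huggingface_hub import HfApi  # requires `pip install huggingface_hub`
    api = HfApi(endpoint=DATA_STORE_ENDPOINT, token="dummy-token")
    # Example upload (commented out because it would contact the server):
    # api.upload_file(path_or_fileobj="train.jsonl", path_in_repo="train.jsonl",
    #                 repo_id=repo_id, repo_type="dataset")
except ImportError:
    api = None  # client not installed; the repo-id mapping above still holds
```

Reusing the Hub client means existing dataset-upload tooling can work against the platform's storage without a bespoke SDK.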
Web Interface
The following web interface provides a visual way to interact with the NeMo microservices platform.
NVIDIA NeMo Studio (Early Access): A web-based user interface for managing AI development workflows including projects, datasets, model customization, evaluation, and interactive model testing.
Target Users
This documentation primarily serves two types of users:
Developers and Researchers: Install and evaluate the NeMo microservices platform or individual microservices locally or on hosted GPUs to create synthetic data, fine-tune and evaluate models, run inference, or apply safety controls such as auditing and guardrails.
Platform Architects and Administrators: Deploy and scale NeMo microservices on production Kubernetes clusters, either together or as individual components.
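For a developer evaluating the platform, a natural first step is a test inference call through NIM Proxy's unified endpoint. Deployed NIM microservices expose OpenAI-compatible routes, so the request body follows the familiar chat-completions schema; the host and model name below are placeholders, not values from any real deployment:

```python
import json

# Hypothetical sketch: build an OpenAI-style chat request for a NIM deployed
# behind NeMo NIM Proxy. Host and model name are placeholders.
NIM_PROXY_BASE = "http://nemo.example.com/v1"  # placeholder host

def chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = chat_request("meta/llama-3.1-8b-instruct", "What is a data flywheel?")
# Submit with any HTTP client or OpenAI SDK pointed at NIM_PROXY_BASE, e.g.:
#   requests.post(f"{NIM_PROXY_BASE}/chat/completions", json=body, timeout=30)
print(json.dumps(body, indent=2))
```

Because the proxy presents one endpoint for all deployed models, switching models is just a change to the `model` field.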
High-level Data Flywheel Architecture Diagram with NeMo Microservices
A data flywheel represents the lifecycle of models and data in a machine learning workflow. The process cycles through data ingestion, model training, evaluation, and deployment.
The following diagram illustrates how the NeMo microservices can construct a complete data flywheel.

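In code terms, one turn of the flywheel can be sketched as the same four stages chained together, each backed by a microservice. The functions below are stubs with fixed placeholder values, not real API calls or results:

```python
# Stub sketch of one data-flywheel cycle; comments name the NeMo microservice
# that would perform each step in a real deployment.
def ingest_data() -> dict:
    # NeMo Data Store / Entity Store: register production data as a dataset.
    return {"dataset": "prod-traffic-v1"}

def fine_tune(dataset: dict) -> dict:
    # NeMo Customizer: train a candidate model on the new dataset.
    return {"model": "base-model+lora", "trained_on": dataset["dataset"]}

def evaluate(model: dict) -> dict:
    # NeMo Evaluator: score the candidate (placeholder score, not a real result).
    return {"model": model["model"], "score": 0.87}

def deploy(result: dict) -> bool:
    # NeMo Deployment Management / NIM Proxy: roll out if it clears the bar;
    # the deployed model then generates fresh data, closing the loop.
    return result["score"] > 0.80

deployed = deploy(evaluate(fine_tune(ingest_data())))
```

Each pass through the loop produces new interaction data, which feeds the next ingestion step and keeps the flywheel spinning.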
Concepts
Explore the foundational concepts and terminology used across the NeMo microservices platform.
Start here to learn about the concepts that make up the NeMo microservices platform.
Learn about the web-based interface for managing AI workflows without code.
Learn about the core entities you can use in your AI workflows.
Learn about the fine-tuning concepts behind customizing base models.
Learn about the concepts behind evaluating your AI workflows.
Learn about the concepts behind using the Inference service to test and serve your custom models.
Learn about the concepts behind controlling interaction with your AI workflows.