Key Features#
The NVIDIA NeMo microservices platform delivers a comprehensive suite of features that help you build, evaluate, and serve custom large language models (LLMs) and embedding models. You can also add safety checks to LLMs.
Data Management Features#
Manage your data assets with NeMo data management features:
Streamline your data operations with centralized entity management.
Add custom metadata to enhance organization and simplify retrieval.
Use Hugging Face Hub (HfApi) conventions to manage asset files, as shown in the sketch after this list.
Iterate on your AI models by managing your projects, datasets, and models.
Deploy NeMo Data Store.
Deploy NeMo Entity Store.
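The following sketch shows the Hugging Face Hub conventions in practice: because NeMo Data Store exposes a Hub-compatible API, the huggingface_hub client can create repositories and upload asset files. The endpoint URL, repository name, and file paths are assumptions for illustration; substitute the values from your own deployment.

```python
# A minimal sketch of managing NeMo Data Store assets with huggingface_hub.
# The endpoint URL and the repo/file names are assumptions for illustration.
from huggingface_hub import HfApi

# NeMo Data Store speaks the Hugging Face Hub API, so point HfApi at it.
api = HfApi(endpoint="http://nemo-data-store:3000/v1/hf", token="")  # assumed URL

# Create a dataset repository and upload a training file to it.
api.create_repo(repo_id="default/my-dataset", repo_type="dataset", exist_ok=True)
api.upload_file(
    path_or_fileobj="train.jsonl",        # local file
    path_in_repo="training/train.jsonl",  # path inside the repo
    repo_id="default/my-dataset",
    repo_type="dataset",
)

# List the files now stored in the repository.
print(api.list_repo_files(repo_id="default/my-dataset", repo_type="dataset"))
```

Because the API is Hub-compatible, the same client code works whether the files back a customization job, an evaluation, or another workflow.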
Customization Features#
Transform base models into specialized solutions for your unique needs:
Create specialized models by fine-tuning a base model on your own data.
Leverage state-of-the-art customization techniques including full supervised fine-tuning and parameter-efficient fine-tuning.
Implement model customization with a single API call (see the sketch after this list).
Work with leading model families including Llama and Phi.
Deploy anywhere, on-premises or in the cloud, with Kubernetes support.
Fine-tune your models using customization jobs to improve their performance.
Deploy NeMo Customizer to your Kubernetes cluster.
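As a sketch of the single-API-call workflow, the following request submits a parameter-efficient fine-tuning job to NeMo Customizer. The service URL, route, config name, and payload fields are assumptions for illustration; consult the NeMo Customizer API reference for the exact schema.

```python
# A hedged sketch: submit a LoRA fine-tuning job to NeMo Customizer with one
# API call. The URL, route, and payload fields are assumptions for illustration.
import requests

CUSTOMIZER_URL = "http://nemo-customizer:8000"  # assumed service address

job = requests.post(
    f"{CUSTOMIZER_URL}/v1/customization/jobs",  # assumed route
    json={
        "config": "meta/llama-3.1-8b-instruct",  # base model config (assumed name)
        "dataset": {"name": "my-dataset", "namespace": "default"},
        "hyperparameters": {
            "training_type": "sft",     # supervised fine-tuning
            "finetuning_type": "lora",  # parameter-efficient fine-tuning
            "epochs": 3,
        },
    },
    timeout=30,
)
job.raise_for_status()
print(job.json())  # includes the job ID used to poll training status
```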
Evaluation Features#
Ensure your models meet quality and performance standards:
Control LLM and AI pipeline evaluations for both custom and standard benchmarks.
Scale evaluations with a single API call while maintaining full data control.
Maintain consistency across teams through versioned benchmark configurations.
Future-proof your applications with continuous benchmark additions.
Access enterprise-grade support with regularly updated security patches.
Set targets, define evaluation configurations, and run an evaluation job to measure your model's performance (see the sketch after this list).
Deploy NeMo Evaluator to your Kubernetes cluster.
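The following sketch illustrates the target-and-configuration workflow by submitting an evaluation job over HTTP. The service URL, route, and payload fields are assumptions for illustration; consult the NeMo Evaluator API reference for the exact schema.

```python
# A hedged sketch: run an evaluation job against a deployed model by naming
# a target (what to evaluate) and a configuration (how to evaluate it).
# The URL, route, and payload fields are assumptions for illustration.
import requests

EVALUATOR_URL = "http://nemo-evaluator:7331"  # assumed service address

job = requests.post(
    f"{EVALUATOR_URL}/v1/evaluation/jobs",  # assumed route
    json={
        "target": {
            "type": "model",
            "model": {
                "api_endpoint": {
                    "url": "http://nim-proxy:8000/v1/chat/completions",  # assumed
                    "model_id": "meta/llama-3.1-8b-instruct",
                }
            },
        },
        "config": "my-benchmark-config",  # assumed pre-registered, versioned config
    },
    timeout=30,
)
job.raise_for_status()
print(job.json())  # includes the job ID used to poll for results
```

Keeping benchmark configurations as named, versioned entities is what lets different teams rerun the same evaluation and compare results consistently.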
Inference Features#
Deploy and manage your models as NIM microservices for inference.
Model Deployment: You can deploy models as NIMs using the NeMo Deployment Management microservice by specifying deployment configurations and submitting deployment requests.
Model Discovery: The NIM Proxy microservice auto-detects deployed models and lists them through a unified endpoint.
Inference Requests: You can send inference requests to the NIM Proxy endpoint, which routes them to the appropriate deployed model (see the sketch after this list).
Model Management: You can manage the deployed models and their lifecycle through the NeMo Deployment Management microservice, ensuring models stay up to date.
Model Access Management: You can manage access to the deployed models through the NeMo Deployment Management microservice.
Deploy NIM to your Kubernetes cluster.
Install NeMo Deployment Management to your Kubernetes cluster.
Install NIM Proxy to your Kubernetes cluster.
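The following sketch puts model discovery and inference requests together. Because NIM Proxy exposes an OpenAI-compatible API, the standard openai Python client works against it; the base URL and model name are assumptions for illustration.

```python
# A minimal sketch of model discovery and inference through NIM Proxy.
# The base URL and model name are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://nim-proxy:8000/v1",  # assumed NIM Proxy address
    api_key="not-used",                   # local deployments may not check keys
)

# Model discovery: list every model the proxy has auto-detected.
for model in client.models.list():
    print(model.id)

# Inference: the proxy routes the request to the matching deployed NIM.
response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # assumed deployed model
    messages=[{"role": "user", "content": "Summarize what a data flywheel is."}],
)
print(response.choices[0].message.content)
```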
Guardrail Features#
Protect your AI applications with comprehensive safety features:
Guard against hallucinations, harmful content, and security vulnerabilities.
Implement customizable checks for specific business, language, or geographical requirements.
Optimize performance with Parallel Rails technology.
Integrate seamlessly with third-party APIs including OpenAI, ActiveFence, and TruEra (Snowflake).
Connect with popular Gen AI development tools like LangChain and LlamaIndex.
Add checks to moderate user input and model responses (see the sketch after this list).
Deploy NeMo Guardrails as a standalone service.
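The following sketch sends a guarded chat request so that both the user input and the model response pass through the configured checks. The service URL, route, and configuration name are assumptions for illustration; consult the NeMo Guardrails API reference for the exact schema.

```python
# A hedged sketch: send a chat request through the NeMo Guardrails
# microservice so input and output moderation checks run around the model.
# The URL, route, and config name are assumptions for illustration.
import requests

GUARDRAILS_URL = "http://nemo-guardrails:7331"  # assumed service address

response = requests.post(
    f"{GUARDRAILS_URL}/v1/guardrail/chat/completions",  # assumed route
    json={
        "model": "meta/llama-3.1-8b-instruct",  # assumed deployed model
        "messages": [
            {"role": "user", "content": "How do I pick a strong password?"}
        ],
        "guardrails": {"config_id": "default"},  # assumed guardrails config
    },
    timeout=30,
)
response.raise_for_status()
# The body follows the OpenAI chat completion shape; a blocked request
# typically yields a refusal message instead of the model's raw output.
print(response.json()["choices"][0]["message"]["content"])
```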
Data Designer Features (Early Access)#
Generate high-quality synthetic datasets using AI models, statistical sampling, and configurable data schemas.
Deploy NeMo Data Designer as a standalone service.
Auditor Features (Early Access)#
Ensure your models and agentic applications meet your safety standards:
Audit models and systems for security vulnerabilities such as jailbreaks and harmful content.
Run a broad series of probes or select more focused risk areas.
Review a basic HTML report or build your own report from audit job data (see the sketch after this list).
Audit LLMs for security vulnerabilities and assess risk.
Deploy NeMo Auditor as a standalone service.
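The following is a hypothetical sketch of launching an audit job over HTTP. Because NeMo Auditor is in early access, every URL, route, and field shown is an assumption for illustration only; consult the NeMo Auditor documentation for the actual API.

```python
# A hypothetical sketch of submitting an audit job that probes a deployed
# model for security vulnerabilities. All URLs, routes, and fields below are
# assumptions for illustration; check the NeMo Auditor docs for the real API.
import requests

AUDITOR_URL = "http://nemo-auditor:8000"  # assumed service address

job = requests.post(
    f"{AUDITOR_URL}/v1/auditor/jobs",  # assumed route
    json={
        "target": {  # assumed: the model endpoint to probe
            "url": "http://nim-proxy:8000/v1/chat/completions",
            "model_id": "meta/llama-3.1-8b-instruct",
        },
        "probes": ["jailbreak", "toxicity"],  # assumed focused risk areas
    },
    timeout=30,
)
job.raise_for_status()
print(job.json())  # audit job data you can render into your own report
```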
Cross-service Compatibility#
NVIDIA NeMo microservices work with the following NVIDIA NIM microservices and external endpoints in OpenAI-compatible format.
Compatible NVIDIA NIM Microservices
You can use the following NVIDIA NIM microservices deployed to your Kubernetes cluster with the NeMo microservices.
NVIDIA NIM for LLMs
NVIDIA NeMo Retriever Text Embeddings NIM
To deploy the NIM microservices, you can use either the NeMo Deployment Management microservice or the corresponding NIM Helm charts.
Compatible External Endpoints
You can set up the following external endpoints to work with the NeMo microservices (see the sketch after this list).
OpenAI API endpoints
NIM API endpoints on build.nvidia.com
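The following sketch calls a NIM API endpoint hosted on build.nvidia.com using the OpenAI-compatible format. The model name is an example; the API key comes from your NVIDIA account.

```python
# A minimal sketch of calling a hosted NIM API endpoint from build.nvidia.com
# in OpenAI-compatible format. The model name is an example.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # build.nvidia.com NIM API
    api_key=os.environ["NVIDIA_API_KEY"],            # your NVIDIA API key
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # example hosted model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```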
Flexible Deployment on Kubernetes#
You can deploy the NeMo microservices as an integrated platform to create an end-to-end data flywheel, or select specific microservices that complement your existing workflows. The following guides are for cluster administrators who want to deploy the NeMo microservices.
Use the admin setup guide to learn about deploying the NeMo microservices to Kubernetes.
Learn about the different deployment scenarios for setting up the NeMo microservices on Kubernetes.
Deploy all NeMo microservices together as a platform using a single Helm chart.
Deploy any one or more of the NeMo microservices individually.