Key Features#
The NVIDIA NeMo microservices platform delivers a comprehensive suite of features that help you build, evaluate, and serve custom large language models (LLMs) and embedding models. You can also add safety checks to LLMs.
Data Management Features#
Manage your data assets with NeMo data management features:
- Streamline your data operations with centralized entity management. 
- Add custom metadata to enhance organization and simplify retrieval. 
- Use Hugging Face Hub ( - HfApi) conventions to manage asset files.
Iterate on your AI models by managing your projects, datasets, and models.
Customization Features#
Transform base models into specialized solutions for your unique needs:
- Create fine-tuned models by fine-tuning a base model on your own data. 
- Leverage state-of-the-art customization techniques including full supervised fine-tuning and parameter-efficient fine-tuning. 
- Implement model customization with a single API call. 
- Work with leading model families including Llama and Phi. 
- Deploy anywhere -—on-premises or in the cloud—- with Kubernetes support. 
Fine-tune your models using customization jobs to improve their performance.
Deploy NeMo Customizer to your Kubernetes cluster.
Evaluation Features#
Ensure your models meet quality and performance standards:
- Control LLM and AI pipeline evaluations for both custom and standard benchmarks. 
- Scale evaluations with a single API call while maintaining full data control. 
- Maintain consistency across teams through versioned benchmark configurations. 
- Future-proof your applications with continuous benchmark additions. 
- Access enterprise-grade support with regularly updated security patches. 
Set targets, define evaluation configurations, and run an evaluation job to measure your model’s performance.
Deploy NeMo Evaluator to your Kubernetes cluster.
Inference Features#
Deploy and manage your models as NIM for inference.
- Model Deployment. You can deploy models as NIMs using the NeMo Deployment Management microservice by specifying deployment configurations and submitting deployment requests. 
- Model Discovery. The NIM Proxy microservice auto-detects deployed models and lists them through a unified endpoint. 
- Inference Requests. You can send inference requests to the NIM Proxy endpoint, which routes the requests to the appropriate deployed model. 
- Model Management. You can manage the deployed models and lifecycle through the NeMo Deployment Management microservice, ensuring models are up-to-date. 
- Model Access Management. You can manage access to the deployed models through the NeMo Deployment Management microservice. 
Deploy NIM to your Kubernetes cluster.
Guardrail Features#
Protect your AI applications with comprehensive safety features:
- Guard against hallucinations, harmful content, and security vulnerabilities. 
- Implement customizable checks for specific business, language, or geographical requirements. 
- Optimize performance with Parallel Rails technology. 
- Integrate seamlessly with third-party APIs including OpenAI, ActiveFence, and TruEra (Snowflake). 
- Connect with popular Gen AI development tools like LangChain and LlamaIndex. 
Add checks to moderate user input and model responses.
Deploy NeMo Guardrails as a standalone service.
Data Designer Features (Early Access)#
Generate high-quality synthetic datasets using AI models, statistical sampling, and configurable data schemas.
Generate high-quality synthetic datasets using AI models, statistical sampling, and configurable data schemas.
Deploy NeMo Data Designer as a standalone service.
Auditor Features (Early Access)#
Ensure your models and agentic applications meet your safety standards:
- Audit models and systems for security vulnerabilities such as jailbreaks and harmful content. 
- Run a broad series of probes or select more focused risk areas. 
- Review a basic HTML report or develop your own with the data from audit jobs. 
Audit LLMs for security vulnerabilities and assess risk.
Deploy NeMo Auditor as a standalone service.
Cross-service Compatibility#
NVIDIA NeMo microservices work with the following NVIDIA NIM microservices and external endpoints in OpenAI-compatible format.
Compatible NVIDIA NIM Microservices
You can use the following NVIDIA NIM microservices deployed to your Kubernetes cluster with the NeMo microservices.
- NVIDIA NIM for LLMs 
- NVIDIA NeMo Retriever Text Embeddings NIM 
To deploy the NIM microservices, you can either use the NeMo Deployment Management microservice or the corresponding NIM Helm charts.
Compatible External Endpoints
You can set up the following external endpoints to work with the NeMo microservices.
- OpenAI API endpoints 
- NIM API endpoints in build.nvidia.com 
Flexible Deployment on Kubernetes#
You can deploy the NeMo microservices as an integrated platform to use the entire platform to create an end-to-end data flywheel, or select specific microservices that complement your existing workflows. The following guides are for cluster administrators who want to deploy the NeMo microservices.
Use the admin setup guide to learn about deploying the NeMo microservices to Kubernetes.
Learn about the different deployment scenarios for setting up the NeMo microservices on Kubernetes.
Deploy all NeMo microservices together as a platform using a single Helm chart.
Deploy any one or more of the NeMo microservices individually.