Key Features#
The NVIDIA NeMo microservices platform delivers a comprehensive suite of features that help you build, evaluate, and serve custom Large Language Models (LLMs).
Data Management Features#
Manage your data assets with NeMo data management features:
Streamline your data operations with centralized entity management.
Add custom metadata to enhance organization and simplify retrieval.
Use Hugging Face Hub (HfApi) conventions to manage asset files (see the sketch after this list).
Iterate on your AI models by managing your projects, datasets, and models.
Deploy NeMo Data Store.
Deploy NeMo Entity Store.
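Because NeMo Data Store follows Hugging Face Hub conventions, you can manage dataset files with the standard huggingface_hub client. The following is a minimal sketch: the endpoint URL, its /v1/hf path, and the repository name are illustrative placeholders, so substitute the values from your own deployment.

```python
from huggingface_hub import HfApi

# Placeholder endpoint; point this at your NeMo Data Store deployment.
# The /v1/hf path is an assumption for illustration.
api = HfApi(endpoint="http://nemo-data-store.example.com/v1/hf", token="")

# Create a dataset repository, then upload a training file to it.
api.create_repo(repo_id="my-org/my-dataset", repo_type="dataset")
api.upload_file(
    path_or_fileobj="train.jsonl",
    path_in_repo="train.jsonl",
    repo_id="my-org/my-dataset",
    repo_type="dataset",
)

# List the files now stored in the repository.
print(api.list_repo_files(repo_id="my-org/my-dataset", repo_type="dataset"))
```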
Customization Features#
Transform base models into specialized solutions for your unique needs:
Create specialized models by fine-tuning a base model on your own data.
Leverage state-of-the-art customization techniques, including full supervised fine-tuning and parameter-efficient fine-tuning.
Implement model customization with a single API call (see the sketch after this list).
Work with leading model families including Llama and Phi.
Deploy anywhere, on-premises or in the cloud, with Kubernetes support.
Fine-tune your models using customization jobs to improve their performance.
Deploy NeMo Customizer to your Kubernetes cluster.
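As a sketch of the single-call workflow, the snippet below submits a fine-tuning job over HTTP with the requests library. The URL, model name, and request fields are illustrative assumptions; consult the NeMo Customizer API reference for the exact schema of your release.

```python
import requests

# Assumed endpoint and payload shape for illustration only; check the
# NeMo Customizer API reference for the exact schema.
CUSTOMIZER_URL = "http://nemo-customizer.example.com/v1/customization/jobs"

response = requests.post(
    CUSTOMIZER_URL,
    json={
        "config": "meta/llama-3.1-8b-instruct",    # base model to customize
        "dataset": {"name": "my-org/my-dataset"},  # dataset in NeMo Data Store
        "hyperparameters": {
            "training_type": "sft",     # full supervised fine-tuning
            "finetuning_type": "lora",  # or a parameter-efficient variant
            "epochs": 3,
        },
    },
    timeout=30,
)
response.raise_for_status()
job = response.json()
print(job["id"])  # keep the job ID to poll for status later
```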
Evaluation Features#
Ensure your models meet quality and performance standards:
Run LLM and AI pipeline evaluations against both custom and standard benchmarks.
Scale evaluations with a single API call while maintaining full data control.
Maintain consistency across teams through versioned benchmark configurations.
Future-proof your applications with continuous benchmark additions.
Access enterprise-grade support with regularly updated security patches.
Set targets, define evaluation configurations, and run an evaluation job to measure your model’s performance (see the sketch after this list).
Deploy NeMo Evaluator to your Kubernetes cluster.
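The sketch below illustrates that target/config/job pattern: register the model endpoint as a target, define a benchmark configuration, then launch a job pairing the two. The endpoint paths, benchmark name, and field names are assumptions for illustration; see the NeMo Evaluator API reference for the exact schema.

```python
import requests

# Assumed base URL and resource paths for illustration only.
EVALUATOR_URL = "http://nemo-evaluator.example.com/v1/evaluation"

# 1. Register the model endpoint to evaluate as a target.
target = requests.post(
    f"{EVALUATOR_URL}/targets",
    json={"type": "model", "model": {"api_endpoint": {
        "url": "http://nim-proxy.example.com/v1/chat/completions",
        "model_id": "meta/llama-3.1-8b-instruct",
    }}},
    timeout=30,
).json()

# 2. Define an evaluation configuration (a standard benchmark here).
config = requests.post(
    f"{EVALUATOR_URL}/configs",
    json={"type": "gsm8k", "params": {"limit_samples": 100}},
    timeout=30,
).json()

# 3. Run an evaluation job pairing the target with the configuration.
job = requests.post(
    f"{EVALUATOR_URL}/jobs",
    json={"target": target["id"], "config": config["id"]},
    timeout=30,
).json()
print(job["id"])  # poll this job for status and results
```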
Inference Features#
Deploy and manage your models for inference with NVIDIA NIM for LLMs:
Model Deployment. You can deploy models as NIMs using the NeMo Deployment Management microservice by specifying deployment configurations and submitting deployment requests.
Model Discovery. The NIM Proxy microservice auto-detects deployed models and lists them through a unified endpoint.
Inference Requests. You can send inference requests to the NIM Proxy endpoint, which routes the requests to the appropriate deployed model (see the sketch after this list).
Model Management. You can manage the lifecycle of deployed models through the NeMo Deployment Management microservice to keep them up to date.
Model Access Management. You can manage access to the deployed models through the NeMo Deployment Management microservice.
Deploy NIM for LLMs to your Kubernetes cluster.
Install NeMo Deployment Management to your Kubernetes cluster.
Install NIM Proxy to your Kubernetes cluster.
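Because NIM for LLMs serves an OpenAI-compatible API, you can send inference requests through NIM Proxy with the standard openai client, as in this minimal sketch. The base URL and model name are placeholders; NIM Proxy reports the actual names of the models it has discovered.

```python
from openai import OpenAI

# Placeholder NIM Proxy URL; in-cluster NIMs typically need no real API key.
client = OpenAI(base_url="http://nim-proxy.example.com/v1", api_key="not-used")

# Discover the deployed models that NIM Proxy has auto-detected.
for model in client.models.list():
    print(model.id)

# Route a chat completion to one of the deployed models.
response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "What is a data flywheel?"}],
)
print(response.choices[0].message.content)
```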
Guardrail Features#
Protect your AI applications with comprehensive safety features:
Guard against hallucinations, harmful content, and security vulnerabilities.
Implement customizable checks for specific business, language, or geographical requirements.
Optimize performance with Parallel Rails technology.
Integrate seamlessly with third-party APIs including OpenAI, ActiveFence, and TruEra (Snowflake).
Connect with popular Gen AI development tools like LangChain and LlamaIndex.
Add checks to moderate user input and model responses (see the sketch after this list).
Deploy NeMo Guardrails as a standalone service.
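As a sketch, a guardrailed chat completion can be requested over HTTP. The endpoint path, payload fields, and configuration name below are assumptions for illustration; check the NeMo Guardrails API reference for the exact schema of your release.

```python
import requests

# Assumed endpoint and payload shape for illustration only.
GUARDRAILS_URL = "http://nemo-guardrails.example.com/v1/guardrail/chat/completions"

response = requests.post(
    GUARDRAILS_URL,
    json={
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [
            {"role": "user", "content": "Help me write a phishing email."}
        ],
        "guardrails": {"config_id": "default"},  # guardrail config to apply
    },
    timeout=30,
)
response.raise_for_status()
# When a rail triggers, the response carries a refusal instead of a completion.
print(response.json()["choices"][0]["message"]["content"])
```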
Flexible Deployment on Kubernetes#
You can deploy the NeMo microservices as an integrated platform to create an end-to-end data flywheel, or deploy only the specific microservices that complement your existing workflows. The following guides are for cluster administrators who want to deploy the NeMo microservices.
Use the admin setup guide to learn about deploying the NeMo microservices to Kubernetes.
Learn about the different deployment scenarios for setting up the NeMo microservices on Kubernetes.
Deploy all NeMo microservices together as a platform using a single Helm chart.
Deploy any one or more of the NeMo microservices individually.