Quick Start: Deploy All NeMo Microservices#

NVIDIA NeMo microservices are a modular set of tools that you can use to customize, evaluate, and secure large language models (LLMs) while optimizing AI applications across on-premises or cloud-based Kubernetes clusters.

This page covers how to deploy all NeMo microservices to your Kubernetes cluster using the NIM Operator NeMo CRDs. For more details about using the NeMo microservices together, refer to the NeMo microservices documentation.

To deploy all the NeMo microservices, follow these steps:

  1. Configure your cluster with the NeMo prerequisites.

  2. Deploy the NeMo microservices custom resource samples.

  3. Verify the microservices deployments.

Optionally, after completing this Quick Start, you can continue with the NeMo Data Flywheel Tutorial with a Jupyter notebook.

1. Prerequisites#

  • All the common NeMo microservices prerequisites, including:

    • Configuring necessary secrets and namespaces.

    • Deploying all the NeMo dependencies with the Ansible playbook.

    • Installing the NeMo Operator.

Minimum system requirements#

  • A single-node Kubernetes cluster on a Linux host and cluster-admin level permissions.

  • At least 200 GB of free disk space.

  • At least one dedicated GPU (A100 80 GB or H100 80 GB).

If you plan to also deploy the NeMo Data Flywheel Tutorial with a Jupyter notebook, you must have two available GPUs. Refer to the NVIDIA NeMo Microservices Platform Support section for details on required resources.
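Before deploying, you can confirm that the cluster meets these requirements. A minimal check, assuming the NVIDIA device plugin is installed and advertises GPUs under the standard `nvidia.com/gpu` resource name:

```shell
# List allocatable GPUs per node (requires the NVIDIA device plugin).
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'

# Confirm at least 200 GB of free disk space on the host.
df -h /
```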

2. Deploy the NeMo Microservices#

  1. Clone the NIM Operator repository:

    $ git clone https://github.com/NVIDIA/k8s-nim-operator.git
    
  2. Apply all the NeMo microservices sample files in the NIM Operator repository:

    $ kubectl apply -n nemo -f k8s-nim-operator/config/samples/nemo/latest/ 
    

    Example output

    nemocustomizer.apps.nvidia.com/nemocustomizer-sample created
    nemodatastore.apps.nvidia.com/nemodatastore-sample created
    nemoentitystore.apps.nvidia.com/nemoentitystore-sample created
    nemoevaluator.apps.nvidia.com/nemoevaluator-sample created
    nemoguardrail.apps.nvidia.com/nemoguardrails-sample created
    nimcache.apps.nvidia.com/meta-llama3-1b-instruct created
    nimpipeline.apps.nvidia.com/llama3-1b-pipeline created
    configmap/nemo-training-config created
    configmap/nemo-model-config created
    

    It can take several minutes for all the microservices to deploy and become ready.
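Rather than polling, you can block until each custom resource reports Ready. A sketch using `kubectl wait` with a JSONPath condition; the `.status.state` field is an assumption here, so verify the actual status field your release exposes with `kubectl get -n nemo nemocustomizer nemocustomizer-sample -o yaml` first:

```shell
# Block until each NeMo custom resource reports Ready (up to 15 minutes).
# The `.status.state` field is an assumption; check the CR's YAML for the
# actual status field or condition your release exposes.
kubectl wait -n nemo --for=jsonpath='{.status.state}'=Ready --timeout=15m \
  nemocustomizer/nemocustomizer-sample \
  nemodatastore/nemodatastore-sample \
  nemoentitystore/nemoentitystore-sample \
  nemoevaluator/nemoevaluator-sample \
  nemoguardrail/nemoguardrails-sample
```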

  3. View all the NeMo microservices:

    $ kubectl get -n nemo nemoentitystore,nemodatastore,nemoguardrails,nemocustomizer,nemoevaluator
    

    Example output

    NAME                                                     STATUS   AGE
    nemoentitystore.apps.nvidia.com/nemoentitystore-sample   Ready    23m
    nemodatastore.apps.nvidia.com/nemodatastore-sample       Ready    23m
    nemoguardrail.apps.nvidia.com/nemoguardrails-sample      Ready    23m  
    nemocustomizer.apps.nvidia.com/nemocustomizer-sample     Ready    23m
    nemoevaluator.apps.nvidia.com/nemoevaluator-sample       Ready    23m
    
  4. View all the NeMo Customizer ConfigMaps:

    $ kubectl get -n nemo configmap | grep "nemo"
    

    Example output

    nemo-model-config                              1      42m
    nemo-training-config                           1      42m
    nemocustomizer-sample                          1      42m
    

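To inspect what any of these ConfigMaps contain, for example the training configuration listed above, print it as YAML:

```shell
# Dump the NeMo Customizer training configuration created by the samples.
kubectl get -n nemo configmap nemo-training-config -o yaml
```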
3. Verify the NeMo Microservices#

To verify that all microservices are working correctly, you can test their API endpoints using a temporary pod with curl installed.

  1. List the Kubernetes services for the NeMo microservices to confirm they’re running:

    $ kubectl get services -n nemo | grep "nemo"
    

    Example output

    nemocustomizer-sample    ClusterIP   XX.XXX.XXX.XXX   <none>        8000/TCP,9009/TCP      31m
    nemodatastore-sample     ClusterIP   XX.XXX.XXX.XXX   <none>        8000/TCP               31m
    nemoentitystore-sample   ClusterIP   XX.XXX.XXX.XXX   <none>        8000/TCP               31m
    nemoevaluator-sample     ClusterIP   XX.XXX.XXX.XXX   <none>        8000/TCP               31m
    nemoguardrails-sample    ClusterIP   XX.XXX.XXX.XXX   <none>        8000/TCP               31m
    
  2. Start a temporary pod with curl installed to test the services:

    $ kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash
    

    Substitute any image that includes curl and meets your organization’s security requirements.

  3. Test each microservice’s API endpoint:

    • NeMo Customizer:

    curl -X GET "http://nemocustomizer-sample.nemo:8000/v1/customization/configs"
    

    Expected output: A JSON response with customization configuration details.

    • NeMo Evaluator:

    curl -X GET "http://nemoevaluator-sample.nemo:8000/v1/evaluation/configs"
    

    Expected output: A JSON response with evaluation configurations.

    • NeMo Entity Store:

    curl -X GET "http://nemoentitystore-sample.nemo:8000/v1/base-urls"
    

    Expected output: A JSON response with base URLs.

    • NeMo Data Store:

    curl -X GET "http://nemodatastore-sample.nemo:8000/v1/hf/api/datasets"
    

    Expected output: A JSON response with dataset information.

    • NeMo Guardrails:

    curl -X GET "http://nemoguardrails-sample.nemo:8000/v1/guardrail/configs"
    

    Expected output: A JSON response with guardrail configurations.

  4. Press Ctrl+D to exit and delete the temporary pod.
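The interactive checks above can also be scripted as a single non-interactive pass. A minimal sketch that launches a short-lived curl pod per endpoint and prints the HTTP status code (the endpoint paths are the same ones listed in step 3):

```shell
# Probe each NeMo microservice endpoint and report its HTTP status code.
# A 200 response indicates the service is up and answering API requests.
for target in \
  "nemocustomizer-sample.nemo:8000/v1/customization/configs" \
  "nemoevaluator-sample.nemo:8000/v1/evaluation/configs" \
  "nemoentitystore-sample.nemo:8000/v1/base-urls" \
  "nemodatastore-sample.nemo:8000/v1/hf/api/datasets" \
  "nemoguardrails-sample.nemo:8000/v1/guardrail/configs"
do
  code=$(kubectl run --rm -i -q -n default curl-check --restart=Never \
    --image=curlimages/curl:latest -- \
    curl -s -o /dev/null -w '%{http_code}' "http://${target}")
  echo "${code}  ${target}"
done
```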

Next Steps#

  • Continue with the NeMo Data Flywheel Tutorial with a Jupyter notebook

  • To customize individual NeMo microservice configurations, refer to the respective NeMo microservice pages.

  • For upgrades, NIM Operator configuration sample files for each NeMo microservice release are available in the `config/samples/nemo` directory of the NIM Operator repository. Apply the sample files for your target release to perform a rolling upgrade.
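For example, applying a specific release's sample directory might look like the following; the `<release>` directory name is a placeholder, so list the contents of `k8s-nim-operator/config/samples/nemo/` to see which versions are actually available:

```shell
# Replace <release> with an actual release directory name from the repository.
kubectl apply -n nemo -f k8s-nim-operator/config/samples/nemo/<release>/
```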