NeMo Microservices Prerequisites#
NeMo microservices have some shared prerequisites to configure on your Kubernetes cluster before deploying a NeMo microservice with the NVIDIA NIM Operator. The prerequisites include:
Creating a namespace to deploy your NeMo microservices.
Creating image pull secrets that contain your NGC API key.
Optionally, deploying NeMo dependencies with Ansible. Each NeMo microservice has per-component dependencies in addition to the prerequisites on this page. The NIM Operator team maintains Ansible playbooks to help you quickly deploy these microservice-specific dependencies to test the NeMo microservices.
Optionally, if you are installing NeMo Customizer, installing the NeMo Operator, which provides custom resources that help manage NeMo Customizer jobs.
After configuring the prerequisites on this page, you can continue to the NeMo microservices deployment guide or deploy each microservice individually by reviewing the microservice-specific page.
1. Create NeMo Namespace#
It’s recommended to deploy all NeMo microservices in a single namespace:
$ kubectl create namespace nemo
You should also consider applying a Kubernetes resource quota to your NeMo namespace. If you deploy using the Ansible NeMo Dependency playbook, the playbook creates a default `nemo` namespace for you.
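If you do apply a resource quota, a minimal sketch might look like the following. The quota name and all limit values here are illustrative placeholders; size them for your cluster and workload.

```yaml
# Sketch only: the name and all limits below are placeholders.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: nemo-quota
  namespace: nemo
spec:
  hard:
    requests.cpu: "64"            # total CPU requests allowed in the namespace
    requests.memory: 256Gi        # total memory requests
    requests.nvidia.com/gpu: "8"  # total GPU requests (extended resource)
    persistentvolumeclaims: "20"  # cap on PVC count
```

Apply it with `kubectl apply -f <file>` after creating the namespace.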
2. Create NGC API Key Image Pull Secrets#
You must create the required image pull secrets in your `nemo` namespace to be able to pull NeMo microservice images from NVIDIA NGC.
Refer to Generating Your NGC API Key in the NVIDIA NGC User Guide for more information.
Add a Docker registry secret for downloading container images from NVIDIA NGC:
$ kubectl create secret -n nemo docker-registry ngc-secret \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password=<ngc-api-key>
Add a generic secret that the model puller containers use to download models from NVIDIA NGC:
$ kubectl create secret -n nemo generic ngc-api-secret \
    --from-literal=NGC_API_KEY=<ngc-api-key>
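If you manage cluster state declaratively, the same generic secret can be expressed as a manifest. This is a sketch; `<ngc-api-key>` still needs to be filled in, and `stringData` lets Kubernetes handle the base64 encoding for you:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ngc-api-secret
  namespace: nemo
type: Opaque
stringData:
  NGC_API_KEY: <ngc-api-key>  # replace with your NGC API key
```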
3. Deploy NeMo Dependencies with Ansible#
Each NeMo microservice relies on several dependencies, for things like databases or Kubernetes secrets. The NIM Operator team maintains Ansible playbooks to help you quickly install most dependencies on your cluster.
Note
While these playbooks are helpful for testing, they may not be suitable for production environments. Refer to each microservice’s configuration page for specific dependency requirements.
Using this playbook, you can choose to install dependencies for one or more of the following NeMo microservices:

NeMo Microservice | Dependencies Deployed with Ansible | Additional Dependencies You Create
---|---|---
NeMo Data Store | PostgreSQL, MinIO, Kubernetes storage provisioner*, Kubernetes secrets (database user, object storage user, Data Store default) | Image pull secret
NeMo Entity Store | PostgreSQL, Kubernetes storage provisioner*, Kubernetes secrets (database user) | Image pull secret, NeMo Data Store, NeMo Entity Store
NeMo Evaluator | PostgreSQL, Kubernetes secrets (database user), Argo Workflows, Milvus | Image pull secret, NeMo Data Store, NeMo Entity Store
NeMo Customizer | PostgreSQL, Kubernetes storage provisioner*, Kubernetes secrets (database user, W&B API Key), Kubernetes ConfigMap for training and model downloads, OpenTelemetry, Volcano Scheduler | Image pull secret, NeMo Data Store, NeMo Entity Store, NeMo Operator
NeMo Guardrails | Kubernetes storage provisioner*, NIM Endpoint as a NIMPipeline | Image pull secret
Note
*By default, the NeMo Dependency playbook installs Local Path Provisioner to use as the default StorageClass. You can configure a different StorageClass by specifying your desired `pvc` settings, described in more detail below.
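For reference, a PVC that consumes the playbook's default `local-path` StorageClass would look something like this sketch; the claim name and size are hypothetical:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-nemo-pvc     # hypothetical name
  namespace: nemo
spec:
  storageClassName: local-path  # default StorageClass installed by the playbook
  accessModes:
    - ReadWriteOnce             # Local Path Provisioner supports ReadWriteOnce only
  resources:
    requests:
      storage: 10Gi             # placeholder size
```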
Install Dependencies#
Clone the NIM Operator repository:
$ git clone https://github.com/NVIDIA/k8s-nim-operator.git
Navigate to the `nemo-dependencies` directory:
$ cd k8s-nim-operator/test/e2e/nemo-dependencies
Configure the playbook to install all dependencies by setting each microservice to `yes` in the `values.yaml`:

install:
  customizer: yes
  datastore: yes
  entity_store: yes
  evaluator: yes
  jupyter: yes  # Deploys a Jupyter server to use with the Jupyter notebook tutorial. Change to `no` if you don't want this deployed.

uninstall:
  customizer: yes
  datastore: yes
  entity_store: yes
  evaluator: yes
  jupyter: yes

installation_namespace: nemo

# Specify a custom storage class and volume access mode for all PVCs,
# for example ReadWriteMany for an NFS storage class.
pvc:
  # Ignored when localPathProvisioner.enabled is true; in that case these
  # default to storage_class: "local-path" and volume_access_mode: "ReadWriteOnce".
  storage_class: ""
  volume_access_mode: ReadWriteOnce

# Deploy a local-path CSI provisioner.
localPathProvisioner:
  # Disable this when a different CSI provisioner is already deployed in the cluster.
  enabled: true
  default: true
  version: v0.0.31
Additional configuration options:
Change the namespace where all NeMo dependencies are deployed. By default, the playbook creates and deploys services into the `nemo` namespace. You can change this by updating `installation_namespace`.
By default, the playbook deploys Local Path Provisioner to use as the default StorageClass. If a default StorageClass is already provisioned in the cluster, set `localPathProvisioner.enabled: false`, then specify a custom StorageClass and volume access mode for the playbook to use for all PVCs, for example `ReadWriteMany` for an NFS StorageClass.
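For example, to reuse an existing NFS-backed StorageClass instead of Local Path Provisioner, the relevant `values.yaml` fragment might look like the following sketch. The `nfs-client` class name is a placeholder for whatever your cluster actually provides:

```yaml
localPathProvisioner:
  enabled: false              # a default CSI provisioner already exists in the cluster

pvc:
  storage_class: "nfs-client" # placeholder: use your cluster's NFS StorageClass name
  volume_access_mode: ReadWriteMany
```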
Refer to the NeMo dependencies documentation for full details on available configuration options.
Run the Ansible playbook:
$ ansible-playbook -c local -i localhost install.yaml
The playbook will take several minutes to complete.
Verify Dependencies#
Check that pods are running:
$ kubectl get pods -n nemo
Example output
NAME                                                       READY   STATUS      RESTARTS   AGE
argo-workflows-server-85d8489c58-l5fnc                     1/1     Running     0          10m
argo-workflows-workflow-controller-698f7bb767-dhr9l        1/1     Running     0          10m
customizer-otel-opentelemetry-collector-7ff98567c5-slg8v   1/1     Running     0          10m
customizer-pg-postgresql-0                                 1/1     Running     0          10m
datastore-pg-postgresql-0                                  1/1     Running     0          10m
entity-store-pg-postgresql-0                               1/1     Running     0          10m
evaluator-otel-opentelemetry-collector-6cf75b448-f2m6h     1/1     Running     0          10m
evaluator-pg-postgresql-0                                  1/1     Running     0          10m
jupyter-notebook-f4cdbc988-cgc8z                           1/1     Running     0          10m
meta-llama3-1b-instruct-5cbd55b49b-nrt9j                   1/1     Running     0          10m
milvus-standalone-8fbb48495-dfrhr                          1/1     Running     0          10m
mlflow-minio-568b6bc597-nx647                              1/1     Running     0          10m
mlflow-minio-provisioning-vfvr9                            0/1     Completed   0          10m
mlflow-postgresql-0                                        1/1     Running     0          10m
mlflow-tracking-6fbc46b567-6q6n8                           1/1     Running     0          10m
volcano-admission-5c5c96b944-pfltc                         1/1     Running     0          10m
volcano-admission-init-stkhm                               0/1     Completed   0          10m
volcano-controllers-699b864756-rnvqb                       1/1     Running     0          10m
volcano-scheduler-5f77fc8fb9-cjqht                         1/1     Running     0          10m
Verify the Kubernetes secrets for the dependencies have been created:
$ kubectl get secrets -n nemo
View all the NIM microservices:
$ kubectl get -n nemo nimpipeline,nimcache,nimservice
Example output
NAME                                             STATUS   AGE
nimpipeline.apps.nvidia.com/llama3-1b-pipeline   Ready    40m

NAME                                               STATUS   PVC                           AGE
nimcache.apps.nvidia.com/meta-llama3-1b-instruct   Ready    meta-llama3-1b-instruct-pvc   40m

NAME                                                 STATUS   AGE
nimservice.apps.nvidia.com/meta-llama3-1b-instruct   Ready    40m
4. Install NeMo Operator#
When you want to run a NeMo Customizer workflow, you must install the NeMo Operator microservice. This microservice manages custom resources for LLM training workloads in NeMo Customizer jobs. It does not manage any NeMo microservice. Refer to the NeMo microservices documentation for details about the NeMo Operator and the training CRDs it manages.
Prerequisites#
Create the image pull secrets in the `nemo` namespace, or the namespace where you plan to install the NeMo Operator. You also need to pass your NVIDIA NGC API key to fetch the NeMo Operator Helm chart.
Access to an NFS-backed Persistent Volume that supports the `ReadWriteMany` access mode. The NeMo Operator microservice dynamically provisions NFS-backed persistent volumes using Kubernetes storage classes.
Install the Volcano scheduler on your cluster. Use the NeMo Dependency Ansible playbooks to install Volcano, or refer to the Volcano install documentation.
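If your cluster does not already expose an NFS-backed StorageClass, one common option is the Kubernetes NFS CSI driver (csi-driver-nfs). A StorageClass for it might look like the sketch below; the class name, server, and share values are placeholders, and the driver itself must already be installed:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi              # placeholder name
provisioner: nfs.csi.k8s.io  # assumes the NFS CSI driver is installed
parameters:
  server: nfs.example.com    # placeholder: your NFS server
  share: /exports            # placeholder: your exported path
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

PVCs that request this class with the `ReadWriteMany` access mode can then be provisioned dynamically.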
It's recommended that you use the latest version of the NeMo Operator. View available NeMo Operator versions on NVIDIA NGC and update the `VERSION` variable to pull your desired version.
$ export VERSION=25.4.0
Install the NeMo Operator with Helm#
Fetch the NeMo Operator Helm chart.
$ helm fetch https://helm.ngc.nvidia.com/nvidia/nemo-microservices/charts/nemo-operator-${VERSION}.tgz --username='$oauthtoken' --password=<YOUR NGC API Key>
Install the NeMo Operator.
$ helm upgrade --install nemo-operator nemo-operator-${VERSION}.tgz -n nemo \
    --set imagePullSecrets[0].name=ngc-secret \
    --set controllerManager.manager.scheduler=volcano
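The two `--set` flags above can equivalently live in a small values file, which is easier to keep in version control. Helm maps `--set a.b=c` to the same nested structure, so a sketch of that file (the filename is an assumption) would be:

```yaml
# nemo-operator-values.yaml (assumed filename)
imagePullSecrets:
  - name: ngc-secret
controllerManager:
  manager:
    scheduler: volcano
```

Pass it with `helm upgrade --install nemo-operator nemo-operator-${VERSION}.tgz -n nemo -f nemo-operator-values.yaml`.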
Verify the NeMo Operator was installed.
$ kubectl get pods -n nemo | grep "nemo-operator"