Set Up Manually#

If you want to set up the minikube cluster and the NeMo microservices platform manually, follow the steps in the sections below.

Note

This minikube cluster setup tutorial is designed for the Beginner Platform Tutorials that run small workloads of fine-tuning, evaluating, and running inference on smaller LLMs such as llama-3.1-8b-instruct and meta-llama/llama-3.2-1b-instruct. If you want to run AI workloads at a larger scale, set up the NeMo microservices platform on a larger Kubernetes cluster. For more information, refer to About Admin Setup.


Before You Begin#

Check the Developer Setup Requirements before you begin.


Start Minikube#

  1. Download minikube following the minikube start guide in the minikube documentation.

  2. Refer to the instructions for Using NVIDIA GPUs with minikube and follow the steps until you reach the command to start minikube. When you get to that point, use the following command to ensure that there is enough RAM and CPU.

    minikube start \
       --driver docker \
       --container-runtime docker \
       --cpus no-limit \
       --memory no-limit \
       --gpus all
    
  3. Enable the minikube ingress addon.

    minikube addons enable ingress
    
  4. Label the node to indicate GPU availability. The NVIDIA GPU Operator typically adds this label, but because the operator is not used in this demo minikube deployment, you must add the label manually.

    kubectl label node minikube feature.node.kubernetes.io/pci-10de.present=true --overwrite
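
    As a quick check, you can confirm that the label was applied before moving on:

    ```shell
    # Verify the GPU PCI label is present on the minikube node
    kubectl get node minikube --show-labels | grep pci-10de.present
    ```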
    

Install the NeMo Microservices Platform#

Use the NVIDIA NeMo Microservices Helm Chart to install the NeMo microservices platform.

  1. Sign in to your NGC account on the NGC Sign In page. If you do not have one, create one.

  2. Create an NGC API key to access the NGC Catalog and model endpoints on build.nvidia.com. Follow the instructions at Generating NGC API Keys.

    Note

    When generating the key, select the NGC Catalog and Public API Endpoints services to allow access to the NGC Catalog and model endpoints on build.nvidia.com.

  3. Store the NGC API key in the following environment variables:

    export NGC_API_KEY=<your-ngc-api-key>
    export NVIDIA_API_KEY=<your-ngc-api-key>
    

    These environment variables, NGC_API_KEY and NVIDIA_API_KEY, will be used in later steps to create the required Kubernetes secrets and to install the NeMo microservices platform.

    Note

    Set both variables to the same API key. You can use the same NGC API key for authentication in multiple contexts such as pulling images from NGC Catalog and accessing endpoints on build.nvidia.com.

  4. Set up the following secrets in the default namespace of the cluster.

    kubectl create secret \
       docker-registry nvcrimagepullsecret \
       --docker-server=nvcr.io \
       --docker-username='$oauthtoken' \
       --docker-password=$NGC_API_KEY
    
    kubectl create secret generic ngc-api \
       --from-literal=NGC_API_KEY=$NGC_API_KEY
    
    kubectl create secret generic nvidia-api \
       --from-literal=NVIDIA_API_KEY=$NVIDIA_API_KEY
    
    kubectl create secret generic hf-token \
       --from-literal=HF_TOKEN=<your-hugging-face-token>
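
    Before proceeding, you can verify that all four secrets exist in the default namespace:

    ```shell
    # Each secret created above should be listed; a missing one reports an error
    kubectl get secrets nvcrimagepullsecret ngc-api nvidia-api hf-token
    ```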
    
  5. Add the Helm repository for NeMo Microservices Helm Chart.

    helm repo add nmp https://helm.ngc.nvidia.com/nvidia/nemo-microservices \
       --username='$oauthtoken' \
       --password=$NGC_API_KEY
    
    helm repo update
    
  6. Install Volcano scheduler before installing the chart:

    kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/v1.9.0/installer/volcano-development.yaml
    

    Note

    If you are installing Volcano on a cluster where it was previously installed, you may need to manually delete the stale admission webhooks as follows. We recommend starting with a fresh cluster.

    kubectl delete mutatingwebhookconfiguration volcano-admission-service-pods-mutate
    kubectl delete validatingwebhookconfiguration volcano-admission-service-pods-validate
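
    After applying the manifest, you can confirm that the Volcano components are ready before installing the chart (the volcano-system namespace is created by the manifest above):

    ```shell
    # Wait for the Volcano scheduler, controllers, and admission webhook pods
    kubectl -n volcano-system get pods
    kubectl -n volcano-system wait --for=condition=Ready pods --all --timeout=300s
    ```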
    
  7. Install the chart:

    helm --namespace default install \
       nemo nmp/nemo-microservices-helm-chart \
       --set guardrails.guardrails.nvcfAPIKeySecretName="nvidia-api"
    

    The pods require approximately 30 minutes to download images, start the containers, and establish stable communication. During this time, it is normal for pods to be in a pending or restarting state.

    Installing Early Access Microservices

    To install the microservices in early access mode, enable them by adding the respective tags as follows:

    helm --namespace default install \
       nemo nmp/nemo-microservices-helm-chart \
       --set guardrails.guardrails.nvcfAPIKeySecretName="nvidia-api" \
       --set tags.auditor=true \
       --set tags.safe-synthesizer=true \
       --set tags.studio=true
    
  8. Verify that the pods are in the ready state:

    kubectl get pods
    

    Confirm that pods in the Running state show all of their containers ready, such as 1/1 or 2/2. If pods do not reach the Running or Completed state after 30 minutes, run kubectl events to check for errors.

    If the pods fail to stabilize, you can investigate image or Kubernetes issues by running kubectl events. To diagnose a specific deployment, add --for deployment/<deployment name> to target it. You can also investigate application-level issues using kubectl logs <name of concerning pod>.


Configure DNS Resolution#

  1. Display the ingress resources:

    kubectl get ingress
    

    The following is an example output.

    NAME                            CLASS    HOSTS                      ADDRESS   PORTS   AGE
    nemo-microservices-helm-chart   <none>   nim.test,data-store.test             80      34m
    
  2. Export an environment variable with the accessible IP address of your ingress controller:

    export NEMO_HOST=$(minikube ip)
    
  3. Add host name entries in the /etc/hosts file for the *.test ingress hosts to use the accessible IP address. Make a backup of the /etc/hosts file before you make the changes.

    sudo cp /etc/hosts /etc/hosts.bak
    echo -e "$NEMO_HOST nemo.test\n$NEMO_HOST nim.test\n$NEMO_HOST data-store.test\n" | sudo tee -a /etc/hosts
    

    To learn more about how the hosts and their default path rules are configured, refer to Ingress Setup for Production Environment.
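
    As a quick end-to-end check (assuming the platform pods are ready), you can confirm that each ingress host resolves and responds over HTTP:

    ```shell
    # Each host should return an HTTP status code rather than a connection error
    for host in nemo.test nim.test data-store.test; do
       echo -n "$host: "
       curl -s -o /dev/null -w "%{http_code}\n" "http://$host/"
    done
    ```

    A status code other than 000 indicates that DNS resolution and the ingress controller are working; the exact code depends on the path rules of each service.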

Tip

If you complete the steps in this section, the minikube cluster is ready with the NeMo microservices platform installed. Proceed to the Beginner Platform Tutorials to learn how to use the capabilities of the NeMo microservices.

Service Endpoints#

After deploying the services to minikube and configuring DNS, the following service endpoints are available:

  • Base URL: http://nemo.test

    • This is the main endpoint for interacting with the NeMo microservices platform.

  • NeMo Data Store HuggingFace Endpoint: http://data-store.test/v1/hf

    • The Data Store microservice exposes a HuggingFace-compatible API at this endpoint.

    • Set the HF_ENDPOINT environment variable to this URL.

      export HF_ENDPOINT=http://data-store.test/v1/hf
      
  • Inference URL: http://nim.test

    • This is the endpoint for the NIM Proxy microservice deployed as a part of the platform.
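
    For example, assuming the NIM Proxy follows the OpenAI-compatible convention of exposing a model listing at /v1/models, you can query it as follows:

    ```shell
    # List models available through the NIM Proxy
    # (the /v1/models path is assumed from the OpenAI-compatible API convention)
    curl -s http://nim.test/v1/models
    ```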


Clean Up#

After you’re done with the Beginner Platform Tutorials, delete the minikube cluster to clean up.

Warning

This command deletes the minikube cluster, the NeMo microservices platform installation, and all resources associated with it. Do not run it unless you are done with the tutorials and want to delete the minikube cluster.

minikube delete

Recover the /etc/hosts file from the backup to remove the host name entries for the *.test ingress hosts.

sudo cp /etc/hosts.bak /etc/hosts
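
You can confirm that the tutorial entries were removed (assuming your original /etc/hosts contained no other *.test entries):

```shell
# Should print nothing if the *.test entries are gone
grep '\.test' /etc/hosts || true
```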

Deploy to a Production-Grade Kubernetes Cluster#

If you have completed this minikube tutorial and want to deploy the NeMo microservices platform to a production-grade Kubernetes cluster, proceed to the Admin Setup section.