Deploy NeMo Evaluator with Docker#
Run the microservice on your local machine using Docker Compose for experimentation.
Note
v2 API Availability: Use the v2 API for jobs to run a larger set of evaluation flows; the v1 API supports only custom evaluation jobs. For details, refer to the V2 API Migration Guide.
Prerequisites#
Before following this deployment guide, ensure that you have:
Docker and Docker Compose installed on your system.
An NGC API key for accessing the NGC Catalog. Create one following the instructions at Generating NGC API Keys. The key is required to fetch the Docker images for the NeMo microservices platform.
A model deployed externally and accessible to the Evaluator, or a GPU with at least 40 GB of memory to deploy a NIM alongside Docker Compose.
Set Up#
Export the NGC API Key into your shell environment using the following command:
export NGC_CLI_API_KEY=<your-ngc-api-key>
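As a quick sanity check, you can confirm the variable is set without printing the full key; the 8-character preview below is an arbitrary choice:

```bash
# Print only the first 8 characters of the key to confirm it is set
printf '%.8s...\n' "$NGC_CLI_API_KEY"
```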
Set up the NGC CLI following the instructions at Getting Started with the NGC CLI. Make sure you run `ngc config set` to set up the required NGC configuration.

Log in to the NVIDIA NGC container registry:
docker login nvcr.io -u '$oauthtoken' -p $NGC_CLI_API_KEY
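If you prefer not to pass the key as a command-line argument, where it can end up in shell history, Docker's standard `--password-stdin` flag works as well:

```bash
# Read the registry password from stdin instead of the -p flag
echo "$NGC_CLI_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin
```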
Download the Docker Compose configuration from NGC:
```bash
ngc registry resource download-version "nvidia/nemo-microservices/nemo-microservices-quickstart:25.10"
cd nemo-microservices-quickstart_v25.10
```
Start NeMo Evaluator:
```bash
export NEMO_MICROSERVICES_IMAGE_REGISTRY=nvcr.io/nvidia/nemo-microservices
export NEMO_MICROSERVICES_IMAGE_TAG=25.10
docker compose --profile evaluator up --detach --quiet-pull --wait
```
The command starts the Evaluator microservice on port `8080`.

(Optional) Deploy NIM:
If no model is deployed externally to the Docker Compose setup, you can deploy a NIM with the following Docker Compose file. Deploying a NIM requires one GPU with at least 40 GB of memory.
Save the following YAML to a `nim.yaml` file to include in your Docker Compose up command.

```yaml
services:
  nim:
    image: ${NIM_IMAGE:-""}
    profiles: [evaluator]
    container_name: nim
    restart: on-failure
    ports:
      - ${NIM_PORT:-8000}:8000
    environment:
      - NGC_API_KEY=${NGC_API_KEY:-${NGC_CLI_API_KEY:-""}}
      - NIM_GUIDED_DECODING_BACKEND=${NIM_GUIDED_DECODING_BACKEND:-""}
    runtime: nvidia
    volumes:
      - ${MODEL_CACHE}:/opt/nim/.cache
    networks:
      - nmp
    shm_size: 16GB
    user: root
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all
    healthcheck:
      test: [ "CMD", "python3", "-c", "import requests, sys; sys.exit(0 if requests.get('http://localhost:8000/v1/health/live').ok else 1)" ]
      interval: 10s
      timeout: 3s
      retries: 20
      start_period: 60s  # allow for 60 seconds to download a model and start up
```
The example sets up a local volume mount and deploys `meta/llama-3.2-3b-instruct`.

```bash
export NIM_PORT=8000                 # configure host port for your deployment
export MODEL_CACHE=$(pwd)/model-cache  # specify the model cache location
mkdir -p $MODEL_CACHE && chmod 1777 $MODEL_CACHE
export NIM_IMAGE=nvcr.io/nim/meta/llama-3.2-3b-instruct:1.8.5
export NIM_GUIDED_DECODING_BACKEND=fast_outlines  # decoding backend depends on the model
docker compose \
  -f docker-compose.yaml \
  -f nim.yaml \
  --profile evaluator up \
  --detach --quiet-pull --wait
```
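On first start, the NIM downloads the model into `$MODEL_CACHE`, which can take several minutes; the `--wait` flag blocks until the healthcheck passes. To watch progress in the meantime, you can follow the container logs:

```bash
# Follow the NIM logs while the model downloads and the server starts
docker compose -f docker-compose.yaml -f nim.yaml logs -f nim
```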
Service Endpoints#
After starting the services with Docker Compose, the following endpoints will be available by default:
Evaluator API: http://localhost:8080
This is the main endpoint for interacting with the Evaluator microservice.
NeMo Data Store HuggingFace Endpoint: http://localhost:3000/v1/hf
The Data Store exposes a HuggingFace-compatible API at this endpoint.
You can set the `HF_ENDPOINT` environment variable to this URL if needed for integration or testing.
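For example:

```bash
# Point Hugging Face tooling at the local Data Store
export HF_ENDPOINT=http://localhost:3000/v1/hf
```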
NIM URL:
`http://nim:8000` is the model URL to use for evaluation jobs; the hostname `nim` resolves inside the Docker Compose network.
The model ID must match the deployed NIM. Use `meta/llama-3.2-3b-instruct` as the model ID for the Meta Llama-3.2-3b Instruct deployment example.
http://localhost:${NIM_PORT} is the model URL to use directly from the host.
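To confirm the in-network URL resolves, you can run a request from inside one of the Compose containers. The service name `evaluator` and the availability of `curl` in that image are assumptions here; adjust to match the services listed by `docker compose ps`:

```bash
# Query the NIM through the Compose network, from the Evaluator container
# (assumes the service is named `evaluator` and its image includes curl)
docker compose exec evaluator curl -sf http://nim:8000/v1/models
```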
Verify the Deployment#
After starting the services, verify everything is working:
Check service status:
docker compose ps
Verify the service is running:
curl -fv http://localhost:8080/v2/evaluation/jobs
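A successful response returns HTTP 200 with a JSON body listing the jobs, initially empty. If you have `jq` installed, you can pretty-print it:

```bash
# Pretty-print the job list
curl -s http://localhost:8080/v2/evaluation/jobs | jq .
```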
(Optional) Verify the deployed NIM:
curl http://localhost:${NIM_PORT}/v1/models
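Because NIM exposes an OpenAI-compatible API, you can also send a small chat completion as a smoke test, using the model ID from the deployment example:

```bash
# Minimal chat completion against the deployed NIM
curl http://localhost:${NIM_PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama-3.2-3b-instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 16
      }'
```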
Stop Evaluator#
To stop Evaluator and its related services, run the following command:
docker compose --profile evaluator down
This command stops and removes the containers started for the Evaluator and its dependencies. You can restart them at any time using the up command.
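For example, to restart with the same settings:

```bash
# Bring the Evaluator services back up
# (assumes the NEMO_MICROSERVICES_IMAGE_* variables are still exported)
docker compose --profile evaluator up --detach --quiet-pull --wait
```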
Include the flag -v or --volumes to remove persistent volumes.
docker compose --profile evaluator down -v
Next Steps#
For more tutorials, refer to Evaluation Tutorials.
For instructions on how to deploy the microservice on your Kubernetes cluster for production at scale, refer to Deploy NeMo Evaluator Using Helm Chart.