# Deploy NeMo Evaluator with Docker
Run the microservice on your local machine using Docker Compose for experimentation.
> **Note:** v2 API availability: Use the v2 API for jobs to run a larger set of evaluation flows. For details, refer to the V2 API Migration Guide. The v1 API supports only custom evaluation jobs.
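Once the service is running (it listens on port 8080 after the deployment steps below), the two API generations are distinguished by their URL path. A sketch, assuming the v1 job routes follow the same `/v{N}/evaluation` pattern as the v2 route shown later in this guide:

```bash
curl http://localhost:8080/v1/evaluation/jobs   # v1: custom evaluation jobs only
curl http://localhost:8080/v2/evaluation/jobs   # v2: full set of evaluation flows
```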
## Prerequisites
Before following this deployment guide, ensure that you have:

- Docker and Docker Compose installed on your system (see the quick check after this list).
- An NGC API key for accessing the NGC Catalog. Create an NGC API key following the instructions at Generating NGC API Keys. You need the NGC API key to fetch the Docker images required by the NeMo microservices platform.
- A model deployed externally and accessible to the Evaluator, or a GPU so that you can deploy a NIM alongside Docker Compose (see the optional NIM deployment below).
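A minimal sanity check for the Docker prerequisites, assuming the Compose v2 plugin (which provides the `docker compose` subcommand used throughout this guide):

```bash
# Both commands should print a version; if `docker compose` fails,
# install the Docker Compose v2 plugin before continuing.
docker --version
docker compose version
```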
## Set Up
Export the NGC API Key into your shell environment using the following command:
```bash
export NGC_CLI_API_KEY=<your-ngc-api-key>
```
Set up the NGC CLI following the instructions at Getting Started with the NGC CLI. Make sure you run `ngc config set` to set up the required NGC configuration.

Log in to the NVIDIA NGC container registry:
```bash
docker login nvcr.io -u '$oauthtoken' -p $NGC_CLI_API_KEY
```
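If the login succeeds, Docker prints a confirmation such as:

```
Login Succeeded
```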
Download the Docker Compose configuration from NGC:
```bash
ngc registry resource download-version "nvidia/nemo-microservices/nemo-microservices-quickstart:25.11"
cd nemo-microservices-quickstart_v25.11
```
Start NeMo Evaluator:
```bash
export NEMO_MICROSERVICES_IMAGE_REGISTRY=nvcr.io/nvidia/nemo-microservices
export NEMO_MICROSERVICES_IMAGE_TAG=25.11
docker compose --profile evaluator up --detach --quiet-pull --wait
```
The command starts the Evaluator microservice on port `8080`.
## (Optional) Deploy NIM

If no model is deployed externally to the Docker Compose setup, you can deploy a NIM with the following Docker Compose file. Deploying a NIM requires one GPU with at least 40 GB of memory and 8 GB of disk space.
Save the following YAML to a `nim.yaml` file to include in your `docker compose up` command.
```yaml
services:
  evaluator:
    environment:
      # Remap local NIM URL to internal docker network for jobs
      - NIM_PROXY_URL=http://localhost:${NIM_PORT:-8000}
      - NIM_PROXY_URL_INTERNAL=http://nim:${NIM_PORT:-8000}
  nim:
    image: ${NIM_IMAGE:-""}
    profiles: [evaluator]
    container_name: nim
    restart: on-failure
    ports:
      - ${NIM_PORT:-8000}:8000
    environment:
      - NGC_API_KEY=${NGC_API_KEY:-${NGC_CLI_API_KEY:-""}}
      - NIM_GUIDED_DECODING_BACKEND=${NIM_GUIDED_DECODING_BACKEND:-""}
    runtime: nvidia
    volumes:
      - ${MODEL_CACHE}:/opt/nim/.cache
    networks:
      - nmp
    shm_size: 16GB
    user: root
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all
    healthcheck:
      test: [
        "CMD",
        "python3",
        "-c",
        "import requests, sys; sys.exit(0 if requests.get('http://localhost:8000/v1/health/live').ok else 1)"
      ]
      interval: 10s
      timeout: 3s
      retries: 20
      start_period: 60s  # allow for 60 seconds to download a model and start up
```
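The `NIM_PROXY_URL`/`NIM_PROXY_URL_INTERNAL` pair exists because evaluation jobs run inside the Docker network, where `localhost` does not resolve to the host machine: the Evaluator remaps the local NIM URL to the internal `nim` service name so that jobs can reach the model.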
The following example sets up a local volume mount and deploys `meta/llama-3.2-3b-instruct`.
```bash
export NIM_PORT=8000                    # configure host port for your deployment
export MODEL_CACHE=$(pwd)/model-cache   # specify the model cache location
mkdir -p $MODEL_CACHE && chmod 1777 $MODEL_CACHE
export NIM_IMAGE=nvcr.io/nim/meta/llama-3.2-3b-instruct:1.8.5
export NIM_GUIDED_DECODING_BACKEND=outlines   # decoding backend depends on the model

docker compose \
  -f docker-compose.yaml \
  -f nim.yaml \
  --profile evaluator up \
  --detach --quiet-pull --wait
```
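Note that `--wait` blocks until the health checks pass. The first start can take several minutes while the NIM image is pulled and the model weights are downloaded into `$MODEL_CACHE`; for larger models or slower connections, you may need to increase the `start_period` in the health check above.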
## Service Endpoints
After starting the services with Docker Compose, the following endpoints will be available by default:
- Evaluator API: `http://localhost:8080`

  This is the main endpoint for interacting with the Evaluator microservice.

- NeMo Data Store HuggingFace endpoint: `http://localhost:3000/v1/hf`

  The Data Store exposes a HuggingFace-compatible API at this endpoint. You can set the `HF_ENDPOINT` environment variable to this URL if needed for integration or testing.
- (Optional) NIM URL: Use the following model URLs if NIM is deployed with Docker Compose.

  - `http://nim:8000` is the model URL to use for evaluation jobs.
  - `http://localhost:${NIM_PORT}` is the model URL to use directly from the host.

  The model ID must match the deployed NIM; use `meta/llama-3.2-3b-instruct` as the model ID for the Meta Llama-3.2-3B Instruct deployment example.
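As a quick smoke test, you can send a chat completion request to the NIM from the host. This is a minimal sketch assuming the example deployment above; NIM serves an OpenAI-compatible API, so adjust the model ID if you deployed a different model:

```bash
# Send a single chat completion request to the locally deployed NIM.
curl http://localhost:${NIM_PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama-3.2-3b-instruct",
        "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
        "max_tokens": 64
      }'
```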
## Verify the Deployment
After starting the services, verify everything is working:
Check service status:
```bash
docker compose ps evaluator
```
Example response for a healthy container:

```
NAME                             IMAGE                                               COMMAND                  SERVICE     CREATED         STATUS                   PORTS
nemo-microservices-evaluator-1   nvcr.io/nvidia/nemo-microservices/evaluator:25.11   "python -m evaluator…"   evaluator   5 minutes ago   Up 4 minutes (healthy)   0.0.0.0:32774->7331/tcp, [::]:32774->7331/tcp
```
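If the container is not reported as healthy, the service logs are the first place to look; for example:

```bash
# Tail the most recent Evaluator logs (assumes the Compose project above).
docker compose logs --tail 100 evaluator
```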
Verify the service is running:
```bash
curl http://localhost:8080/v2/evaluation/jobs
```

Example response for a successful deployment:

```json
{"object":"list","data":[],"pagination":{}}
```
(Optional) Verify the deployed NIM:
```bash
curl http://localhost:${NIM_PORT}/v1/models
```

Example response for a deployed model:

```json
{
  "object": "list",
  "data": [
    {
      "id": "meta/llama-3.2-3b-instruct",
      "object": "model",
      "created": 1760457209,
      "owned_by": "system",
      "root": "meta/llama-3.2-3b-instruct",
      "parent": null,
      "max_model_len": 131072,
      "permission": []
    }
  ]
}
```
## Stop Evaluator
To stop Evaluator and its related services, run the following command:
```bash
docker compose --profile evaluator down
```
This command stops and removes the containers started for the Evaluator and its dependencies. You can restart them at any time using the `up` command.

Include the `-v` or `--volumes` flag to also remove persistent volumes:
```bash
docker compose --profile evaluator down -v
```
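If you used the optional NIM deployment, note that the model cache is a host bind mount rather than a named volume, so `down -v` does not delete it. Remove it manually to reclaim the disk space:

```bash
# Assumes MODEL_CACHE still points at the cache directory created earlier.
rm -rf "$MODEL_CACHE"
```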
## Upgrade Evaluator
To upgrade Evaluator to a new version, run the following commands to download the new Docker Compose files and roll out the updated containers.
```bash
export VERSION=<specify-new-version>
export NGC_CLI_API_KEY=<your-ngc-api-key>

ngc registry resource download-version "nvidia/nemo-microservices/nemo-microservices-quickstart:${VERSION}"
cd nemo-microservices-quickstart_v${VERSION}

export NEMO_MICROSERVICES_IMAGE_REGISTRY=nvcr.io/nvidia/nemo-microservices
export NEMO_MICROSERVICES_IMAGE_TAG=${VERSION}
docker compose --profile evaluator up --detach --quiet-pull --wait
```
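After the rollout completes, you can confirm that the new version is running:

```bash
# The IMAGE column should show the new version tag.
docker compose ps evaluator
```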
## Next Steps
For more tutorials, refer to Evaluation Tutorials.
For instructions on how to deploy the microservice on your Kubernetes cluster for production at scale, refer to Deploy NeMo Evaluator Using Helm Chart.