Deploy NeMo Evaluator with Docker#
Run the microservice on your local machine using Docker Compose for experimentation.
Note
v2 API Availability: Use the v2 API for jobs to run a larger set of evaluation flows; the v1 API supports only custom evaluation jobs. For details, refer to the V2 API Migration Guide.
Prerequisites#
Before following this deployment guide, ensure that you have:
Docker and Docker Compose installed on your system.
An NGC API key for accessing the NGC Catalog. Create one following the instructions at Generating NGC API Keys. The key is required to fetch the Docker images for the NeMo microservices platform.
A model deployed externally and accessible to the Evaluator, or a GPU with at least 40 GB of memory to deploy a NIM alongside Docker Compose.
Set Up#
Export the NGC API Key into your shell environment using the following command:
export NGC_CLI_API_KEY=<your-ngc-api-key>
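As a quick sanity check, you can confirm the variable is set without printing the full key; the 8-character preview below is an arbitrary choice:

```bash
# Print only the first 8 characters of the key to confirm it is set
printf '%.8s...\n' "$NGC_CLI_API_KEY"
```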
Set up the NGC CLI following the instructions at Getting Started with the NGC CLI. Make sure you run `ngc config set` to set up the required NGC configuration.

Log in to the NVIDIA NGC container registry:
docker login nvcr.io -u '$oauthtoken' -p $NGC_CLI_API_KEY
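If you prefer not to pass the key as a command-line argument, where it can end up in shell history, Docker's standard `--password-stdin` flag works as well:

```bash
# Read the registry password from stdin instead of the -p flag
echo "$NGC_CLI_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin
```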
Download the Docker Compose configuration from NGC:
```bash
ngc registry resource download-version "nvidia/nemo-microservices/nemo-microservices-quickstart:25.10"
cd nemo-microservices-quickstart_v25.10
```
Start NeMo Evaluator:
```bash
export NEMO_MICROSERVICES_IMAGE_REGISTRY=nvcr.io/nvidia/nemo-microservices
export NEMO_MICROSERVICES_IMAGE_TAG=25.10
docker compose --profile evaluator up --detach --quiet-pull --wait
```
The command starts the Evaluator microservice on port `8080`.

(Optional) Deploy NIM:
If no model is deployed externally to the Docker Compose setup, you can deploy a NIM with the following Docker Compose file. Deploying a NIM requires one GPU with at least 40 GB of memory.
Save the following YAML to a `nim.yaml` file to include in your Docker Compose up command.

```yaml
services:
  nim:
    image: ${NIM_IMAGE:-""}
    profiles: [evaluator]
    container_name: nim
    restart: on-failure
    ports:
      - ${NIM_PORT:-8000}:8000
    environment:
      - NGC_API_KEY=${NGC_API_KEY:-${NGC_CLI_API_KEY:-""}}
      - NIM_GUIDED_DECODING_BACKEND=${NIM_GUIDED_DECODING_BACKEND:-""}
    runtime: nvidia
    volumes:
      - ${MODEL_CACHE}:/opt/nim/.cache
    networks:
      - nmp
    shm_size: 16GB
    user: root
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all
    healthcheck:
      test: [ "CMD", "python3", "-c", "import requests, sys; sys.exit(0 if requests.get('http://localhost:8000/v1/health/live').ok else 1)" ]
      interval: 10s
      timeout: 3s
      retries: 20
      start_period: 60s  # allow for 60 seconds to download a model and start up
```
The example sets up a local volume mount and deploys `meta/llama-3.2-3b-instruct`.

```bash
export NIM_PORT=8000                 # configure host port for your deployment
export MODEL_CACHE=$(pwd)/model-cache  # specify the model cache location
mkdir -p $MODEL_CACHE && chmod 1777 $MODEL_CACHE
export NIM_IMAGE=nvcr.io/nim/meta/llama-3.2-3b-instruct:1.8.5
export NIM_GUIDED_DECODING_BACKEND=fast_outlines  # decoding backend depends on the model
docker compose \
  -f docker-compose.yaml \
  -f nim.yaml \
  --profile evaluator up \
  --detach --quiet-pull --wait
```
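On first start, the NIM downloads the model into `$MODEL_CACHE`, which can take several minutes; the `--wait` flag blocks until the healthcheck passes. To watch progress in the meantime, you can follow the container logs:

```bash
# Follow the NIM logs while the model downloads and the server starts
docker compose -f docker-compose.yaml -f nim.yaml logs -f nim
```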
Service Endpoints#
After starting the services with Docker Compose, the following endpoints will be available by default:
Evaluator API: http://localhost:8080
This is the main endpoint for interacting with the Evaluator microservice.
NeMo Data Store HuggingFace Endpoint: http://localhost:3000/v1/hf
The Data Store exposes a HuggingFace-compatible API at this endpoint.
You can set the `HF_ENDPOINT` environment variable to this URL if needed for integration or testing.
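For example:

```bash
# Point Hugging Face tooling at the local Data Store
export HF_ENDPOINT=http://localhost:3000/v1/hf
```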
NIM URL:
`http://nim:8000` is the model URL to use for evaluation jobs; the hostname `nim` resolves inside the Docker Compose network.
The model ID must match the deployed NIM. Use `meta/llama-3.2-3b-instruct` as the model ID for the Meta Llama-3.2-3b Instruct deployment example.
http://localhost:${NIM_PORT} is the model URL to use directly from the host.
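To confirm the in-network URL resolves, you can run a request from inside one of the Compose containers. The service name `evaluator` and the availability of `curl` in that image are assumptions here; adjust to match the services listed by `docker compose ps`:

```bash
# Query the NIM through the Compose network, from the Evaluator container
# (assumes the service is named `evaluator` and its image includes curl)
docker compose exec evaluator curl -sf http://nim:8000/v1/models
```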
Verify the Deployment#
After starting the services, verify everything is working:
Check service status:
docker compose ps
Verify the service is running:
curl -fv http://localhost:8080/v2/evaluation/jobs
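A successful response returns HTTP 200 with a JSON body listing the jobs, initially empty. If you have `jq` installed, you can pretty-print it:

```bash
# Pretty-print the job list
curl -s http://localhost:8080/v2/evaluation/jobs | jq .
```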
(Optional) Verify the deployed NIM:
curl http://localhost:${NIM_PORT}/v1/models
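Because NIM exposes an OpenAI-compatible API, you can also send a small chat completion as a smoke test, using the model ID from the deployment example:

```bash
# Minimal chat completion against the deployed NIM
curl http://localhost:${NIM_PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama-3.2-3b-instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 16
      }'
```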
Stop Evaluator#
To stop Evaluator and its related services, run the following command:
docker compose --profile evaluator down
This command stops and removes the containers started for the Evaluator and its dependencies. You can restart them at any time using the up command.
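For example, to restart with the same settings:

```bash
# Bring the Evaluator services back up
# (assumes the NEMO_MICROSERVICES_IMAGE_* variables are still exported)
docker compose --profile evaluator up --detach --quiet-pull --wait
```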
Include the flag -v or --volumes to remove persistent volumes.
docker compose --profile evaluator down -v
Next Steps#
For more tutorials, refer to Evaluation Tutorials.
For instructions on how to deploy the microservice on your Kubernetes cluster for production at scale, refer to Deploy NeMo Evaluator Using Helm Chart.