Deploy NeMo Evaluator with Docker#

Run the microservice on your local machine using Docker Compose for experimentation.

Note

v2 API Availability: Use the v2 API for evaluation jobs to access a broader set of evaluation flows. For details, refer to the V2 API Migration Guide.

The v1 API supports custom evaluation jobs only.


Prerequisites#

Before following this deployment guide, ensure that you have:

  • Docker installed with the Docker Compose plugin.

  • An NGC API key with access to the NVIDIA NGC container registry.

  • (For the optional NIM deployment) An NVIDIA GPU with at least 40 GB of memory and the NVIDIA Container Toolkit configured as the Docker runtime.


Set Up#

  1. Export the NGC API Key into your shell environment using the following command:

    export NGC_CLI_API_KEY=<your-ngc-api-key>
    
  2. Set up the NGC CLI by following the instructions in Getting Started with the NGC CLI. Make sure you run ngc config set to complete the required NGC configuration.
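
    Once the CLI is configured, you can sanity-check the active configuration before proceeding (output varies by account):

    ngc config current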

  3. Log in to NVIDIA NGC container registry:

    docker login nvcr.io -u '$oauthtoken' -p $NGC_CLI_API_KEY
    
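    If you prefer not to pass the key on the command line, where it can end up in shell history, you can pipe it to Docker's standard --password-stdin flag instead:

    echo "$NGC_CLI_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin
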
  4. Download the Docker Compose configuration from NGC:

    ngc registry resource download-version "nvidia/nemo-microservices/nemo-microservices-quickstart:25.10"
    cd nemo-microservices-quickstart_v25.10
    
  5. Start NeMo Evaluator:

    export NEMO_MICROSERVICES_IMAGE_REGISTRY=nvcr.io/nvidia/nemo-microservices
    export NEMO_MICROSERVICES_IMAGE_TAG=25.10
    docker compose --profile evaluator up --detach --quiet-pull --wait
    

    The command starts the Evaluator microservice on port 8080.
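
    If the --wait flag times out, inspect the logs before retrying. The service name evaluator below is an assumption; run docker compose ps to list the actual service names in your quickstart bundle.

    docker compose --profile evaluator logs --follow evaluator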

  6. (Optional) Deploy NIM:

    If you don't have a model deployed outside of the Docker Compose setup, you can deploy a NIM with the following Docker Compose file. Deploying a NIM requires one GPU with at least 40 GB of memory.

    Save the following YAML to a file named nim.yaml and include it in your docker compose up command.

    services:
       nim:
          image: ${NIM_IMAGE:-""}
          profiles: [evaluator]
          container_name: nim
          restart: on-failure
          ports:
             - ${NIM_PORT:-8000}:8000
          environment:
             - NGC_API_KEY=${NGC_API_KEY:-${NGC_CLI_API_KEY:-""}}
             - NIM_GUIDED_DECODING_BACKEND=${NIM_GUIDED_DECODING_BACKEND:-""}
          runtime: nvidia
          volumes:
             - ${MODEL_CACHE}:/opt/nim/.cache
          networks:
             - nmp
          shm_size: 16GB
          user: root
          deploy:
             resources:
                reservations:
                   devices:
                      - driver: nvidia
                        capabilities: [gpu]
                        count: all
          healthcheck:
             test: [
                "CMD",
                "python3",
                "-c",
                "import requests, sys; sys.exit(0 if requests.get('http://localhost:8000/v1/health/live').ok else 1)"
             ]
             interval: 10s
             timeout: 3s
             retries: 20
             start_period: 60s # allow for 60 seconds to download a model and start up
    

    The following example sets up a local volume mount and deploys meta/llama-3.2-3b-instruct.

    export NIM_PORT=8000 # configure host port for your deployment
    
    export MODEL_CACHE=$(pwd)/model-cache # specify the model cache location
    mkdir -p $MODEL_CACHE && chmod 1777 $MODEL_CACHE
    
    export NIM_IMAGE=nvcr.io/nim/meta/llama-3.2-3b-instruct:1.8.5
    export NIM_GUIDED_DECODING_BACKEND=fast_outlines # decoding backend depends on the model
    
    docker compose \
       -f docker-compose.yaml \
       -f nim.yaml \
       --profile evaluator up \
       --detach --quiet-pull --wait
    
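    Model download can take a while on the first start. Because the compose file pins container_name: nim, you can follow the NIM's startup progress directly:

    docker logs -f nim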

Service Endpoints#

After starting the services with Docker Compose, the following endpoints will be available by default:

  • Evaluator API: http://localhost:8080

    • This is the main endpoint for interacting with the Evaluator microservice.

  • NeMo Data Store Hugging Face Endpoint: http://localhost:3000/v1/hf

    • The Data Store exposes a Hugging Face-compatible API at this endpoint.

    • You can set the HF_ENDPOINT environment variable to this URL for integration or testing, as shown in the sketch after this list.

  • NIM URL: http://localhost:8000

    • Available only if you deployed the optional NIM in the setup above. The host port is set by the NIM_PORT variable (default 8000), and the NIM serves an OpenAI-compatible API.

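As a minimal sketch, you can point Hugging Face tooling at the local Data Store by exporting the variable directly; any client that honors HF_ENDPOINT, such as huggingface_hub or huggingface-cli, will pick it up:

export HF_ENDPOINT=http://localhost:3000/v1/hf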

Verify the Deployment#

After starting the services, verify everything is working:

  1. Check service status:

    docker compose ps
    
  2. Verify that the Evaluator API responds:

    curl -fv http://localhost:8080/v2/evaluation/jobs
    
  3. (Optional) Verify the deployed NIM:

    curl http://localhost:${NIM_PORT}/v1/models
    
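    To confirm the model registered under the expected name, you can extract the model IDs from the response; this sketch assumes jq is installed and relies on the OpenAI-compatible response shape that NIM serves:

    curl -s http://localhost:${NIM_PORT}/v1/models | jq -r '.data[].id'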

Stop Evaluator#

To stop Evaluator and its related services, run the following command:

docker compose --profile evaluator down

This command stops and removes the containers started for the Evaluator and its dependencies. You can restart them at any time using the up command.
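
For example, to bring the same services back up:

docker compose --profile evaluator up --detach --wait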

To also remove persistent volumes, include the -v (or --volumes) flag:

docker compose --profile evaluator down -v

Next Steps#