NeMo Evaluator Deployment Guide#
You can deploy NVIDIA NeMo Evaluator by using Helm. To use Helm, you must have a Kubernetes cluster installed and Helm ready to use.
Prerequisites#
Dependencies:
Argo Workflows (Evaluation jobs are orchestrated by Argo Workflows.)
Milvus (Used for Retriever and RAG evaluations.)
PostgreSQL (The persistent data store for NeMo Evaluator.)
For Argo Workflows, Milvus, and PostgreSQL, you can install them with the default values from the NeMo Evaluator Helm chart, or you can use your own versions.
Kubernetes
Secrets:
Install NIM for LLMs#
Use the following documentation to install NIM for LLMs. The service must be running before you can use NeMo Evaluator.
The inference URL for each evaluation job is specified at the API request level. Make sure the target model is running with NIM for LLMs before you create the evaluation target and submit an evaluation job for it.
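Before you create the target, you can confirm that the NIM for LLMs service is reachable. The following commands are a minimal check; the namespace, service name, and local port are placeholders for your own deployment, and the /v1/models path assumes the default NIM for LLMs OpenAI-compatible API on port 8000.
kubectl -n <nim-namespace> get svc
kubectl -n <nim-namespace> port-forward service/<nim-service> 8000:8000
curl http://localhost:8000/v1/models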
Install and Configure NeMo Data Store#
Use the following documentation to install NeMo Data Store. The service must be running before you can use NeMo Evaluator.
In the custom-values.yaml file, set the evaluator.external.dataStore.endpoint key to the NeMo Data Store endpoint.
external:
  dataStore:
    # external.dataStore.endpoint references the endpoint URL of a NeMo Data Store installed externally to this chart
    endpoint: "http://<nemo-data-store-service>.<nemo-data-store-namespace>.svc.cluster.local:8000"
Configure Argo Workflows#
NeMo Evaluator orchestrates evaluation jobs using Argo Workflows. You can install Argo Workflows from the NeMo Evaluator Helm chart, or you can use your own Argo Workflows installation.
Install Argo Workflows with the Helm chart#
By default, the NeMo Evaluator Helm chart installs Argo Workflows by using the Argo Workflows Helm chart.
The following code snippet shows the default Argo Workflows configuration in the NeMo Evaluator Helm chart, with argoWorkflows.enabled set to true and the following pre-configured parameters:
crds: Install the CRDs required to run an Argo Workflows server. There are two options: set it to true to install the CRDs from Helm, or set it to false and install them manually the first time. Refer to the custom resource definition section in the Argo Workflows documentation for more information.
argoServiceAccount: Create a service account to execute the workflow.
argoWorkflows.server.authModes: Set the authentication mode to server.
argoWorkflows:
  enabled: true
  serviceName: argo-workflows-server
  server:
    authModes:
      - "server"
    servicePort: 2746
    secure: true
  crds:
    install: false
external:
  argoWorkflows:
    endpoint: ""
argoServiceAccount:
  create: true
  name: workflow-executor
Warning
Due to a known issue with certain Argo Workflows CRDs being installed at the cluster scope, Argo Workflows can only be installed once, at the cluster level. This limits re-installation, and installation of Argo Workflows within a namespace, when using the NeMo Evaluator Helm chart. Therefore, we strongly recommend that you choose the setup option with a pre-installed Argo Workflows, as described in the following section.
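To check whether Argo Workflows CRDs are already installed in your cluster before you choose an installation option, you can list them. This is a generic check that assumes only that kubectl has access to the cluster.
kubectl get crd | grep argoproj.io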
Use Your Own Argo Workflows#
To connect to your own pre-installed Argo Workflows, set argoWorkflows.enabled to false in custom-values.yaml and configure the endpoint details.
argoWorkflows:
  enabled: false
external:
  argoWorkflows:
    endpoint: "<url to the pre-installed argo workflows>"
Configure Milvus#
The Milvus vector database is used as a document store for the Retriever and RAG pipelines that are evaluated with NeMo Evaluator. You can install Milvus with the NeMo Evaluator Helm chart, or you can use your own Milvus installation.
Install Milvus with the Helm chart#
To install Milvus with the default settings from the NeMo Evaluator Helm chart, set milvus.enabled to true in custom-values.yaml.
Warning
The default Milvus installation in the NeMo Evaluator Helm chart uses the Milvus Helm chart. For production evaluations, connect to your own external Milvus (version 2.3.4 or later). Refer to Use Your Own Milvus for more information.
milvus:
  enabled: true
  serviceName: milvus
  cluster:
    enabled: false
  etcd:
    enabled: false
  pulsar:
    enabled: false
  minio:
    enabled: false
  tls:
    enabled: false
  standalone:
    persistence:
      enabled: true
      persistentVolumeClaim:
        size: 100Gi
        storageClass: standard
  extraEnv:
    - name: LOG_LEVEL
      value: error
  extraConfigFiles:
    user.yaml: |+
      etcd:
        use:
          embed: true
        data:
          dir: /var/lib/milvus/etcd
      common:
        storageType: local
Tip
For Retriever and RAG evaluations with large datasets, such as hotpotqa, we recommend that you set the storage size to at least 100Gi.
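After the chart is installed, you can confirm that the Milvus persistent volume claim was created with the expected size and storage class. The namespace below is a placeholder for your installation namespace.
kubectl -n <NAMESPACE> get pvc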
Use Your Own Milvus#
To connect to your own externally installed Milvus, set milvus.enabled to false in custom-values.yaml and configure the endpoint.
milvus:
  enabled: false
external:
  milvus:
    endpoint: "<url to the pre-installed milvus>"
Configure with External PostgreSQL#
By default, the NeMo Evaluator Helm chart uses the Bitnami PostgreSQL chart to deploy a PostgreSQL database. Refer to the PostgreSQL section for information on how to configure the microservice with an external PostgreSQL database.
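The following fragment is only a sketch of what an external database configuration typically looks like in custom-values.yaml. The key names (postgresql.enabled, externalDatabase.*) are assumptions for illustration; use the keys documented in the PostgreSQL section referenced above.
postgresql:
  enabled: false            # assumed key: disable the bundled Bitnami PostgreSQL
externalDatabase:           # assumed key: connection details for your own PostgreSQL
  host: "<postgres-host>"
  port: 5432
  database: "<database-name>"
  existingSecret: "<secret-with-credentials>"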
Port-forward the NeMo Evaluator Microservice#
You can verify the installation by launching an evaluation job. Use the following procedure.
Port-forward the microservice to your local machine, adjusting the service name based on your release name:
kubectl -n <NAMESPACE> port-forward service/myrelease-nemo-evaluator 7331:7331
After port-forwarding, your data scientists can use the local host URL to access the NeMo Evaluator microservice APIs.
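As a quick check, you can query the service through the forwarded port. The /health path is an assumption based on the excluded-URLs setting shown in the monitoring section below.
curl http://localhost:7331/health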
Monitor Your Installation#
NeMo Evaluator is auto-instrumented with the OpenTelemetry SDK. It can export traces, metrics, and logs in the OpenTelemetry-standard OTLP format.
By default, NeMo Evaluator is configured with exporters disabled. In this mode, only logs are printed to the console.
To enable OpenTelemetry exporters:
1. Set otelExporterEnabled to true in the Helm chart.
2. Configure OpenTelemetry by adding standard OTel environment variables to otelEnvVars, as shown in the following example.
otelEnvVars:
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://<otel-collector>:4317" # OTLP endpoint where the exporters send telemetry
  OTEL_SERVICE_NAME: "nemo-evaluator" # name of the service associated with the telemetry records
  OTEL_TRACES_EXPORTER: otlp # traces exporter, set to "none" if not needed
  OTEL_METRICS_EXPORTER: otlp # metrics exporter, set to "none" if not needed
  OTEL_LOGS_EXPORTER: otlp # logs exporter, set to "none" if not needed
  OTEL_PROPAGATORS: "tracecontext,baggage" # propagators configuration for tracing
  OTEL_RESOURCE_ATTRIBUTES: "deployment.environment=$(NAMESPACE)" # additional OTel record attributes
  OTEL_PYTHON_EXCLUDED_URLS: "health" # URLs that are excluded from exporting telemetry
  OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED: "true" # enables auto-instrumentation for logging
You can configure NeMo Evaluator to export telemetry to your own pre-installed OpenTelemetry Collector or use the collector that is included with the NeMo Evaluator Helm chart.
Set opentelemetry-collector.enabled to true to install an OpenTelemetry Collector with the Helm chart. In this mode, OTEL_EXPORTER_OTLP_ENDPOINT is automatically set to the OpenTelemetry Collector endpoint.
Set zipkin.enabled to true to install Zipkin (a UI for tracing) with the Helm chart.
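For example, a minimal custom-values.yaml fragment that enables the exporters along with the bundled collector and Zipkin could look like the following; this only combines the keys described above.
otelExporterEnabled: true
opentelemetry-collector:
  enabled: true
zipkin:
  enabled: true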
Values#
| Name | Description | Value |
|---|---|---|
| zipkin.enabled | Specify whether this chart deploys Zipkin for tracing. | |
| opentelemetry-collector.enabled | Specify whether this chart deploys OpenTelemetry Collector for metrics. | |
| | OpenTelemetry Collector configurations. | |
| otelExporterEnabled | Enable OpenTelemetry exporters for NeMo Evaluator. | false |
| otelEnvVars | Env variables to configure OpenTelemetry for NeMo Evaluator; sane defaults in chart. | |
| | Log level for both OTLP and console exporters. | |
| | OpenTelemetry Collector configurations. Refer to the OpenTelemetry Setup documentation for details. | |