Managing NeMo Evaluator#

About NeMo Evaluator#

NVIDIA NeMo Evaluator enables real-time evaluations of your LLM application through APIs. Using the NeMo Evaluator allows you to refine and optimize LLMs for enhanced performance and real-world applicability. The NeMo Evaluator APIs can be seamlessly automated within development pipelines, enabling faster iterations without the need for live data.

Read the NeMo Evaluator documentation for details on running evaluations of your LLMs.

Prerequisites#

All the common NeMo microservice prerequisites.
A NeMo Data Store and a NeMo Entity Store deployed on your cluster. The NeMo Entity Store and NeMo Data Store work closely together to hold information about the model entities on your cluster.

Note

You can use the NeMo Dependencies Ansible Playbook to deploy all the following NeMo Evaluator microservice dependencies.

Argo Workflows installed to orchestrate Evaluator jobs. Refer to the Argo Workflows Helm chart for details on installing. You must also create the required ServiceAccounts to run jobs.
Milvus vector database installed. This is used as a document store for Retriever and RAG pipelines evaluated with NeMo Evaluator. Refer to the Milvus Helm chart for details on installing.

Storage

A PostgreSQL database installed. This database is used as a persistent data store for the NeMo Evaluator.

Kubernetes

Create a database user secret by creating a file, such as nemo-evaluator-secrets.yaml, with contents like the following example:

apiVersion: v1
stringData:
  password: <evalpass>
kind: Secret
metadata:
  name: evaluator-pg-existing-secret
  namespace: nemo
type: Opaque

Apply the secret file.

$ kubectl apply -n nemo -f nemo-evaluator-secrets.yaml
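As an alternative to keeping the password in a manifest file, the secret can be created directly from a shell variable. The following is a minimal sketch, not a required procedure. The secret name, the `password` key, and the `nemo` namespace mirror the example manifest above; `EVAL_DB_PASSWORD` and both function names are hypothetical and exist only for illustration.

```shell
# Create the database password secret without writing the password to disk.
# EVAL_DB_PASSWORD is a hypothetical environment variable (an assumption).
create_evaluator_db_secret() {
  namespace="${1:-nemo}"
  kubectl create secret generic evaluator-pg-existing-secret \
    --namespace "$namespace" \
    --from-literal=password="${EVAL_DB_PASSWORD:?set EVAL_DB_PASSWORD first}"
}

# Show the secret's keys and sizes (describe does not print the values).
describe_evaluator_db_secret() {
  namespace="${1:-nemo}"
  kubectl describe secret evaluator-pg-existing-secret --namespace "$namespace"
}
```

After exporting `EVAL_DB_PASSWORD`, run `create_evaluator_db_secret` and then `describe_evaluator_db_secret` to confirm that the `password` key is present. Use either this approach or the manifest file, not both, because creating a secret that already exists fails.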

Deploying NeMo Evaluator#

Update the `<inputs>` in the following sample manifest with values for your cluster configuration.

Create a file, such as nemo-evaluator.yaml, with contents like the following example:

apiVersion: apps.nvidia.com/v1alpha1
kind: NemoEvaluator
metadata:
  name: nemoevaluator-sample
  namespace: nemo
spec:
  evaluationImages:
    bigcodeEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-bigcode:0.12.21"
    lmEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-lm-eval-harness:0.12.21"
    similarityMetrics: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-custom-eval:0.12.21"
    llmAsJudge: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.21"
    mtBench: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.21"
    retriever: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-retriever:0.12.21"
    rag: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-rag:0.12.21"
    bfcl: "nvcr.io/nvidia/nemo-microservices/eval-factory-benchmark-bfcl:25.6.1"
    agenticEval: "nvcr.io/nvidia/nemo-microservices/eval-factory-benchmark-agentic-eval:25.6.1"
  image:
    repository: nvcr.io/nvidia/nemo-microservices/evaluator
    tag: "25.06"
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  expose:
    service:
      type: ClusterIP
      port: 8000
  argoWorkflows:
    endpoint: https://<argo-workflows-server>.<nemo>.svc.cluster.local:2746
    serviceAccount: <argo-workflows-executor>
  vectorDB:
    endpoint: http://<milvus>.<nemo>.svc.cluster.local:19530
  datastore:
    endpoint: http://<nemodatastore-sample>.<nemo>.svc.cluster.local:8000/v1/hf
  entitystore:
    endpoint: http://<nemoentitystore-sample>.<nemo>.svc.cluster.local:8000
  databaseConfig:
    host: <evaluator-pg-postgresql>.<nemo>.svc.cluster.local
    port: 5432
    databaseName: <evaldb>
    credentials:
      user: <evaluser>
      secretName: <evaluator-pg-existing-secret>
      passwordKey: <password>
  otel:
    enabled: true
    exporterOtlpEndpoint: http://<evaluator-otel-opentelemetry-collector>.<nemo>.svc.cluster.local:4317
  replicas: 1

Apply the manifest to deploy a NeMo Evaluator:
```
$ kubectl apply -n nemo -f nemo-evaluator.yaml
```
Tip

The NeMo Evaluator image is large and can take a few minutes to download from the registry.

Optional: View information about the NeMo microservice:

$ kubectl describe nemoevaluator.apps.nvidia.com -n nemo

Partial Output

...
Conditions:
 Last Transition Time:  2024-08-12T19:09:43Z
 Message:               Deployment is ready
 Reason:                Ready
 Status:                True
 Type:                  Ready
 Last Transition Time:  2024-08-12T19:09:43Z
 Message:
 Reason:                Ready
 Status:                False
 Type:                  Failed
State:                  Ready
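Instead of reading the conditions by hand, a script can block until the `Ready` condition shown in the output above becomes true. The sketch below wraps `kubectl wait` in a helper. The function name and the 600-second default timeout are assumptions; the resource name and namespace come from the sample manifest.

```shell
# Block until the NemoEvaluator resource reports condition Ready=True,
# or fail when the timeout expires.
wait_for_evaluator() {
  name="${1:-nemoevaluator-sample}"
  namespace="${2:-nemo}"
  timeout="${3:-600s}"   # assumed default; adjust for slow image pulls
  kubectl wait "nemoevaluator.apps.nvidia.com/$name" \
    --namespace "$namespace" \
    --for=condition=Ready \
    --timeout="$timeout"
}
```

Calling `wait_for_evaluator` with no arguments waits on the sample resource. Pass a name, namespace, and timeout to override the defaults, for example `wait_for_evaluator my-evaluator nemo 15m`.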

Verify NeMo Evaluator#

Once you have a NeMo Evaluator deployed on your cluster, use the steps below to verify the service is up and running.

Start a pod that has access to the curl command. Substitute any pod that has this command and meets your organization’s security requirements.
```
$ kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash
```
After the pod starts, you are connected to the ash shell in the pod.

Connect to the NeMo Evaluator service:

$ curl -X GET "http://nemoevaluator-sample.nemo:8000/v1/evaluation/configs"

Example Output

{"object":"list","data":[],"pagination":{"page":1,"page_size":10,"current_page_size":0,"total_pages":0,"total_results":0},"sort":"created_at"}

Press Ctrl+D to exit and delete the pod.
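For scripted checks, the same request can be validated without reading the output by eye. The following sketch assumes that a healthy service returns a body containing `"object":"list"`, as in the example output above. The function name is hypothetical, and the check is a simple text match rather than full JSON parsing.

```shell
# Read a response body on stdin and succeed only if it looks like
# the list object returned by /v1/evaluation/configs.
is_evaluator_list_response() {
  grep -q '"object"[[:space:]]*:[[:space:]]*"list"'
}
```

From a pod that has curl, a possible usage is `curl -fsS "http://nemoevaluator-sample.nemo:8000/v1/evaluation/configs" | is_evaluator_list_response && echo "Evaluator API is reachable"`. The `-f` option makes curl itself fail on HTTP error status codes, so the message prints only when both the request and the body check succeed.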

Configure NeMo Evaluator#

The following table describes the commonly modified fields of the NeMo Evaluator custom resource.

Field	Description	Default Value
`spec.annotations`	Specifies the user-supplied annotations to add to the pod.	None
`spec.argoWorkflows.endpoint` (required)	Specifies the endpoint for the Argo Workflows server.	None
`spec.argoWorkflows.serviceAccount` (required)	Specifies the ServiceAccount that Argo Workflows uses to run jobs.	None
`spec.consoleLogLevel`	Specifies the log level for console logs. Values include `INFO` and `DEBUG`.	None
`spec.databaseConfig.credentials.passwordKey` (required)	Specifies the name of the password key in the database credentials secret.	`password`
`spec.databaseConfig.credentials.secretName` (required)	Specifies the name of the secret that holds the database credentials.	None
`spec.databaseConfig.credentials.user` (required)	Specifies the non-root database username.	None
`spec.databaseConfig.databaseName` (required)	Specifies the database name.	None
`spec.databaseConfig.host` (required)	Specifies the hostname of the database.	None
`spec.databaseConfig.port` (required)	Specifies the port where the database is reachable.	`5432`
`spec.datastore.endpoint` (required)	Specifies the NeMo Data Store endpoint. Read Managing NeMo Data Store for details on configuring and deploying a NeMo Data Store to your cluster.	None
`spec.enableValidation`	When set to `true`, the NeMo Evaluator validation jobs are run.	None
`spec.entitystore.endpoint` (required)	Specifies the NeMo Entity Store endpoint. Read Managing NeMo Entity Store for details on configuring and deploying a NeMo Entity Store to your cluster.	None
`spec.env`	Specifies any environment variables.	None
`spec.evalLogLevel`	Specifies the Evaluator log level. Values include `INFO` and `DEBUG`.	None
`spec.evaluationImages` (required)	Specifies the external images used for evaluation. Refer to the NeMo Evaluation Configurations documentation for details on these images. evaluationImages: bigcodeEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-bigcode:0.12.5" lmEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-lm-eval-harness:0.12.5" similarityMetrics: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-custom-eval:0.12.5" llmAsJudge: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.5" mtBench: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.5" retriever: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-retriever:0.12.5" rag: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-rag:0.12.5"	None
`spec.expose`	Specifies attributes to expose a service for this NeMo microservice.	None
`spec.expose.ingress.enabled`	When set to `true`, the Operator creates a Kubernetes Ingress resource for the NeMo Evaluator. Specify the ingress specification in the `spec.expose.ingress.spec` field. If you have an ingress controller, values like the following sample configure an ingress for the `/` path. ingress: enabled: true spec: ingressClassName: nginx host: nemo-evaluator.example.com paths: - path: / pathType: Prefix	`false`
`spec.expose.service.port`	Specifies the network port number for the NeMo Evaluator microservice.	`8000`
`spec.expose.service.type`	Specifies the Kubernetes service type to create for the NeMo microservice.	`ClusterIP`
`spec.groupID`	Specifies the group for the pods. This value is used to set the security context of the pod in the `runAsGroup` and `fsGroup` fields.	`2000`
`spec.image`	Specifies repository, tag, pull policy, and pull secret for the container image.	None
`spec.labels`	Specifies the user-supplied labels to add to the pod.	None
`spec.logHandlers`	Specifies the log sink handlers. Values include `console` or `file`.	None
`spec.metrics.enabled`	When set to `true`, the Operator configures a Prometheus service monitor for the service. Specify the service monitor specification in the `spec.metrics.serviceMonitor` field. Refer to the Observability page for more details.	`false`
`spec.otel.disableLogging`	When set to `true`, OpenTelemetry collector and tracing are enabled.	None
`spec.otel.excludeUrls`	Specifies URLs to be excluded from tracing.	None
`spec.otel.exporterConfig.logsExporter`	Specifies the log exporter. Values include `otlp`, `console`, `none`.	None
`spec.otel.exporterConfig.metricsExporter`	Specifies the metrics exporter. Values include `otlp`, `console`, `none`.	None
`spec.otel.exporterConfig.traceExporter`	Specifies the trace exporter. Values include `otlp`, `console`, `none`.	None
`spec.otel.exporterOtlpEndpoint`	Specifies the OpenTelemetry Protocol (OTLP) endpoint.	None
`spec.otel.logLevel`	Specifies the log level for OpenTelemetry. Values include `INFO` and `DEBUG`.	None
`spec.replicas`	Specifies the number of replicas to have on the cluster.	None
`spec.resources.requests`	Specifies the memory and CPU request.	None
`spec.resources.limits`	Specifies the memory and CPU limits.	None
`spec.tolerations`	Specifies the tolerations for the pods.	None
`spec.userID`	Specifies the user ID for the pod. This value is used to set the security context of the pod in the `runAsUser` field.	`1000`
`spec.vectorDB.endpoint` (required)	Specifies the vector database endpoint.	None
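
Fields in the table can be changed on a running deployment by editing the manifest and re-applying it, or by patching the custom resource in place. The sketch below shows the second approach for `spec.replicas`. It is an illustration only: the function name is hypothetical, the resource name and namespace are taken from the sample manifest, and it assumes the Operator reconciles patched fields the same way it reconciles a re-applied manifest.

```shell
# Patch spec.replicas on the sample NemoEvaluator resource.
# Custom resources do not support strategic merge patches,
# so a JSON merge patch (--type merge) is used.
set_evaluator_replicas() {
  replicas="$1"
  # Reject empty or non-numeric input before building the JSON patch.
  case "$replicas" in
    ''|*[!0-9]*)
      echo "replicas must be a non-negative integer" >&2
      return 1
      ;;
  esac
  kubectl patch nemoevaluator.apps.nvidia.com nemoevaluator-sample \
    --namespace nemo \
    --type merge \
    --patch "{\"spec\":{\"replicas\":${replicas}}}"
}
```

For example, `set_evaluator_replicas 2` requests two replicas. A merge patch changes only the fields it names, so the rest of the specification stays as deployed.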

Next Steps#

Managing NeMo Guardrails
Managing NeMo Customizer
Refer to the NeMo microservices documentation for details on