Managing NeMo Evaluator#

About NeMo Evaluator#

NVIDIA NeMo Evaluator enables real-time evaluations of your LLM application through APIs. You can use the evaluation results to refine and optimize LLMs for better performance and real-world applicability. Because evaluations run through APIs, you can automate them within development pipelines and iterate faster without the need for live data.

Read the NeMo Evaluator documentation for details on running evaluations of your LLMs.

Prerequisites#

Note

You can use the NeMo Dependencies Ansible Playbook to deploy all the following NeMo Evaluator microservice dependencies.

  • Argo Workflows installed to orchestrate Evaluator jobs. Refer to the Argo Workflows Helm chart for installation details. You must also create the ServiceAccounts that are required to run jobs.

  • Milvus vector database installed. This is used as a document store for Retriever and RAG pipelines evaluated with NeMo Evaluator. Refer to the Milvus Helm chart for details on installing.

Storage

  • A PostgreSQL database installed. This database is used as a persistent data store for the NeMo Evaluator.

Kubernetes

  • Create a database user secret by creating a file, such as nemo-evaluator-secrets.yaml, with contents like the following example:

    apiVersion: v1
    stringData:
      password: <evalpass>
    kind: Secret
    metadata:
      name: evaluator-pg-existing-secret
      namespace: nemo
    type: Opaque
    

    Apply the secret file.

    $ kubectl apply -n nemo -f nemo-evaluator-secrets.yaml
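
    As an alternative to a manifest file, the same secret can be created directly from the command line. The following is a minimal sketch, not part of the NeMo tooling: the function names and argument order are hypothetical, and it assumes the secret belongs in the nemo namespace used by the sample manifests and that the key is named password.

```shell
# Sketch only: create the database password secret without a YAML file.
# Function names and argument order are illustrative assumptions.
create_pg_secret() {
  secret_name="$1"   # for example, evaluator-pg-existing-secret
  namespace="$2"     # for example, nemo
  password="$3"      # the PostgreSQL password for the Evaluator database user
  kubectl create secret generic "$secret_name" \
    --namespace "$namespace" \
    --from-literal=password="$password"
}

# Confirm that the secret exists. `kubectl describe secret` lists the key
# names and value sizes, not the secret values themselves.
show_pg_secret() {
  kubectl describe secret "$1" --namespace "$2"
}
```

    For example, `create_pg_secret evaluator-pg-existing-secret nemo "$EVAL_PG_PASSWORD"` followed by `show_pg_secret evaluator-pg-existing-secret nemo`, where EVAL_PG_PASSWORD is a hypothetical environment variable. Reading the password from a variable keeps the literal value out of your shell history.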
    

Deploying NeMo Evaluator#

Update the <inputs> in the following sample manifest with values for your cluster configuration.

  1. Create a file, such as nemo-evaluator.yaml, with contents like the following example:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NemoEvaluator
    metadata:
      name: nemoevaluator-sample
      namespace: nemo
    spec:
      evaluationImages:
        bigcodeEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-bigcode:0.12.13"
        lmEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-lm-eval-harness:0.12.15"
        similarityMetrics: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-custom-eval:0.12.13"
        llmAsJudge: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.15"
        mtBench: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.15"
        retriever: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-retriever:0.12.13"
        rag: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-rag:0.12.13"
      image:
        repository: nvcr.io/nvidia/nemo-microservices/evaluator
        tag: "25.04"
        pullPolicy: IfNotPresent
        pullSecrets:
          - ngc-secret
      expose:
        service:
          type: ClusterIP
          port: 8000
      argoWorkflows:
        endpoint: https://<argo-workflows-server>.<nemo>.svc.cluster.local:2746
        serviceAccount: <argo-workflows-executor>
      vectorDB:
        endpoint: http://<milvus>.<nemo>.svc.cluster.local:19530
      datastore:
        endpoint: http://<nemodatastore-sample>.<nemo>.svc.cluster.local:8000/v1/hf
      entitystore:
        endpoint: http://<nemoentitystore-sample>.<nemo>.svc.cluster.local:8000
      databaseConfig:
        host: <evaluator-pg-postgresql>.<nemo>.svc.cluster.local
        port: 5432
        databaseName: <evaldb>
        credentials:
          user: <evaluser>
          secretName: <evaluator-pg-existing-secret>
          passwordKey: <password>
      otel:
        enabled: true
        exporterOtlpEndpoint: http://<evaluator-otel-opentelemetry-collector>.<nemo>.svc.cluster.local:4317
      replicas: 1
    
  2. Apply the manifest to deploy a NeMo Evaluator:

    $ kubectl apply -n nemo -f nemo-evaluator.yaml
    

    Tip

    The NeMo Evaluator image is large and takes a few minutes to download from the registry.

  3. Optional: View information about the NeMo microservice:

    $ kubectl describe nemoevaluator.apps.nvidia.com -n nemo
    

    Partial Output

    ...
    Conditions:
     Last Transition Time:  2024-08-12T19:09:43Z
     Message:               Deployment is ready
     Reason:                Ready
     Status:                True
     Type:                  Ready
     Last Transition Time:  2024-08-12T19:09:43Z
     Message:
     Reason:                Ready
     Status:                False
     Type:                  Failed
    State:                  Ready
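
The readiness check in step 3 can also be scripted, which is useful in pipelines. The following is a sketch under stated assumptions: the function names are hypothetical, and it assumes the conditions shown in the partial output are stored under `.status.conditions` of the custom resource, which is where `kubectl wait --for=condition=...` looks.

```shell
# Block until the Ready condition is True or the timeout elapses.
# $1 = resource name, $2 = namespace, $3 = optional timeout (default 600s)
wait_for_evaluator() {
  kubectl wait --for=condition=Ready \
    "nemoevaluator.apps.nvidia.com/$1" \
    --namespace "$2" \
    --timeout="${3:-600s}"
}

# Print the status of the Ready condition (True or False).
# Assumption: conditions are stored under .status.conditions.
ready_status() {
  kubectl get nemoevaluator.apps.nvidia.com "$1" \
    --namespace "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}
```

For the sample manifest, `wait_for_evaluator nemoevaluator-sample nemo` returns a non-zero exit code if the resource is not ready within ten minutes, so a script can stop before it runs any evaluations.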
    

Verify NeMo Evaluator#

After you deploy a NeMo Evaluator on your cluster, use the following steps to verify that the service is up and running.

  1. Start a pod that has access to the curl command. Substitute any pod that has this command and meets your organization’s security requirements.

    $ kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash
    

    After the pod starts, you are connected to the ash shell in the pod.

  2. Connect to the NeMo Evaluator service:

    $ curl -X GET "http://nemoevaluator-sample.nemo:8000/v1/evaluation/configs"
    

    Example Output

    {"object":"list","data":[],"pagination":{"page":1,"page_size":10,"current_page_size":0,"total_pages":0,"total_results":0},"sort":"created_at"}
    
  3. Press Ctrl+D to exit and delete the pod.
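
The request in step 2 can be wrapped in a small helper that reports how many evaluation configs the service returns. This is a sketch, not an official client: the function names are hypothetical, and the parsing assumes the response has the same shape as the example output. It uses sed rather than jq so that it also runs in a minimal shell such as the one in the curl pod.

```shell
# Extract the total_results counter from a list response read on stdin.
total_results() {
  sed -n 's/.*"total_results":\([0-9][0-9]*\).*/\1/p'
}

# Query the configs endpoint at the given host:port and print the counter.
# With -f, curl writes no response body on HTTP errors, so nothing is printed.
count_eval_configs() {
  curl -fsS "http://$1/v1/evaluation/configs" | total_results
}
```

Inside the pod, run `count_eval_configs nemoevaluator-sample.nemo:8000`. Any numeric result indicates that the service answered with a well-formed list response.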

Configure NeMo Evaluator#

The following table describes the commonly modified fields for the NeMo Evaluator custom resource.

Field

Description

Default Value

spec.annotations

Specifies to add the user-supplied annotations to the pod.

None

spec.argoWorkflows.endpoint (required)

Specifies the endpoint of the Argo Workflows server.

None

spec.argoWorkflows.serviceAccount (required)

Specifies the ServiceAccount for Argo Workflows.

None

spec.consoleLogLevel

Specifies the log level for console logs. Values include INFO and DEBUG.

None

spec.databaseConfig.credentials.passwordKey (required)

Specifies the name of the password key in the database credentials secret.

password

spec.databaseConfig.credentials.secretName (required)

Specifies the name of the secret that contains the database credentials.

None

spec.databaseConfig.credentials.user (required)

Specifies the non-root database username.

None

spec.databaseConfig.databaseName (required)

Specifies the database name.

None

spec.databaseConfig.host (required)

Specifies the hostname of the database.

None

spec.databaseConfig.port (required)

Specifies the port where the database is reachable.

5432

spec.datastore.endpoint (required)

Specifies the NeMo Data Store endpoint. Read Managing NeMo Data Store for details on configuring and deploying a NeMo Data Store to your cluster.

None

spec.enableValidation

When set to true, the NeMo Evaluator validation jobs are run.

None

spec.entitystore.endpoint (required)

Specifies the NeMo Entity Store endpoint. Read Managing NeMo Entity Store for details on configuring and deploying a NeMo Entity Store to your cluster.

None

spec.env

Specifies any environment variables.

None

spec.evalLogLevel

Specifies the Evaluator log level. Values include INFO and DEBUG.

None

spec.evaluationImages (required)

Specifies the required external images used for evaluation. Refer to the NeMo Evaluation Configurations documentation for details on these images.

evaluationImages:
  bigcodeEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-bigcode:0.12.5"
  lmEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-lm-eval-harness:0.12.5"
  similarityMetrics: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-custom-eval:0.12.5"
  llmAsJudge: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.5"
  mtBench: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.5"
  retriever: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-retriever:0.12.5"
  rag: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-rag:0.12.5"

None

spec.expose

Specifies attributes to expose a service for this NeMo microservice.

None

spec.expose.ingress.enabled

When set to true, the Operator creates a Kubernetes Ingress resource for the NeMo Evaluator. Specify the ingress specification in the spec.expose.ingress.spec field.

If you have an ingress controller, values like the following sample configure an ingress for the / endpoint.

ingress:
  enabled: true
  spec:
    ingressClassName: nginx
    host: nemo-evaluator.example.com
    paths:
      - path: /
        pathType: Prefix

false

spec.expose.service.port

Specifies the network port number for the NeMo Evaluator microservice.

8000

spec.expose.service.type

Specifies the Kubernetes service type to create for the NeMo microservice.

ClusterIP

spec.groupID

Specifies the group for the pods. This value is used to set the security context of the pod in the runAsGroup and fsGroup fields.

2000

spec.image

Specifies repository, tag, pull policy, and pull secret for the container image.

None

spec.labels

Specifies the user-supplied labels to add to the pod.

None

spec.logHandlers

Specifies the log sink handlers. Values include console or file.

None

spec.metrics.enabled

When set to true, the Operator configures a Prometheus service monitor for the service. Specify the service monitor specification in the spec.metrics.serviceMonitor field. Refer to the Observability page for more details.

false

spec.otel.disableLogging

When set to true, OpenTelemetry logging is disabled.

None

spec.otel.excludeUrls

Specifies URLs to be excluded from tracing.

None

spec.otel.exporterConfig.logsExporter

Specifies the log exporter. Values include otlp, console, none.

None

spec.otel.exporterConfig.metricsExporter

Specifies the metrics exporter. Values include otlp, console, none.

None

spec.otel.exporterConfig.traceExporter

Specifies the trace exporter. Values include otlp, console, none.

None

spec.otel.exporterOtlpEndpoint

Specifies the OpenTelemetry Protocol endpoint.

None

spec.otel.logLevel

Specifies the log level for OpenTelemetry. Values include INFO and DEBUG.

None

spec.replicas

Specifies the number of replicas to have on the cluster.

None

spec.resources.requests

Specifies the memory and CPU request.

None

spec.resources.limits

Specifies the memory and CPU limits.

None

spec.tolerations

Specifies the tolerations for the pods.

None

spec.userID

Specifies the user ID for the pod. This value is used to set the security context of the pod in the runAsUser field.

1000

spec.vectorDB.endpoint (required)

Specifies the vector database endpoint.

None
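
To change one of these fields on a resource that is already deployed, you can edit nemo-evaluator.yaml and apply it again, or patch the resource in place. The following sketch shows the second approach with a JSON merge patch. The function name is hypothetical, and the sketch assumes that the Operator reconciles changes to the spec of an existing resource.

```shell
# Merge-patch a single top-level spec field on a NemoEvaluator resource.
# $1 = resource name, $2 = namespace, $3 = field name under spec,
# $4 = new value as JSON (numbers bare, strings in double quotes)
patch_evaluator_spec() {
  name="$1"; namespace="$2"; field="$3"; json_value="$4"
  kubectl patch nemoevaluator.apps.nvidia.com "$name" \
    --namespace "$namespace" \
    --type merge \
    --patch "{\"spec\":{\"$field\":$json_value}}"
}
```

For example, `patch_evaluator_spec nemoevaluator-sample nemo replicas 2` changes spec.replicas, and `patch_evaluator_spec nemoevaluator-sample nemo evalLogLevel '"DEBUG"'` changes spec.evalLogLevel. A merge patch only touches the fields that you name, so the rest of the spec stays as deployed.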

Next Steps#