Managing NeMo Evaluator#
About NeMo Evaluator#
NVIDIA NeMo Evaluator enables real-time evaluations of your LLM application through APIs. Using the NeMo Evaluator allows you to refine and optimize LLMs for enhanced performance and real-world applicability. The NeMo Evaluator APIs can be seamlessly automated within development pipelines, enabling faster iterations without the need for live data.
Read the NeMo Evaluator documentation for details on running evaluations of your LLMs.
Prerequisites#
All the common NeMo microservice prerequisites.
A NeMo Data Store and a NeMo Entity Store deployed on your cluster. The NeMo Entity Store and NeMo Data Store work closely together to hold information about the model entities on your cluster.
Note
You can use the NeMo Dependencies Ansible Playbook to deploy all the following NeMo Evaluator microservice dependencies.
Argo Workflows installed to orchestrate Evaluator jobs. Refer to the Argo Workflows Helm chart for details on installing. You must also create the required ServiceAccounts to run jobs.
Milvus vector database installed. This is used as a document store for Retriever and RAG pipelines evaluated with NeMo Evaluator. Refer to the Milvus Helm chart for details on installing.
Storage
A PostgreSQL database installed. This database is used as a persistent data store for the NeMo Evaluator.
Kubernetes
Create a database user secret by creating a file, such as nemo-evaluator-secrets.yaml, with contents like the following example:

    apiVersion: v1
    stringData:
      password: <evalpass>
    kind: Secret
    metadata:
      name: evaluator-pg-existing-secret
      namespace: nemo
    type: Opaque

Apply the secret file.
$ kubectl apply -n nemo -f nemo-evaluator-secrets.yaml
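As an alternative to writing the password into a manifest file, the same secret can be created directly with `kubectl create secret generic`. The following is a minimal sketch, not part of the official procedure. It assumes the `nemo` namespace and the `evaluator-pg-existing-secret` name from the sample secret, and it reads the password from `EVAL_DB_PASSWORD`, a hypothetical environment variable introduced here for illustration.

```shell
# Sketch: create the database password secret without a manifest file.
# Assumptions (not from the product documentation):
#   - EVAL_DB_PASSWORD is a hypothetical variable that holds the password.
#   - The default namespace and secret name mirror the sample secret manifest.
create_evaluator_db_secret() {
  namespace="${1:-nemo}"
  secret_name="${2:-evaluator-pg-existing-secret}"

  # Refuse to create a secret with an empty password.
  if [ -z "${EVAL_DB_PASSWORD:-}" ]; then
    echo "EVAL_DB_PASSWORD is not set" >&2
    return 1
  fi

  # The key name "password" matches the stringData key in the sample manifest.
  kubectl create secret generic "$secret_name" \
    --namespace "$namespace" \
    --from-literal=password="$EVAL_DB_PASSWORD"
}
```

To use it, export `EVAL_DB_PASSWORD` and run `create_evaluator_db_secret`, optionally passing a different namespace and secret name as the first and second arguments.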
Deploying NeMo Evaluator#
Update the <inputs> in the following sample manifest with values for your cluster configuration.
Create a file, such as nemo-evaluator.yaml, with contents like the following example:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NemoEvaluator
    metadata:
      name: nemoevaluator-sample
      namespace: nemo
    spec:
      evaluationImages:
        bigcodeEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-bigcode:0.12.13"
        lmEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-lm-eval-harness:0.12.15"
        similarityMetrics: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-custom-eval:0.12.13"
        llmAsJudge: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.15"
        mtBench: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.15"
        retriever: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-retriever:0.12.13"
        rag: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-rag:0.12.13"
      image:
        repository: nvcr.io/nvidia/nemo-microservices/evaluator
        tag: "25.04"
        pullPolicy: IfNotPresent
        pullSecrets:
          - ngc-secret
      expose:
        service:
          type: ClusterIP
          port: 8000
      argoWorkflows:
        endpoint: https://<argo-workflows-server>.<nemo>.svc.cluster.local:2746
        serviceAccount: <argo-workflows-executor>
      vectorDB:
        endpoint: http://<milvus>.<nemo>.svc.cluster.local:19530
      datastore:
        endpoint: http://<nemodatastore-sample>.<nemo>.svc.cluster.local:8000/v1/hf
      entitystore:
        endpoint: http://<nemoentitystore-sample>.<nemo>.svc.cluster.local:8000
      databaseConfig:
        host: <evaluator-pg-postgresql>.<nemo>.svc.cluster.local
        port: 5432
        databaseName: <evaldb>
        credentials:
          user: <evaluser>
          secretName: <evaluator-pg-existing-secret>
          passwordKey: <password>
      otel:
        enabled: true
        exporterOtlpEndpoint: http://<evaluator-otel-opentelemetry-collector>.<nemo>.svc.cluster.local:4317
      replicas: 1

Apply the manifest to deploy a NeMo Evaluator:
$ kubectl apply -n nemo -f nemo-evaluator.yaml
Tip
The NeMo Evaluator image is large and takes a few minutes to download from the registry.
Optional: View information about the NeMo microservice:
$ kubectl describe nemoevaluator.apps.nvidia.com -n nemo
Partial Output
    ...
    Conditions:
      Last Transition Time:  2024-08-12T19:09:43Z
      Message:               Deployment is ready
      Reason:                Ready
      Status:                True
      Type:                  Ready
      Last Transition Time:  2024-08-12T19:09:43Z
      Message:
      Reason:                Ready
      Status:                False
      Type:                  Failed
    State:                   Ready

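Because the image download can take several minutes, it can be convenient to block until the resource reports the `Ready` condition shown in the output above instead of rerunning `kubectl describe`. The following sketch uses `kubectl wait` for that purpose. The resource name and namespace come from the sample manifest, and the 600-second default timeout is an arbitrary assumption, not a documented value.

```shell
# Sketch: block until the NemoEvaluator resource reports condition Ready=True.
# Assumptions: the defaults mirror the sample manifest; the 600s timeout is
# an arbitrary value chosen for illustration.
wait_for_evaluator() {
  name="${1:-nemoevaluator-sample}"
  namespace="${2:-nemo}"
  timeout="${3:-600s}"

  # kubectl wait exits non-zero if the condition is not met before the timeout.
  kubectl wait "nemoevaluator.apps.nvidia.com/$name" \
    --namespace "$namespace" \
    --for=condition=Ready \
    --timeout="$timeout"
}
```

Running `wait_for_evaluator` with no arguments waits on `nemoevaluator-sample` in the `nemo` namespace. A non-zero exit status means the condition was not reached in time, in which case the `kubectl describe` output is the place to look for the reason.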
Verify NeMo Evaluator#
After you deploy a NeMo Evaluator on your cluster, use the following steps to verify that the service is up and running.
Start a pod that has access to the curl command. Substitute any pod that has this command and meets your organization’s security requirements.
$ kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash
After the pod starts, you are connected to the ash shell in the pod.
Connect to the NeMo Evaluator service:
$ curl -X GET "http://nemoevaluator-sample.nemo:8000/v1/evaluation/configs"
Example Output
{"object":"list","data":[],"pagination":{"page":1,"page_size":10,"current_page_size":0,"total_pages":0,"total_results":0},"sort":"created_at"}
Press Ctrl+D to exit and delete the pod.
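If starting a temporary pod is not practical, the same endpoint can be queried from a workstation through a port-forward. The following sketch is an alternative to the steps above, not a replacement for them. It assumes the service is named `nemoevaluator-sample` in the `nemo` namespace and listens on port 8000, as in the curl example, and the two-second pause before the request is an arbitrary guess at how long the tunnel needs to come up.

```shell
# Sketch: query the Evaluator API through a temporary port-forward.
# Assumptions: the service name, namespace, and port 8000 mirror the sample
# manifest; the 2-second pause is an arbitrary allowance for tunnel startup.
check_evaluator_api() {
  namespace="${1:-nemo}"
  service="${2:-nemoevaluator-sample}"
  local_port="${3:-8000}"

  # Forward a local port to the service in the background.
  kubectl port-forward --namespace "$namespace" "service/$service" \
    "$local_port:8000" >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2

  # Same request as the in-cluster check, sent to the forwarded port.
  status=0
  curl -s -X GET "http://localhost:$local_port/v1/evaluation/configs" || status=$?

  # Stop the port-forward whether or not the request succeeded.
  kill "$pf_pid" 2>/dev/null || true
  return "$status"
}
```

Calling `check_evaluator_api` prints whatever the service returns. On a fresh deployment this should resemble the example output above, a list object with an empty `data` array.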
Configure NeMo Evaluator#
The following table describes the commonly modified fields for the NeMo Evaluator custom resource.
Field |
Description |
Default Value |
---|---|---|
|
Specifies to add the user-supplied annotations to the pod. |
None |
|
Specifies the endpoint for the ArgoWorkflow. |
None |
|
Specifies the ServiceAccount for the ArgoWorkflow. |
None |
|
Specifies an ENUM for console logs. Values include |
None |
|
Specifies the name of the password key in the database credentials secret. |
|
|
Specifies the name of the secret that holds the database credentials. |
None |
|
Specifies the non-root database username. |
None |
|
Specifies the database name. |
None |
|
Specifies the hostname of the database. |
None |
|
Specifies the port where the database is reachable. |
|
|
Specifies the NeMo Data Store endpoint. Read Managing NeMo Data Store for details on configuring and deploying a NeMo Data Store to your cluster. |
None |
|
When set to |
None |
|
Specifies the NeMo Entity Store endpoint. Read Managing NeMo Entity Store for details on configuring and deploying a NeMo Entity Store to your cluster. |
None |
|
Specifies environment variables. |
None |
|
Specifies the Evaluator log level. Values include |
None |
|
Specifies the required external images used for evaluation. Refer to the NeMo Evaluation Configurations documentation for details on these images.
    evaluationImages:
      bigcodeEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-bigcode:0.12.5"
      lmEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-lm-eval-harness:0.12.5"
      similarityMetrics: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-custom-eval:0.12.5"
      llmAsJudge: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.5"
      mtBench: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.5"
      retriever: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-retriever:0.12.5"
      rag: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-rag:0.12.5"
|
None |
|
Specifies attributes to expose a service for this NeMo microservice. |
None |
|
When set to If you have an ingress controller, values like the following sample configure an ingress for the
    ingress:
      enabled: true
      spec:
        ingressClassName: nginx
        host: nemo-evaluator.example.com
        paths:
          - path: /
            pathType: Prefix
|
|
|
Specifies the network port number for the NeMo Evaluator microservice. |
|
|
Specifies the Kubernetes service type to create for the NeMo microservice. |
|
|
Specifies the group for the pods.
This value is used to set the security context of the pod in the |
|
|
Specifies repository, tag, pull policy, and pull secret for the container image. |
None |
|
Specifies the user-supplied labels to add to the pod. |
None |
|
Specifies the log sink handlers. Values include |
None |
|
When set to |
|
|
When set to |
None |
|
Specifies URLs to be excluded from tracing. |
None |
|
Specifies the log exporter. Values include |
None |
|
Specifies the metrics exporter. Values include |
None |
|
Specifies the trace exporter. Values include |
None |
|
Specifies the OpenTelemetry Protocol endpoint. |
None |
|
Specifies the log level for OpenTelemetry. Values include |
None |
|
Specifies the number of replicas to have on the cluster. |
None |
|
Specifies the memory and CPU request. |
None |
|
Specifies the memory and CPU limits. |
None |
|
Specifies the tolerations for the pods. |
None |
|
Specifies the user ID for the pod.
This value is used to set the security context of the pod in the |
|
|
Specifies the vector database endpoint. |
None |
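Fields in the table above can be changed by editing the manifest and reapplying it. For a single value, such as the replica count, a merge patch is a possible shortcut. The following is a sketch only. It assumes the `spec.replicas` path from the sample manifest and assumes that the Operator reconciles patched resources the same way it reconciles reapplied manifests.

```shell
# Sketch: change spec.replicas on a deployed NemoEvaluator with a merge patch.
# Assumptions: the default name and namespace mirror the sample manifest, and
# the Operator reconciles the resource after it is patched.
scale_evaluator() {
  replicas="${1:-}"
  name="${2:-nemoevaluator-sample}"
  namespace="${3:-nemo}"

  # Accept only a non-negative integer for the replica count.
  case "$replicas" in
    ''|*[!0-9]*)
      echo "usage: scale_evaluator <replica-count> [name] [namespace]" >&2
      return 1
      ;;
  esac

  kubectl patch nemoevaluator.apps.nvidia.com "$name" \
    --namespace "$namespace" \
    --type merge \
    --patch "{\"spec\":{\"replicas\":$replicas}}"
}
```

For example, `scale_evaluator 2` requests two replicas of `nemoevaluator-sample` in the `nemo` namespace. Keeping the manifest file in sync afterward avoids the change being reverted the next time the file is applied.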
Next Steps#
Refer to the NeMo microservices documentation for details on