Troubleshooting NeMo Evaluator#
Use this documentation to troubleshoot issues that can arise when you work with NVIDIA NeMo Evaluator.
Tip
You can get logs for COMPLETED or FAILED jobs and use them to help troubleshoot.
Unsupported Judge Model#
LLM-as-a-Judge evaluates the quality of another model’s output by using an evaluation prompt and evaluation criteria. The prompt structures the judge’s output, and the evaluation criteria parse that output to produce a metric score.
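Not all models make good judges. If the judge produces inconsistent output or does not follow the format that the evaluation criteria expect, the evaluation can fail with parsing errors. This is common with smaller models, so consider a larger judge model if you see these errors. In the logs, the failure can appear as a request that asks the judge to fix its output, for example: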
Incoming request               body={'messages': [{'content': 'The output string did not satisfy the constraints given in the prompt. Fix the output string and return it.\nPlease return the output in a JSON format that complies with the following schema as specified in JSON Schema:\n{"properties": {"text": {"title": "Text", "type": "string"}}, "required": ["text"], "title": "StringIO", "type": "object"}
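The following minimal sketch shows why the judge's output format matters. It assumes a hypothetical criterion that expects the judge to write `SCORE: <integer>` and extracts that value with a regular expression. `parse_judge_score` and the `SCORE:` pattern are illustrative only and are not NeMo Evaluator's actual parser.

```python
import re

# Hypothetical format: the evaluation prompt asks the judge to include "SCORE: <integer>".
SCORE_PATTERN = re.compile(r"SCORE:\s*(\d+)")


def parse_judge_score(judge_output: str) -> int:
    """Extract an integer score from judge output, or raise if the format is not followed."""
    match = SCORE_PATTERN.search(judge_output)
    if match is None:
        # A judge that ignores the requested format ends up here, which is the
        # kind of situation that surfaces as a parsing error during evaluation.
        raise ValueError(f"judge output does not match the expected format: {judge_output!r}")
    return int(match.group(1))
```

A well-behaved judge response such as `"Accurate and concise. SCORE: 4"` parses to `4`. A free-form reply such as `"Looks good to me!"` raises `ValueError`, which is the kind of output that smaller judge models tend to produce.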
Dataset {dataset} is not in the expected format; it needs to have the files_url property set#
This error means that either the files_url property is missing from the dataset specification in the config, or its value is not in the expected format.
The dataset must be a JSON object with the files_url property set to the path of the file in NeMo Data Store, in the format hf://datasets/<dataset-namespace>/<dataset-name>/<file-path>.
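To catch a malformed files_url before you submit a job, you can run a local check like the following sketch. The dataset dictionary values are placeholders, and `check_files_url` is a hypothetical helper that only verifies the hf://datasets/<dataset-namespace>/<dataset-name>/<file-path> shape. It does not contact NeMo Data Store.

```python
from urllib.parse import urlparse

# Placeholder dataset specification; replace the namespace, name, and path with your own.
dataset = {
    "files_url": "hf://datasets/my-namespace/my-dataset/data/test.jsonl",
}


def check_files_url(spec: dict) -> tuple[str, str, str]:
    """Return (namespace, dataset_name, file_path), or raise ValueError if files_url is malformed."""
    files_url = spec.get("files_url")
    if not files_url:
        raise ValueError("dataset specification has no files_url property")
    parsed = urlparse(files_url)
    # For hf://datasets/<ns>/<name>/<path>, urlparse puts "datasets" in netloc.
    if parsed.scheme != "hf" or parsed.netloc != "datasets":
        raise ValueError(f"files_url must start with hf://datasets/: {files_url}")
    # Split into at most three parts so the file path can contain slashes.
    parts = parsed.path.lstrip("/").split("/", 2)
    if len(parts) < 3 or not all(parts):
        raise ValueError(
            f"files_url must look like hf://datasets/<namespace>/<name>/<file-path>: {files_url}"
        )
    namespace, name, file_path = parts
    return namespace, name, file_path
```

For the placeholder above, the check returns `("my-namespace", "my-dataset", "data/test.jsonl")`.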
Error connecting to inference server#
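This means that, for a custom evaluation, NeMo Evaluator cannot connect to the target LLM endpoint. Verify that the endpoint URL in the target is correct and that the endpoint is reachable from the cluster where Evaluator runs.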
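A quick way to separate network problems from configuration problems is to send a plain HTTP request to the target endpoint. The following is a minimal sketch using only the Python standard library. Run it from a pod or host with the same network access as the Evaluator deployment, because reachability from your workstation can differ. Any URL you pass, such as a hypothetical `http://my-llm-host:8000/v1/models`, is an assumption about your target.

```python
import urllib.error
import urllib.request


def check_endpoint(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server sends any HTTP response, False if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (for example, 401 or 404),
        # so the endpoint is reachable even if the path or credentials are wrong.
        return True
    except OSError:
        # URLError and socket timeouts are subclasses of OSError: DNS failure,
        # refused connection, or timeout means the endpoint is not reachable.
        return False
```

A `False` result points to networking, DNS, or a stopped model server. A `True` result means the problem is more likely in the target's parameters.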
Error occurred while checking the existence of file {file_ref} on NeMo Data Store#
This could mean that the dataset is not specified correctly, or that the NeMo Data Store itself is unresponsive.
- Verify that the files URL is correct and that the dataset and file exist in NeMo Data Store.
- Verify that NeMo Data Store is responsive and reachable.
If the error contains the string Dataset {file_ref} is not present on datastore,
it means that the datastore is responsive, but the file reference does not exist.
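The decision described above can be written as a small triage helper, sketched below. It only matches the message text quoted on this page. `diagnose_datastore_error` is a hypothetical name, and the exact wording in your logs might differ.

```python
import re

# Message text taken from this page; the surrounding log format may differ.
MISSING_FILE = re.compile(r"Dataset (?P<file_ref>\S+) is not present on datastore")
CHECK_FAILED = "Error occurred while checking the existence of file"


def diagnose_datastore_error(message: str) -> str:
    """Map a Data Store error message to the likely cause described on this page."""
    missing = MISSING_FILE.search(message)
    if missing:
        # Data Store answered, so the problem is the file reference itself.
        return f"NeMo Data Store is reachable, but {missing.group('file_ref')} does not exist; check files_url."
    if CHECK_FAILED in message:
        # No specific cause in the message: check both the URL and Data Store health.
        return "Check that files_url is correct and that NeMo Data Store is responsive and reachable."
    return "Not a Data Store existence-check error."
```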
Evaluation Job Takes a Long Time#
The time that an Evaluation job takes can vary from a few minutes to many hours,
depending on the target model, config, and other factors.
As long as the job status is running, the job is still in progress.
If there is a problem with your job, the status changes to unavailable or failed.
For more information, see Expected Evaluation Duration.
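Rather than checking a long-running job by hand, you can poll its status. The following is a minimal sketch that assumes `GET /v1/evaluation/jobs/<job-id>` returns JSON with a `status` field. That endpoint shape, the 60-second default interval, and the "created" and "pending" statuses are assumptions; only "running" comes from this page. The fetcher is passed in as a function so that you can swap in your own HTTP client.

```python
import json
import time
import urllib.request
from typing import Callable

# "running" is documented on this page; "created" and "pending" are assumed in-progress states.
ACTIVE_STATUSES = {"created", "pending", "running"}


def http_status_fetcher(base_url: str, job_id: str) -> Callable[[], str]:
    """Return a function that reads the job's status from GET /v1/evaluation/jobs/<job-id> (assumed)."""
    url = f"{base_url}/v1/evaluation/jobs/{job_id}"

    def fetch() -> str:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.load(resp)["status"]

    return fetch


def wait_for_job(fetch_status: Callable[[], str], interval: float = 60.0, max_polls: int = 1000) -> str:
    """Poll until the job leaves an in-progress status, then return the final status."""
    for _ in range(max_polls):
        status = fetch_status()
        print(f"status: {status}")
        if status.lower() not in ACTIVE_STATUSES:
            return status
        time.sleep(interval)
    raise TimeoutError(f"job still in progress after {max_polls} polls")
```

For example, `wait_for_job(http_status_fetcher("http://localhost:7331", "<job-id>"))` prints each status until the job completes, fails, or becomes unavailable.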
Invalid parameters specified for filter#
This error means that a GET request includes an invalid filter parameter.
For the parameters that you can use to filter queries, refer to Filter and Sort Responses from the NVIDIA NeMo Evaluator API.
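One way to avoid this error is to validate filter keys on the client before you send the request, as in the following sketch. The allowed field names and the plain `key=value` query encoding are hypothetical placeholders. Take the real parameter names and syntax from the Filter and Sort reference.

```python
from urllib.parse import urlencode

# Hypothetical allowed filter fields; replace with the parameters listed in
# "Filter and Sort Responses from the NVIDIA NeMo Evaluator API".
ALLOWED_FILTERS = {"namespace", "status"}


def build_filter_query(filters: dict[str, str]) -> str:
    """Reject unknown filter keys locally, then URL-encode the remaining ones."""
    unknown = sorted(set(filters) - ALLOWED_FILTERS)
    if unknown:
        raise ValueError(f"invalid filter parameter(s): {', '.join(unknown)}")
    return urlencode(filters)
```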
Job cannot be launched#
This means that one of the pre-launch validations failed. The error message includes details about which checks failed.
Missing required environment variable#
This error means that a required environment variable is not set correctly.
For example, DATA_STORE_URL is not set in the NeMo Evaluator deployment.
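To confirm which variables are missing, you can run a check like the following sketch in the environment where Evaluator runs. Only DATA_STORE_URL is named on this page. Add any other variables that your deployment requires. `missing_env_vars` is a hypothetical helper.

```python
import os
from typing import Mapping

# DATA_STORE_URL is named on this page; extend the list for your deployment.
REQUIRED_ENV_VARS = ["DATA_STORE_URL"]


def missing_env_vars(
    required: list[str] = REQUIRED_ENV_VARS, env: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the names of required variables that are unset or set to an empty string."""
    return [name for name in required if not env.get(name)]
```

An empty list means every listed variable has a non-empty value. It does not confirm that the value itself is correct.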
Unable to launch the evaluation job because there is a problem in the target, config, or environment#
This error appears in the status message of the job. One possible cause is that the evaluation config does not contain all the required parameters. Refer to the evaluation config documentation for examples.
Another possible cause is that the target is not set correctly: either the target is not reachable, or its parameters (for example, model_id) are set incorrectly. Refer to the evaluation target documentation for examples.
Unsupported metric type#
The config for a custom evaluation specifies a metric type that NeMo Evaluator does not support.
What is EVALUATOR_BASE_URL?#
EVALUATOR_BASE_URL is a placeholder for the URL of the Evaluator API that is used in the examples throughout this documentation.
The URL of the Evaluator API depends on where you deploy Evaluator and how you configure it.
For example, your Evaluator API URL might look like https://evaluator.internal.your-company.com.
- To install Evaluator in a Kubernetes minikube environment, see Demo Cluster Setup on Minikube. 
- To install Evaluator in a deployment environment, see NeMo Evaluator Deployment Guide. 
If you are running Evaluator in a local Kubernetes minikube cluster, enable ingress by using the following command.
minikube addons enable ingress
You can also port-forward the Evaluator service endpoint by using the following command.
After you port-forward, EVALUATOR_BASE_URL is http://localhost:7331.
kubectl port-forward svc/nemo-evaluator 7331:7331
nemo-evaluator is the default service name.
To verify the service name, run the following command in the namespace where Evaluator is deployed.
kubectl get svc  
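The examples on this page build request URLs directly from EVALUATOR_BASE_URL, so a value without a scheme (such as localhost:7331) or with a trailing slash produces broken URLs. The following minimal sketch normalizes the value before use. The `http://localhost:7331` default matches the port-forward setup above. Treating a bare host as plain HTTP is an assumption that fits port-forwarding but might not fit an HTTPS ingress.

```python
import os


def evaluator_base_url(default: str = "http://localhost:7331") -> str:
    """Read EVALUATOR_BASE_URL, add a scheme if it is missing, and drop any trailing slash."""
    url = os.environ.get("EVALUATOR_BASE_URL", default).strip()
    if "://" not in url:
        # Assumption: a bare host such as localhost:7331 (port-forward) uses plain HTTP.
        url = f"http://{url}"
    return url.rstrip("/")
```

With this helper, an endpoint is built as `f"{evaluator_base_url()}/v1/evaluation/jobs"` regardless of how the variable was set.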
Advanced Troubleshooting#
To troubleshoot an evaluation job that has failed, you can check the evaluation job logs and pod logs.
Warning
Use these advanced troubleshooting steps only after all other troubleshooting steps have failed.
Prerequisites#
Before you use these steps, make sure that you have the following:
- Tried all other troubleshooting steps in this documentation
- Basic knowledge of Kubernetes
- Access to the Kubernetes cluster where the service is deployed
- kubectl installed and configured to access the Kubernetes cluster where the service is deployed
Evaluation Job Logs#
To download the log files, use the download-results endpoint.
This endpoint downloads the result directory containing configuration files, logs, and evaluation results for a specific evaluation run.
The result directory is packaged and provided as a downloadable archive.
To download the evaluation results directory, use the following command.
curl -X 'GET' \
  "${EVALUATOR_BASE_URL}/v1/evaluation/jobs/<job-id>/download-results" \
  -H 'accept: application/json' \
  -o result.zip
After the download completes, the log files are available inside result.zip.
They are in the results folder and have the .log file extension.
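If you only need the logs, you can extract just the .log files from the downloaded archive with Python's standard zipfile module, as in the following sketch. It matches any member ending in .log rather than assuming an exact folder name. The function name and the result_logs output directory are illustrative choices.

```python
import zipfile
from pathlib import Path


def extract_logs(archive: str = "result.zip", dest: str = "result_logs") -> list[Path]:
    """Extract only the *.log members of the results archive and return their local paths."""
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            if member.endswith(".log"):
                # ZipFile.extract keeps the member's folder structure under dest.
                extracted.append(Path(zf.extract(member, dest)))
    return extracted
```

For example, `for path in extract_logs(): print(path)` lists each extracted log so that you can open the ones related to the failure.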
Skip validation checks#
When you launch an evaluation job, NeMo Evaluator performs availability checks (for example, checking if the dataset and files exist in NeMo Data Store).
To speed up job launch, or to bypass validation checks that are too strict for your setup, pass the query parameter skip_validation_checks=True when you launch the job.
Use the following code to create an evaluation job that skips validation checks.
curl -X 'POST' \
  "${EVALUATOR_BASE_URL}/v1/evaluation/jobs?skip_validation_checks=True" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "namespace": "my-organization",
    "target": "<my-target-namespace/my-target-name>",
    "config": "<my-config-namespace/my-config-name>"
}'
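import os
import requests

EVALUATOR_BASE_URL = os.environ["EVALUATOR_BASE_URL"]  # for example, http://localhost:7331
data = {
    "namespace": "my-organization",
    "target": "<my-target-namespace/my-target-name>",
    "config": "<my-config-namespace/my-config-name>",
}
# skip_validation_checks=True skips the pre-launch availability checks
endpoint = f"{EVALUATOR_BASE_URL}/v1/evaluation/jobs?skip_validation_checks=True"
response = requests.post(endpoint, json=data)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
job = response.json()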