Troubleshooting NeMo Auditor#
Use this documentation to troubleshoot issues that can arise when you run audit jobs with NVIDIA NeMo Auditor.
Troubleshooting Audit Jobs#
The first step for troubleshooting audit jobs is to check for audit job progress.
import os from nemo_microservices import NeMoMicroservices client = NeMoMicroservices(base_url=os.getenv("AUDITOR_BASE_URL")) status = client.beta.audit.jobs.get_status(job_id) print(status)
curl "${AUDITOR_BASE_URL}/v1beta1/audit/jobs/${JOB_ID}/status" \ -H "Accept: application/json" | jq
In the following example output, the job is active, but the completed probes value is
0
.AuditJobStatus(message=None, progress=Progress(probes_complete=0, probes_total=22), status='ACTIVE')
{ "status": "ACTIVE", "message": null, "progress": { "probes_total": 22, "probes_complete": 0 } }
If the message in the job status response does not indicate an error, the next step is to check the logs.
client = NeMoMicroservices(base_url=os.getenv("AUDITOR_BASE_URL")) logs = client.beta.audit.jobs.get_logs(job_id) print("\n".join(logs.split("\n")[-10:]))
curl "${AUDITOR_BASE_URL}/v1beta1/audit/jobs/${JOB_ID}/logs" \ -H "Accept: text/plain" | tail -n 10
The logs often report 404 or 429 HTTP status codes. For these status codes, check the URI for the job target.
Another trouble scenario is that the target model is unresponsive and connections eventually time out:
2025-09-02 14:19:22,928 DEBUG response_closed.complete 2025-09-02 14:19:22,924 DEBUG Encountered httpx.TimeoutException Traceback (most recent call last): File "/app/.venv/lib/python3.11/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions yield ...
If the issue is still unclear, get the target for the job and the try to run inference with the target model.