Troubleshooting

Problem: 400 Bad Request

  • Cause: Invalid video format, unsupported codec, corrupted video data, or invalid base64 encoding

  • Action:

    • Verify the video format is supported (MP4, AVI, MOV, MKV, WebM, FLV, WMV, 3GP, M4V).

    • Check that the video isn’t corrupted or malformed.

    • Ensure the base64 encoding is valid and uses the data:video/*;base64,<data> URI format (see the sketch after this list).

    • Verify the video dimensions are ≤ 2048x2048 pixels.
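
  As a quick sanity check, the following sketch builds a well-formed data URI from a local MP4. It assumes GNU coreutils base64 (the -w 0 flag disables line wrapping); input.mp4 is a placeholder file name.

      # Encode without line breaks, then prepend the data-URI prefix
      base64 -w 0 input.mp4 > video.b64
      printf 'data:video/mp4;base64,' > video_uri.txt
      cat video.b64 >> video_uri.txt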

Problem: 401 Unauthorized

  • Cause: Missing or invalid authentication token

  • Action: Ensure a valid Bearer token is included in the Authorization header, as in the sketch below.
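
  A minimal request sketch with the Authorization header set. The port, endpoint path, model name, and input are placeholders; substitute the values for your deployment.

      curl -X POST http://localhost:8000/v1/embeddings \
        -H "Authorization: Bearer $API_KEY" \
        -H "Content-Type: application/json" \
        -d '{"model": "<model-name>", "request_type": "query", "input": ["<input>"]}'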

Problem: 422 Validation Failed

  • Cause: Request doesn’t match schema requirements

  • Action:

    • Ensure your JSON payload matches the schema (for example, a correct request_type value).

    • Verify the input array has ≤ 64 items for bulk operations.

    • Check that bulk_text requests contain only text, with no video data.

    • Validate the required fields: input, model, and request_type (a minimal payload is sketched after this list).
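
  A minimal sketch of a conforming bulk_text payload, using only the required fields named above (requires jq; <model-name> and the input strings are placeholders, and the authoritative schema is the API reference):

      # bulk_text: required fields only, text items only, at most 64 items
      jq -n '{model: "<model-name>", request_type: "bulk_text",
              input: ["first text item", "second text item"]}' > payload.json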

Problem: 413 Payload Too Large

  • Cause: Video file or request payload exceeds size limits

  • Action:

    • Reduce the video file size or duration to the recommended limits.

    • Use presigned URLs for large videos instead of base64 encoding.

    • Split large batch requests into smaller chunks (see the sketch after this list).
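
  Assuming items.json holds the full JSON input array, this sketch slices it into chunks of at most 64 items and builds one payload per chunk (requires jq; field names follow the schema discussed above).

      # Emit one chunk (a JSON array of at most 64 items) per output line
      jq -c 'range(0; length; 64) as $i | .[$i:$i+64]' items.json > chunks.ndjson
      while IFS= read -r chunk; do
        jq -n --argjson input "$chunk" \
          '{model: "<model-name>", request_type: "bulk_text", input: $input}' \
          > payload.json
        # POST payload.json to the service here
      done < chunks.ndjson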

Problem: 500 Internal Server Error

  • Cause: General server errors, GPU memory exhaustion, or model execution failures

  • Action:

    • Check the container logs using docker logs <container> for a stack trace.

    • Look for GPU memory errors in the logs.

    • Verify that CUDA is available and functioning.

    • Check whether the error is recoverable by inspecting the X-Error-Classification response header, as in the sketch below.
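
  Two quick checks; the container name, port, and endpoint path are placeholders. The first scans the logs for GPU memory errors, and the second uses curl -i to surface response headers such as X-Error-Classification (payload.json as in the earlier sketch).

      # Scan container logs for CUDA/OOM messages
      docker logs <container> 2>&1 | grep -iE 'cuda|out of memory|oom'

      # Include response headers in the output
      curl -si -X POST http://localhost:8000/v1/embeddings \
        -H "Authorization: Bearer $API_KEY" \
        -H "Content-Type: application/json" -d @payload.json | head -n 20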

Problem: 503 Service Unavailable

  • Cause: Service temporarily unavailable due to GPU issues, connection problems, or automatic restarts

  • Action:

    • Wait for automatic service recovery; the Retry-After header indicates how long to wait.

    • Verify the Triton inference server is running internally.

    • Check GPU memory availability.

    • Monitor the service health endpoint at /v1/health/ready (a polling sketch follows this list).
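
  A simple readiness-polling sketch; the port is a placeholder for your deployment.

      # curl -f exits nonzero on HTTP errors, so the loop waits until ready
      until curl -sf http://localhost:8000/v1/health/ready > /dev/null; do
        echo "service not ready; retrying in 10s"
        sleep 10
      done
      echo "service is ready"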

Problem: Slow or stalled video processing

  • Action:

    • Verify that the input video length is within the recommended limits and that the video is in a supported format. Refer to the supported formats section for more details; an ffprobe check is sketched after this list.

    • Verify that the video resolution is ≤ 2048x2048 pixels.

    • Monitor GPU memory usage for out-of-memory (OOM) conditions.

    • Use request_type: "query" for single videos to optimize latency.
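
  To check length and resolution in one pass, a sketch using ffprobe (part of FFmpeg, assumed to be installed; input.mp4 is a placeholder):

      # Print width/height of the first video stream plus container duration
      ffprobe -v error -select_streams v:0 \
        -show_entries stream=width,height:format=duration \
        -of default=noprint_wrappers=1 input.mp4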

Problem: “No frames found” or “Invalid video file”

  • Action:

    • Verify that the video file isn’t corrupted.

    • Check that the file contains actual video frames and isn’t audio-only (the ffprobe check after this list confirms this).

    • Verify that the video is in the correct format (MP4/H.264 recommended). Refer to the supported formats section for more details.

    • Test with a known working video file.
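
  An ffprobe sketch for these checks (FFmpeg assumed installed): the first command prints "video" only if a video stream exists, so empty output indicates an audio-only or unreadable file; the second fully decodes the stream to count frames, which is slower but catches files that declare frames they cannot deliver.

      # Empty output means no video stream (audio-only or unreadable)
      ffprobe -v error -select_streams v:0 -show_entries stream=codec_type \
        -of csv=p=0 input.mp4

      # Decode and count frames (slow on long videos)
      ffprobe -v error -select_streams v:0 -count_frames \
        -show_entries stream=nb_read_frames -of csv=p=0 input.mp4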

Problem: Base64 video decode errors

  • Action:

    • Verify that the base64 string is properly formatted as data:video/mp4;base64,<data>.

    • Verify that the base64 encoding is valid by testing it with a decoder (a local round-trip test is sketched after this list).

    • Ensure that there are no line breaks or spaces in the base64 string.
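
  A local round-trip test, assuming GNU coreutils base64 and the video_uri.txt/video.b64 files from the earlier encoding sketch.

      # Strip the data-URI prefix and check that the remainder decodes cleanly
      sed 's|^data:video/mp4;base64,||' video_uri.txt | base64 -d > /dev/null \
        && echo "base64 OK" || echo "base64 invalid"

      # Any output here flags stray whitespace or line breaks in the encoding
      grep -n '[[:space:]]' video.b64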

Problem: Presigned URL concerns (security or access)

  • Notes:

    • URL expiry or scope issues can result in 403/404 responses (the curl check after this list surfaces the status code).

    • Bucket/object policy (ACL) may deny access.

    • Network egress from the NIM host to the storage endpoint may be blocked.

    • Non-HTTPS URLs or long-lived credentials in URLs increase risk.
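
  To see the HTTP status a presigned URL actually returns, a curl sketch (<presigned-url> is a placeholder). Note that presigned URLs are often signed for GET only, so a HEAD request (curl -I) can return 403 even when the URL is valid.

      # Fetch with GET, discard the body, and print only the status code
      curl -s -o /dev/null -w '%{http_code}\n' "<presigned-url>"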

Problem: Exceeded maximum video length

  • Cause: Very long videos increase latency, memory usage, and the likelihood of failure.

  • Action:

    • Trim videos to ≤ 15 seconds for best recall and stability.

    • For longer media, pre-segment the videos into clips and process them in batches (see the sketch after this list).
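
  A segmentation sketch using ffmpeg (assumed installed). With -c copy, ffmpeg can split only at keyframes, so clip boundaries are approximate; re-encode if exact 15-second clips are required.

      # Split input.mp4 into ~15-second clips without re-encoding
      ffmpeg -i input.mp4 -c copy -map 0 -segment_time 15 \
        -f segment -reset_timestamps 1 clip_%03d.mp4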

Problem: Cannot pull container

  • Action:

    • Confirm NGC authentication via docker login nvcr.io (see the sketch after this list).

    • Verify that you have a valid NGC API key set, for example via echo $NGC_API_KEY.

    • Check network connectivity to the NGC registry.
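
  The standard NGC login flow; note that the username is the literal string $oauthtoken, not your account name.

      # Authenticate to nvcr.io using the NGC API key from the environment
      echo "$NGC_API_KEY" | docker login nvcr.io \
        --username '$oauthtoken' --password-stdin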

Problem: GPU not found inside container

  • Action:

    • Ensure the NVIDIA Container Toolkit is installed.

    • Verify that the --gpus all or --gpus device=0 flag is passed to docker run (an end-to-end check is sketched after this list).

    • Verify that the nvidia-smi command works on the host system.

    • Validate that the CUDA drivers are compatible with the container.
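
  A quick end-to-end check that the container runtime can see the GPU; the CUDA base image tag is only an example, so substitute any CUDA image you have locally.

      # Should print the same GPU table that nvidia-smi prints on the host
      docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi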

Problem: CUDA out-of-memory (OOM) errors

  • Action:

    • Reduce the batch size for bulk operations.

    • Process videos sequentially instead of in parallel.

    • Restart the container to clear GPU memory (see the monitoring sketch below).
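
  To watch for OOM conditions during processing and confirm that memory is released after a restart, a monitoring sketch using nvidia-smi:

      # Sample GPU memory usage every 5 seconds (Ctrl+C to stop)
      nvidia-smi --query-gpu=timestamp,memory.used,memory.total \
        --format=csv -l 5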