Prerequisites#

Hardware#

  • H100 SXM/NVLink/PCIe FP16: Optimized Mode (TRT-LLM)

  • A100 SXM/NVLink/PCIe FP16: Optimized Mode (TRT-LLM)

  • L40S PCIe FP16: Optimized Mode (TRT-LLM)

  • L4: Optimized Mode (TRT-LLM)

  • H20: Optimized Mode (TRT-LLM)

  • L20: Optimized Mode (TRT-LLM)

  • Other Ampere+ SKUs: Functional

Note on modes#

“Optimized Mode” indicates that the full TensorRT‑LLM optimized pipeline is used for best performance. “Functional” indicates the NIM runs end-to-end but may use fallback paths without certain optimizations; expect lower throughput.

Software#

  • Docker

  • NVIDIA Container Toolkit (a quick GPU passthrough check follows this list)

  • CUDA Drivers (11.8+)
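
To confirm that Docker, the NVIDIA Container Toolkit, and the driver work together, run a throwaway container and check that the GPU is visible inside it. The ubuntu image below is only an example; the toolkit mounts nvidia-smi from the host driver into the container.

    docker run --rm --gpus all ubuntu nvidia-smi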

NGC#

  1. Create an NGC account and generate an NGC API key with NGC catalog access.

  2. Authenticate Docker to NGC:

    docker login nvcr.io
    
    Username: $oauthtoken
    
    Password: <NGC API key>
    
  3. Set your API key as an environment variable (a scripted, non-interactive login using this variable is sketched after these steps):

    export NGC_API_KEY=<your NGC API key>
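
If you script the setup, you can avoid the interactive prompts in step 2 by piping the key to docker login. This is a minimal sketch; it assumes NGC_API_KEY is already exported as in step 3.

    # Non-interactive NGC login; the username is the literal string $oauthtoken,
    # so single quotes keep the shell from expanding it
    echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin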
    

Supported Formats#

  • Text: UTF-8 strings

  • Video Codecs: H.264, HEVC, AV1, VP8, VP9, VC1, MPEG4, MPEG2, and MPEG1

  • Video Submission Methods:

    • query request type: Base64-encoded data or presigned URLs (see the encoding sketch after this list)

    • bulk_video request type: Presigned URLs only (base64 rejected)

  • Video Length: The recommended clip length is 15 seconds, and the recommended maximum is 1-2 minutes; the NIM does not enforce a hard limit.
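
For query requests that submit inline data, the clip must be Base64-encoded first. The sketch below shows only the encoding step with a placeholder filename (sample.mp4); the request payload itself is defined by the API reference, and bulk_video requests must use presigned URLs instead.

    # Encode a short clip for inline submission in a query request
    # (-w 0 is GNU coreutils; on macOS use: base64 -i sample.mp4)
    base64 -w 0 sample.mp4 > sample.b64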

Disk and Cache Requirements#

  • On first run, the container downloads model assets. Depending on the variant and workload, allow 5-10 GB of free disk space for the model cache and temporary decode buffers.

  • If you enable a host cache, ensure the directory has read/write permissions for the container user (a setup sketch follows).
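
A minimal sketch of preparing and mounting a host cache directory is shown below. The LOCAL_NIM_CACHE variable, the host path, and the container-side path /opt/nim/.cache follow a common NIM convention but are assumptions here; check this container's documentation for the exact cache location, and treat the image reference as a placeholder.

    # Create a host cache directory the container user can read and write
    export LOCAL_NIM_CACHE=~/.cache/nim
    mkdir -p "$LOCAL_NIM_CACHE"
    chmod a+rwX "$LOCAL_NIM_CACHE"

    # Mount it when starting the container (image name and container-side
    # cache path are placeholders)
    docker run --rm --gpus all \
      -e NGC_API_KEY \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
      nvcr.io/nim/<image>:<tag>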