Prerequisites#

Hardware#

  • H100 SXM/NVLink/PCIe FP16: Optimized Mode (TRT-LLM)

  • A100 SXM/NVLink/PCIe FP16: Optimized Mode (TRT-LLM)

  • L40S PCIe FP16: Optimized Mode (TRT-LLM)

  • L4: Optimized Mode (TRT-LLM)

  • H20: Optimized Mode (TRT-LLM)

  • L20: Optimized Mode (TRT-LLM)

  • Other Ampere+ SKUs: Functional

Note on modes#

“Optimized Mode” indicates that the full TensorRT‑LLM optimized pipeline is used for best performance. “Functional” indicates the NIM runs end-to-end but may use fallback paths without certain optimizations; expect lower throughput.

Software#

  • Docker

  • NVIDIA Container Toolkit (a quick GPU passthrough check follows this list)

  • CUDA Drivers (11.8+)
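
To confirm that Docker, the NVIDIA Container Toolkit, and the driver work together, run a throwaway container and check that the GPU is visible inside it. The ubuntu image below is only an example; the toolkit mounts nvidia-smi from the host driver into the container.

    docker run --rm --gpus all ubuntu nvidia-smi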

NGC#

  1. Create an NGC account and generate an NGC API key with NGC catalog access.

  2. Authenticate Docker to NGC:

    docker login nvcr.io
    
    Username: $oauthtoken
    
    Password: <NGC API key>
    
  3. Set your API key as an environment variable (a scripted, non-interactive login using this variable is sketched after these steps):

    export NGC_API_KEY=<your NGC API key>
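
If you script the setup, you can avoid the interactive prompts in step 2 by piping the key to docker login. This is a minimal sketch; it assumes NGC_API_KEY is already exported as in step 3.

    # Non-interactive NGC login; the username is the literal string $oauthtoken,
    # so single quotes keep the shell from expanding it
    echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin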
    

Supported Formats#

  • Text: UTF-8 strings

  • Video Codecs: H.264, HEVC, AV1, VP8, VP9, VC1, MPEG4, MPEG2, and MPEG1

  • Video Submission Methods:

    • query request type: Base64-encoded data or presigned URLs (see the encoding sketch after this list)

    • bulk_video request type: Presigned URLs only (base64 rejected)

  • Video Length: The recommended clip length is 15 seconds, and the recommended maximum is 1-2 minutes; the NIM does not enforce a hard limit.
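
For query requests that submit inline data, the clip must be Base64-encoded first. The sketch below shows only the encoding step with a placeholder filename (sample.mp4); the request payload itself is defined by the API reference, and bulk_video requests must use presigned URLs instead.

    # Encode a short clip for inline submission in a query request
    # (-w 0 is GNU coreutils; on macOS use: base64 -i sample.mp4)
    base64 -w 0 sample.mp4 > sample.b64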

Disk and Cache Requirements#

  • On first run, the container downloads model assets. Depending on the variant and workload, allow 5-10 GB of free disk space for the model cache and temporary decode buffers.

  • If you enable a host cache, ensure the directory has read/write permissions for the container user (a setup sketch follows).
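
A minimal sketch of preparing and mounting a host cache directory is shown below. The LOCAL_NIM_CACHE variable, the host path, and the container-side path /opt/nim/.cache follow a common NIM convention but are assumptions here; check this container's documentation for the exact cache location, and treat the image reference as a placeholder.

    # Create a host cache directory the container user can read and write
    export LOCAL_NIM_CACHE=~/.cache/nim
    mkdir -p "$LOCAL_NIM_CACHE"
    chmod a+rwX "$LOCAL_NIM_CACHE"

    # Mount it when starting the container (image name and container-side
    # cache path are placeholders)
    docker run --rm --gpus all \
      -e NGC_API_KEY \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
      nvcr.io/nim/<image>:<tag>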