Prerequisites#
Hardware#
Hardware | Optimized Mode (TRT-LLM)
---|---
H100 SXM/NVLink/PCIE FP16 | ✓
A100 SXM/NVLink/PCIE FP16 | ✓
L40s PCIe FP16 | ✓
L4 | ✓
H20 | ✓
L20 | ✓
Other Ampere+ SKUs | Functional
Note on modes#
“Optimized Mode” indicates that the full TensorRT‑LLM optimized pipeline is used for best performance. “Functional” indicates the NIM runs end-to-end but may use fallback paths without certain optimizations; expect lower throughput.
Software#
Docker
NVIDIA Container Toolkit
CUDA Drivers (11.8+)
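Before pulling the NIM container, it is worth confirming that Docker can reach the GPU through the NVIDIA Container Toolkit. A quick check, assuming a recent CUDA base image (the exact tag is illustrative):

```shell
# Verify GPU passthrough: this should print the same table as running
# nvidia-smi on the host. Any recent CUDA base image works here.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Confirm the host driver version meets the 11.8+ CUDA requirement.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

If the first command fails with a "could not select device driver" error, the NVIDIA Container Toolkit is not installed or not registered as a Docker runtime.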
NGC#
Create an NGC account and generate an NGC API key with NGC catalog access.
Authenticate Docker to NGC:
docker login nvcr.io
Username: $oauthtoken
Password: <NGC API key>
Set your API key as an environment variable:
export NGC_API_KEY=<your NGC API key>
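With the key in the environment, the two steps above can be combined into a non-interactive login, which keeps the key out of shell history and the interactive prompt:

```shell
# Non-interactive NGC login. The username is the literal string
# $oauthtoken, so it is single-quoted to prevent shell expansion.
echo "$NGC_API_KEY" | docker login nvcr.io \
  --username '$oauthtoken' \
  --password-stdin
```

`--password-stdin` is standard Docker CLI behavior and is preferable to passing the key as a command-line argument, where it would be visible in the process list.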
Supported Formats#
Text: UTF-8 strings
Video Codecs: H.264, HEVC, AV1, VP8, VP9, VC1, MPEG4, MPEG2, and MPEG1
Video Submission Methods:
query request type: Base64-encoded data or presigned URLs
bulk_video request type: Presigned URLs only (base64 rejected)
Video Length: The recommended clip length is about 15 seconds, with a recommended upper bound of 1-2 minutes. The NIM does not enforce a strict maximum.
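As a sketch of the `query` request type, the snippet below base64-encodes a local clip and posts it to the NIM. The endpoint path and JSON field names are illustrative assumptions; consult the NIM's API reference for the exact schema.

```shell
# Base64-encode a short clip for a `query` request.
# -w 0 (GNU coreutils) disables line wrapping so the payload is one line.
VIDEO_B64=$(base64 -w 0 input.mp4)

# Hypothetical endpoint and payload shape -- adjust to the actual API.
curl -s http://localhost:8000/v1/query \
  -H "Content-Type: application/json" \
  -d "{\"video\": \"${VIDEO_B64}\"}"
```

For the `bulk_video` request type, skip the encoding step and submit a presigned URL instead; base64 payloads are rejected there.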
Disk and Cache Requirements#
On first run, the container downloads model assets. Allow 5-10 GB of free disk space for the model cache and temporary decode buffers; actual usage depends on the model variant and workload.
If you enable a host cache, ensure the directory has read/write permissions for the container user.
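A minimal sketch of enabling a host cache, assuming the container caches under `/opt/nim/.cache` and the image name is a placeholder (both are assumptions; check the NIM's container documentation for the actual mount point):

```shell
# Create a host cache directory the container user can read and write.
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod a+rwX "$LOCAL_NIM_CACHE"   # or chown to the container's UID

# Mount it at start; cached model assets survive container restarts.
docker run --rm --gpus all \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  nvcr.io/nim/<image>:<tag>
```

Without the volume mount, model assets are re-downloaded each time the container is recreated.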