NVIDIA NIM for LLMs uses the TensorRT-LLM backend for performance-optimized inference. With this backend, the Operator supports the following models and hardware.
Model | GPU Requirements
---|---
Llama-2-13b-chat (default) | 2 × A100 80 GB SXM or 2 × H100 80 GB SXM
Mixtral-8x7B-v0.1 | 4 × A100 80 GB SXM or 4 × H100 80 GB SXM
When NVIDIA NIM for LLMs is deployed to use the vLLM backend, the Operator supports the following models and hardware. The vLLM backend supports both SXM and PCIe.
Model | GPU Requirements
---|---
Llama-2-7b-chat | 1 × L40S 48 GB, 1 × A100 80 GB, or 1 × H100 80 GB
Llama-2-13b-chat (default) | 1 × L40S 48 GB, 1 × A100 80 GB, or 1 × H100 80 GB
Llama-2-70b-chat | 8 × L40S 48 GB, 4 × A100 80 GB, or 4 × H100 80 GB
Mistral-7B-Instruct-v0.2 | 1 × L40S 48 GB, 1 × A100 80 GB, or 1 × H100 80 GB
Mixtral-8x7B-Instruct-v0.1 | 4 × L40S 48 GB, 2 × A100 80 GB, or 2 × H100 80 GB
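As an illustration only (this helper is not part of NVIDIA NIM or the Operator), the vLLM support matrix above can be encoded as data to sanity-check a planned deployment; the model names and GPU counts are taken verbatim from the table:

```python
# Hypothetical helper: encodes the vLLM backend support matrix from the
# table above and checks whether an available GPU inventory satisfies a
# model's requirement.

VLLM_MATRIX = {
    # model name: {GPU product: required GPU count}
    "Llama-2-7b-chat":            {"L40S 48 GB": 1, "A100 80 GB": 1, "H100 80 GB": 1},
    "Llama-2-13b-chat":           {"L40S 48 GB": 1, "A100 80 GB": 1, "H100 80 GB": 1},
    "Llama-2-70b-chat":           {"L40S 48 GB": 8, "A100 80 GB": 4, "H100 80 GB": 4},
    "Mistral-7B-Instruct-v0.2":   {"L40S 48 GB": 1, "A100 80 GB": 1, "H100 80 GB": 1},
    "Mixtral-8x7B-Instruct-v0.1": {"L40S 48 GB": 4, "A100 80 GB": 2, "H100 80 GB": 2},
}

def satisfies(model: str, gpu_product: str, available: int) -> bool:
    """Return True if `available` GPUs of `gpu_product` meet the model's need."""
    required = VLLM_MATRIX.get(model, {}).get(gpu_product)
    return required is not None and available >= required

print(satisfies("Llama-2-70b-chat", "A100 80 GB", 4))  # True
print(satisfies("Llama-2-70b-chat", "L40S 48 GB", 4))  # False: the table requires 8
```

A check like this can catch undersized node pools before a deployment is attempted; the authoritative source remains the table itself.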
The Operator supports the following GPUs for embedding. The following section covers the constraints for inference, which depend on the inference model and the GPU model.
- NVIDIA H100
- NVIDIA A100 80 GB
- NVIDIA L40S
- NVIDIA vGPU 17
Operating System | Kubernetes | VMware vSphere with Tanzu
---|---|---
Ubuntu 22.04 | 1.26 to 1.28 | 8.0 Update 2
Operating System | containerd
---|---
Ubuntu 22.04 | 1.6, 1.7
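The platform matrices above can likewise be sketched as a small version check. The parsing helper below is illustrative only (not part of the Operator); the supported ranges are taken from the tables for Ubuntu 22.04:

```python
# Illustrative only: checks reported Kubernetes and containerd versions
# against the supported ranges in the tables above (Ubuntu 22.04).

def major_minor(version: str) -> tuple[int, int]:
    """Parse 'major.minor[...]' (optionally prefixed with 'v') into (major, minor)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)

def kubernetes_supported(version: str) -> bool:
    # Supported range from the table: 1.26 through 1.28.
    return (1, 26) <= major_minor(version) <= (1, 28)

def containerd_supported(version: str) -> bool:
    # Supported series from the table: 1.6 and 1.7.
    return major_minor(version) in {(1, 6), (1, 7)}

print(kubernetes_supported("v1.27.4"))   # True
print(containerd_supported("1.7.2"))     # True
print(kubernetes_supported("v1.29.0"))   # False: above the supported range
```

Comparing only the major.minor pair matches how the tables express support (by release series rather than exact patch versions).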