Platform Support

Enterprise RAG LLM Operator

NVIDIA NIM for LLMs uses the TensorRT-LLM backend for performance-optimized inference. With this backend, the Operator supports the following models and hardware; a minimal scheduling sketch follows the table.

Model                        GPU Requirements
Llama-2-13b-chat (default)   2 × A100 80 GB SXM or 2 × H100 80 GB SXM
Mixtral-8x7B-v0.1            4 × A100 80 GB SXM or 4 × H100 80 GB SXM
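
In a Kubernetes cluster, these GPU counts surface as extended-resource requests on the inference pod. The following is a minimal sketch, not the manifest the Operator generates: it assumes the standard nvidia.com/gpu resource exposed by the NVIDIA device plugin, and the pod name and image reference are hypothetical placeholders.

    # Hypothetical pod sketch: requests the 2 GPUs that the default
    # Llama-2-13b-chat model needs with the TensorRT-LLM backend.
    apiVersion: v1
    kind: Pod
    metadata:
      name: nim-llm-example        # hypothetical name
    spec:
      containers:
      - name: nim-llm
        image: <nim-llm-image>     # placeholder; use the image from your deployment
        resources:
          limits:
            nvidia.com/gpu: 2      # 2 x A100 80 GB SXM or 2 x H100 80 GB SXM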

When NVIDIA NIM for LLMs is deployed with the vLLM backend, the Operator supports the following models and hardware. The vLLM backend supports both SXM and PCIe GPU form factors; a sketch for pinning a pod to a specific GPU model follows the table.

Model                        GPU Requirements
Llama-2-7b-chat              1 × L40S 48 GB, 1 × A100 80 GB, or 1 × H100 80 GB
Llama-2-13b-chat (default)   1 × L40S 48 GB, 1 × A100 80 GB, or 1 × H100 80 GB
Llama-2-70b-chat             8 × L40S 48 GB, 4 × A100 80 GB, or 4 × H100 80 GB
Mistral-7B-Instruct-v0.2     1 × L40S 48 GB, 1 × A100 80 GB, or 1 × H100 80 GB
Mixtral-8x7B-Instruct-v0.1   4 × L40S 48 GB, 2 × A100 80 GB, or 2 × H100 80 GB
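
Because the same model can run on several GPU types with vLLM, you may want to pin a workload to one GPU model. One common approach, assuming NVIDIA GPU Feature Discovery is labeling your nodes, is a nodeSelector on the nvidia.com/gpu.product label; the exact label value varies by cluster and GPU SKU, so verify it with kubectl get nodes --show-labels first. A minimal, hypothetical sketch:

    # Hypothetical pod sketch: pins Mixtral-8x7B-Instruct-v0.1 (vLLM backend)
    # to A100 80 GB nodes and requests the 2 GPUs from the table above.
    apiVersion: v1
    kind: Pod
    metadata:
      name: nim-llm-vllm-example                        # hypothetical name
    spec:
      nodeSelector:
        nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB   # verify the value on your nodes
      containers:
      - name: nim-llm
        image: <nim-llm-image>                          # placeholder
        resources:
          limits:
            nvidia.com/gpu: 2                           # per the vLLM table above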

The Operator supports the following GPUs for the embedding model; a scheduling sketch follows the list. The preceding tables cover the constraints for inference, which depend on the inference model and the GPU model.

  • NVIDIA H100

  • NVIDIA A100 80 GB

  • NVIDIA L40S
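
If you need to constrain the embedding pod to these GPU models, a node affinity rule that matches any of them is one option. This is a hypothetical sketch, again assuming GPU Feature Discovery labels; the product strings below are typical values and must be checked against your cluster.

    # Hypothetical affinity sketch: schedule the embedding pod only onto
    # the GPU models listed above.
    apiVersion: v1
    kind: Pod
    metadata:
      name: embedding-example                  # hypothetical name
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:                        # verify exact strings on your nodes
                - NVIDIA-H100-80GB-HBM3
                - NVIDIA-A100-SXM4-80GB
                - NVIDIA-L40S
      containers:
      - name: embedding
        image: <embedding-image>               # placeholder
        resources:
          limits:
            nvidia.com/gpu: 1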

Operating System   Kubernetes   VMware vSphere with Tanzu
Ubuntu 22.04       1.26–1.28    8.0 Update 2

Operating System   containerd
Ubuntu 22.04       1.6, 1.7