# Hardware Support for NVIDIA NIM on Google Kubernetes Engine (GKE)
The following table lists the optimized profiles supported for each NVIDIA NIM and hardware configuration on Google Kubernetes Engine (GKE).
| NIM | Version | Min # GPUs required for NIM | GPU | Compute name in config page | # GPUs on instance | Precision | Profile |
|---|---|---|---|---|---|---|---|
| meta/llama3.1-405b-instruct | 1.1.2 | 8 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| meta/llama3.1-8b-instruct | 1.1.2 | 1 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| meta/llama3.1-8b-instruct | 1.1.2 | 2 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| meta/llama3.1-8b-instruct | 1.1.2 | 1 | A100 (80GB) | A100(80GB)-&lt;region&gt;-a2-ultragpu-1g | 1 | BF16 (trt-llm) | Throughput |
| meta/llama3.1-8b-instruct | 1.1.2 | 2 | A100 (80GB) | A100(80GB)-&lt;region&gt;-a2-ultragpu-2g | 2 | BF16 (trt-llm) | Latency |
| meta/llama3.1-8b-instruct | 1.1.2 | 2 | L4 | L4-&lt;region&gt;-g2-standard-24 | 2 | FP16 (vllm) | Non-optimized |
| meta/llama3.1-70b-instruct | 1.1.2 | 4 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| meta/llama3.1-70b-instruct | 1.1.2 | 8 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| meta/llama3.1-70b-instruct | 1.1.2 | 4 | A100 (80GB) | A100(80GB)-&lt;region&gt;-a2-ultragpu-4g | 4 | BF16 (trt-llm) | Throughput |
| meta/llama3.1-70b-instruct | 1.1.2 | 8 | A100 (80GB) | A100(80GB)-&lt;region&gt;-a2-ultragpu-8g | 8 | BF16 (trt-llm) | Latency |
| meta/llama3.1-70b-instruct | 1.1.2 | 8 | L4 | L4-&lt;region&gt;-g2-standard-96 | 8 | FP16 (vllm) | Non-optimized |
| meta/llama3-70b-instruct | 1.0.3 | 4 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| meta/llama3-70b-instruct | 1.0.3 | 8 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| meta/llama3-70b-instruct | 1.0.3 | 4 | A100 (80GB) | A100(80GB)-&lt;region&gt;-a2-ultragpu-4g | 4 | FP16 | Throughput |
| meta/llama3-70b-instruct | 1.0.3 | 8 | L4 | L4-&lt;region&gt;-g2-standard-96 | 8 | FP16 (vllm) | Non-optimized |
| meta/llama3-8b-instruct | 1.0.3 | 1 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP16 | Throughput |
| meta/llama3-8b-instruct | 1.0.3 | 2 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP16 | Latency |
| meta/llama3-8b-instruct | 1.0.3 | 1 | A100 (80GB) | A100(80GB)-&lt;region&gt;-a2-ultragpu-1g | 1 | FP16 | Throughput |
| meta/llama3-8b-instruct | 1.0.3 | 2 | A100 (80GB) | A100(80GB)-&lt;region&gt;-a2-ultragpu-2g | 2 | FP16 | Latency |
| meta/llama3-8b-instruct | 1.0.3 | 2 | L4 | L4-&lt;region&gt;-g2-standard-24 | 2 | FP16 (vllm) | Non-optimized |
| mistralai/mistral-7b-instruct-v0.3 | 1.0.3 | 1 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| mistralai/mistral-7b-instruct-v0.3 | 1.0.3 | 2 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| mistralai/mistral-7b-instruct-v0.3 | 1.0.3 | 1 | A100 (80GB) | A100(80GB)-&lt;region&gt;-a2-ultragpu-1g | 1 | FP16 (trt-llm) | Throughput |
| mistralai/mistral-7b-instruct-v0.3 | 1.0.3 | 2 | A100 (80GB) | A100(80GB)-&lt;region&gt;-a2-ultragpu-2g | 2 | FP16 (trt-llm) | Latency |
| mistralai/mistral-7b-instruct-v0.3 | 1.0.3 | 4 | L4 | L4-&lt;region&gt;-g2-standard-48 | 4 | FP16 (vllm) | Non-optimized |
| mistralai/mixtral-8x7b-instruct-v0.1 | 1.0.0 | 2 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP8 (trt-llm) | Throughput |
| mistralai/mixtral-8x7b-instruct-v0.1 | 1.0.0 | 4 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP8 (trt-llm) | Latency |
| mistralai/mixtral-8x7b-instruct-v0.1 | 1.0.0 | 2 | A100 (80GB) | A100(80GB)-&lt;region&gt;-a2-ultragpu-2g | 2 | FP16 (trt-llm) | Throughput |
| mistralai/mixtral-8x7b-instruct-v0.1 | 1.0.0 | 4 | A100 (80GB) | A100(80GB)-&lt;region&gt;-a2-ultragpu-4g | 4 | FP16 (trt-llm) | Latency |
| nvidia/nv-rerankqa-mistral-4b-v3 | 1.0.2 | 1 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP16 | |
| nvidia/nv-rerankqa-mistral-4b-v3 | 1.0.2 | 1 | A100 (80GB) | A100(80GB)-&lt;region&gt;-a2-ultragpu-1g | 1 | FP16 | |
| nvidia/nv-rerankqa-mistral-4b-v3 | 1.0.2 | 2 | L4 | L4-&lt;region&gt;-g2-standard-24 | 2 | FP16 | |
| nvidia/nv-embedqa-e5-v5 | 1.0.1 | 1 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP16 | |
| nvidia/nv-embedqa-e5-v5 | 1.0.1 | 1 | A100 (80GB) | A100(80GB)-&lt;region&gt;-a2-ultragpu-1g | 1 | FP16 | |
| nvidia/nv-embedqa-e5-v5 | 1.0.1 | 2 | L4 | L4-&lt;region&gt;-g2-standard-24 | 2 | FP16 | |
| nvidia/nv-embedqa-mistral-7b-v2 | 1.0.1 | 1 | H100 (80GB) | H100(80GB)-&lt;region&gt;-a3-highgpu-8g | 8 | FP8 | |
| nvidia/nv-embedqa-mistral-7b-v2 | 1.0.1 | 1 | A100 (80GB) | A100(80GB)-&lt;region&gt;-a2-ultragpu-1g | 1 | FP16 | |
| nvidia/nv-embedqa-mistral-7b-v2 | 1.0.1 | 2 | L4 | L4-&lt;region&gt;-g2-standard-24 | 2 | FP16 | |
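For illustration, the rows above that reference the `a3-highgpu-8g` machine type correspond to a GKE node pool like the following sketch. The cluster name, region, and node pool name are placeholders, not values from this document; check the current `gcloud` reference before running.

```shell
# Hypothetical example: create a GKE node pool for the
# H100(80GB)-<region>-a3-highgpu-8g entries in the table above.
# CLUSTER_NAME and REGION are placeholders to replace.
gcloud container node-pools create nim-h100-pool \
    --cluster=CLUSTER_NAME \
    --region=REGION \
    --machine-type=a3-highgpu-8g \
    --accelerator=type=nvidia-h100-80gb,count=8,gpu-driver-version=latest \
    --num-nodes=1
```

A NIM deployment can then target this pool with a node selector (for example, on `cloud.google.com/gke-nodepool: nim-h100-pool`) plus an `nvidia.com/gpu` resource request matching the "Min # GPUs required for NIM" column.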