# Model Configurations Matrix
This page lists supported models and their recommended configurations for L40, A100, and H100 GPUs.
## Llama
The following table lists recommended GPU configurations for Llama models.
| Model Name | Fine-tuning Type | GPUs | Nodes | Tensor Parallel | Pipeline Parallel | Max Seq Len | Micro Batch Size |
|---|---|---|---|---|---|---|---|
| llama-3.2-1b@v1.0.0+L40 | lora | 1 | 1 | 1 | - | 4096 | 1 |
| llama-3.2-1b@v1.0.0+L40 | all_weights | 1 | 1 | 1 | - | 4096 | 1 |
| llama-3.2-1b-instruct@v1.0.0+L40 | lora | 1 | 1 | 1 | - | 4096 | 1 |
| llama-3.2-1b-instruct@v1.0.0+L40 | all_weights | 1 | 1 | 1 | - | 4096 | 1 |
| llama-3.2-1b-embedding@0.0.1+L40 | all_weights | 1 | 1 | 1 | - | 2048 | 4 |
| llama-3.2-3b-instruct@v1.0.0+L40 | lora | 1 | 1 | 1 | - | 4096 | 1 |
| llama-3.1-8b-instruct@v1.0.0+L40 | lora | 2 | 1 | 2 | - | 4096 | 1 |
| llama-3.1-8b-instruct@v1.0.0+L40 | all_weights | 4 | 1 | 4 | - | 4096 | 1 |
| llama3-70b-instruct@v1.0.0+L40 | lora | 16 | 4 | 4 | 2 | 1400 | 1 |
| llama-3.1-70b-instruct@v1.0.0+L40 | lora | 16 | 4 | 4 | 2 | 1400 | 1 |
| llama-3.3-70b-instruct@v1.0.0+L40 | lora | 16 | 4 | 4 | 2 | 1400 | 1 |
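The GPUs, Nodes, Tensor Parallel, and Pipeline Parallel columns are related: in Megatron-style distributed training, each model replica spans Tensor Parallel × Pipeline Parallel GPUs, and any remaining factor of the total GPU count becomes the data-parallel size. The sketch below uses a hypothetical helper (not part of any product API) to check a couple of rows from the table above.

```python
# Minimal sketch (hypothetical helper, not a product API) showing how the GPU-related
# columns fit together: each model replica occupies tensor_parallel * pipeline_parallel
# GPUs, and the remaining factor is the data-parallel size.

def data_parallel_size(total_gpus: int, tensor_parallel: int, pipeline_parallel: int = 1) -> int:
    """Return the data-parallel size implied by a table row ("-" in the table means 1)."""
    gpus_per_replica = tensor_parallel * pipeline_parallel
    if total_gpus % gpus_per_replica != 0:
        raise ValueError("Total GPUs must be a multiple of tensor_parallel * pipeline_parallel")
    return total_gpus // gpus_per_replica

# Example: llama-3.1-70b-instruct (lora) row: 16 GPUs over 4 nodes, TP=4, PP=2.
assert data_parallel_size(total_gpus=16, tensor_parallel=4, pipeline_parallel=2) == 2

# Example: llama-3.1-8b-instruct (all_weights) row: 4 GPUs on 1 node, TP=4, PP not set.
assert data_parallel_size(total_gpus=4, tensor_parallel=4) == 1
```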
## Llama Nemotron
The following table lists recommended GPU configurations for Llama Nemotron models.
| Model Name | Fine-tuning Type | GPUs | Nodes | Tensor Parallel | Pipeline Parallel | Max Seq Len | Micro Batch Size |
|---|---|---|---|---|---|---|---|
| nemotron-nano-llama-3.1-8b@v1.0.0+L40 | lora | 1 | 1 | 1 | - | 4096 | 1 |
| nemotron-nano-llama-3.1-8b@v1.0.0+L40 | all_weights | 1 | 1 | 1 | - | 4096 | 1 |
| nemotron-super-llama-3.3-49b@v1.0.0+L40 | lora | 4 | 4 | 4 | 2 | 4096 | 1 |
## Phi
The following table lists recommended GPU configurations for Phi models.
| Model Name | Fine-tuning Type | GPUs | Nodes | Tensor Parallel | Pipeline Parallel | Max Seq Len | Micro Batch Size |
|---|---|---|---|---|---|---|---|
| phi-4@v1.0.0+L40 | lora | 1 | 1 | 1 | - | 4096 | 1 |
Note: For 70B models, the tested configuration was 4 nodes × 4 GPUs (TP=4, PP=2) with a max sequence length of 1400. A max sequence length of 4096 causes out-of-memory (OOM) errors even with 4 nodes × 4 GPUs; for a 4096 sequence length, use 5 nodes × 4 GPUs (TP=4, PP=5) instead. Adjust resources as needed for your workload.
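As a quick arithmetic check of the note above, the two 70B layouts work out as follows (reusing the hypothetical `data_parallel_size` helper sketched after the Llama table):

```python
# Tested configuration: 4 nodes x 4 GPUs = 16 GPUs, TP=4, PP=2 -> 2 data-parallel replicas.
assert 4 * 4 == 16
assert data_parallel_size(total_gpus=16, tensor_parallel=4, pipeline_parallel=2) == 2

# Recommended for max sequence length 4096: 5 nodes x 4 GPUs = 20 GPUs, TP=4, PP=5
# -> a single replica spread over more pipeline stages, so each GPU holds fewer layers
#    and has more memory headroom for the longer sequences.
assert 5 * 4 == 20
assert data_parallel_size(total_gpus=20, tensor_parallel=4, pipeline_parallel=5) == 1
```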
For the latest and most detailed configuration options, refer to the `values.yaml` file in the Helm chart.
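For example, one way to browse those options locally is to dump the chart's default values (for instance with `helm show values <repo>/<chart> > values.yaml`, substituting your actual chart reference) and then list the top-level sections with PyYAML. This is a sketch only; the key layout varies by chart version.

```python
# Sketch only: inspect the chart's default values locally. The file is assumed to have
# been produced beforehand, e.g. with `helm show values <repo>/<chart> > values.yaml`
# (substitute your actual Helm repository and chart name).
import yaml  # PyYAML

with open("values.yaml") as f:
    values = yaml.safe_load(f)

# Print the top-level keys so you can locate the model/training configuration sections.
for key in sorted(values):
    print(key)
```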