# Model Configurations Matrix

This page lists supported models and their recommended GPU configurations, including L40, A100, and H100.

## Llama

The following table lists recommended GPU configurations for Llama models.

| Model Name | Fine-tuning Type | GPUs | Nodes | Tensor Parallel | Pipeline Parallel | Max Seq Len | Micro Batch Size |
|---|---|---|---|---|---|---|---|
| llama-3.2-1b@v1.0.0+L40 | lora | 1 | 1 | 1 | - | 4096 | 1 |
| llama-3.2-1b@v1.0.0+L40 | all_weights | 1 | 1 | 1 | - | 4096 | 1 |
| llama-3.2-1b-instruct@v1.0.0+L40 | lora | 1 | 1 | 1 | - | 4096 | 1 |
| llama-3.2-1b-instruct@v1.0.0+L40 | all_weights | 1 | 1 | 1 | - | 4096 | 1 |
| llama-3.2-1b-embedding@0.0.1+L40 | all_weights | 1 | 1 | 1 | - | 2048 | 4 |
| llama-3.2-3b-instruct@v1.0.0+L40 | lora | 1 | 1 | 1 | - | 4096 | 1 |
| llama-3.1-8b-instruct@v1.0.0+L40 | lora | 2 | 1 | 2 | - | 4096 | 1 |
| llama-3.1-8b-instruct@v1.0.0+L40 | all_weights | 4 | 1 | 4 | - | 4096 | 1 |
| llama3-70b-instruct@v1.0.0+L40 | lora | 16 | 4 | 4 | 2 | 1400 | 1 |
| llama-3.1-70b-instruct@v1.0.0+L40 | lora | 16 | 4 | 4 | 2 | 1400 | 1 |
| llama-3.3-70b-instruct@v1.0.0+L40 | lora | 16 | 4 | 4 | 2 | 1400 | 1 |

## Llama Nemotron

The following table lists recommended GPU configurations for Llama Nemotron models.

| Model Name | Fine-tuning Type | GPUs | Nodes | Tensor Parallel | Pipeline Parallel | Max Seq Len | Micro Batch Size |
|---|---|---|---|---|---|---|---|
| nemotron-nano-llama-3.1-8b@v1.0.0+L40 | lora | 1 | 1 | 1 | - | 4096 | 1 |
| nemotron-nano-llama-3.1-8b@v1.0.0+L40 | all_weights | 1 | 1 | 1 | - | 4096 | 1 |
| nemotron-super-llama-3.3-49b@v1.0.0+L40 | lora | 4 | 4 | 4 | 2 | 4096 | 1 |

## Phi

The following table lists recommended GPU configurations for Phi models.

| Model Name | Fine-tuning Type | GPUs | Nodes | Tensor Parallel | Pipeline Parallel | Max Seq Len | Micro Batch Size |
|---|---|---|---|---|---|---|---|
| phi-4@v1.0.0+L40 | lora | 1 | 1 | 1 | - | 4096 | 1 |

Note: For 70B models, the tested configuration was 4 nodes × 4 GPUs (TP=4, PP=2) with a max sequence length of 1400. Using a max sequence length of 4096 causes out-of-memory (OOM) errors even with 4 nodes × 4 GPUs. For 4096 sequence length, it is recommended to use 5 nodes × 4 GPUs (TP=4, PP=5). Adjust resources as needed for your workload.
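The parallelism columns relate arithmetically: a model's weights are split across Tensor Parallel × Pipeline Parallel GPUs, and the remaining GPUs form data-parallel replicas (a `-` in the Pipeline Parallel column corresponds to PP=1). A minimal sketch of that consistency check, assuming the standard TP×PP decomposition (the function name is illustrative, not part of any product API):

```python
def data_parallel_size(gpus: int, tensor_parallel: int, pipeline_parallel: int = 1) -> int:
    """Derive the data-parallel degree implied by a table row.

    The total GPU count must be divisible by TP * PP; the quotient is the
    number of data-parallel replicas.
    """
    model_parallel = tensor_parallel * pipeline_parallel
    if gpus % model_parallel != 0:
        raise ValueError(f"{gpus} GPUs is not divisible by TP*PP = {model_parallel}")
    return gpus // model_parallel

# llama-3.1-70b-instruct row: 16 GPUs, TP=4, PP=2 -> 2 data-parallel replicas
print(data_parallel_size(16, 4, 2))  # 2

# llama-3.1-8b-instruct all_weights row: 4 GPUs, TP=4 -> no data parallelism
print(data_parallel_size(4, 4))      # 1
```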

For the latest and most detailed configuration options, refer to the `values.yaml` file in the Helm chart.
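As a purely hypothetical illustration of how one table row might translate into chart values — the key names and structure below are assumptions, not the chart's actual schema, so check `values.yaml` for the real keys:

```yaml
# Hypothetical values.yaml fragment; every key name here is an assumption.
customizationTargets:
  llama-3.1-70b-instruct:
    finetuningType: lora
    numGpus: 16
    numNodes: 4
    tensorParallelSize: 4
    pipelineParallelSize: 2
    maxSeqLength: 1400
    microBatchSize: 1
```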