Training Options Reference
Tip
Looking for a step-by-step guide? Check out Create Customization Config.
For a complete reference of all training options with constraints and types:
CustomizationTrainingOption object
Resource configuration for model training.
Specifies the hardware and parallelization settings for training.
Properties
training_type * string
Allowed values: dpo, sft, distillation
finetuning_type * string
Allowed values: lora, lora_merged, all_weights
num_gpus * integer
The number of GPUs per node to use for the specified training.
num_nodes integer
The number of nodes to use for the specified training.
Default: 1
tensor_parallel_size integer
Number of GPUs used to split individual layers for tensor model parallelism (intra-layer).
Default: 1
data_parallel_size integer
Number of model replicas that process different data batches in parallel, with gradient synchronization across GPUs. Only available on HF checkpoint models. data_parallel_size must equal num_gpus * num_nodes and is set to this value automatically if not provided.
pipeline_parallel_size integer
Number of GPUs used to split the model across layers for pipeline model parallelism (inter-layer). Only available on NeMo 2 checkpoint models. pipeline_parallel_size * tensor_parallel_size must equal num_gpus * num_nodes.
Default: 1
expert_model_parallel_size integer
Number of GPUs used to parallelize the expert (MoE) components of the model. This controls how expert computation is distributed across devices for models that use Mixture-of-Experts. If omitted (null), expert parallelism is not enabled by default. Setting this for models that do not use MoE can cause failures during training.
use_sequence_parallel boolean
If set, sequences are distributed over multiple GPUs.
Default: False
micro_batch_size * integer
The number of examples per data-parallel rank. More details at: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/batching.html
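The parallelism fields above are interrelated: pipeline_parallel_size * tensor_parallel_size must equal num_gpus * num_nodes, and data_parallel_size, when set, must also equal num_gpus * num_nodes. The following is a minimal Python sketch, not part of the Customizer API itself, that assembles one training option using the documented field names and checks those constraints locally before you submit a customization config (see Create Customization Config). The validate_training_option helper and the example values are illustrative assumptions.

```python
# A minimal sketch: builds one training-option entry with the field names
# documented above and checks the documented parallelism constraints locally.
# The helper name and the example values are illustrative, not prescriptive.

def validate_training_option(opt: dict) -> None:
    """Check the constraints documented in this reference."""
    total_gpus = opt["num_gpus"] * opt.get("num_nodes", 1)

    # pipeline_parallel_size * tensor_parallel_size must equal
    # num_gpus * num_nodes (NeMo 2 checkpoint models).
    if "pipeline_parallel_size" in opt:
        pp = opt["pipeline_parallel_size"]
        tp = opt.get("tensor_parallel_size", 1)
        if pp * tp != total_gpus:
            raise ValueError(
                f"pipeline_parallel_size ({pp}) * tensor_parallel_size ({tp}) "
                f"must equal num_gpus * num_nodes ({total_gpus})"
            )

    # data_parallel_size, when provided, must equal num_gpus * num_nodes
    # (HF checkpoint models); it is set automatically if omitted.
    if opt.get("data_parallel_size") is not None:
        dp = opt["data_parallel_size"]
        if dp != total_gpus:
            raise ValueError(
                f"data_parallel_size ({dp}) must equal "
                f"num_gpus * num_nodes ({total_gpus})"
            )


# Example: LoRA SFT on 2 nodes x 4 GPUs for a NeMo 2 checkpoint model,
# so 4 (tensor) * 2 (pipeline) covers all 8 GPUs.
training_option = {
    "training_type": "sft",          # dpo, sft, or distillation
    "finetuning_type": "lora",       # lora, lora_merged, or all_weights
    "num_gpus": 4,                   # GPUs per node
    "num_nodes": 2,
    "tensor_parallel_size": 4,       # intra-layer parallelism
    "pipeline_parallel_size": 2,     # inter-layer parallelism
    "use_sequence_parallel": False,
    "micro_batch_size": 1,           # examples per data-parallel rank
    # expert_model_parallel_size omitted: this example model is not MoE.
}

validate_training_option(training_option)
```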