nemo_microservices.types.customization_training_option#
Module Contents#
Classes#
API#
- class nemo_microservices.types.customization_training_option.CustomizationTrainingOption(/, **data: Any)#
Bases:
nemo_microservices._models.BaseModel
- data_parallel_size: int | None#
None
Number of model replicas that process different data batches in parallel, with gradient synchronization across GPUs. Only available on HF checkpoint models. data_parallel_size must equal num_gpus * num_nodes and is set to this value automatically if not provided.
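The default value described above can be sketched as follows (an illustrative helper, not part of the SDK):

```python
def default_data_parallel_size(num_gpus: int, num_nodes: int) -> int:
    # When data_parallel_size is not provided, it defaults to the
    # total GPU count across all nodes: num_gpus * num_nodes.
    return num_gpus * num_nodes

# e.g. 8 GPUs per node on 2 nodes -> 16 data-parallel replicas
print(default_data_parallel_size(8, 2))  # 16
```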
- expert_model_parallel_size: int | None#
None
Number of GPUs used to parallelize expert (MoE) components of the model.
This controls the distribution of expert computation across devices for models that use Mixture-of-Experts (MoE). If omitted (null), expert parallelism is not enabled by default. Setting this for models that do not use MoE can cause failures during training.
- finetuning_type: nemo_microservices.types.shared.finetuning_type.FinetuningType#
None
- micro_batch_size: int#
None
The number of examples per data-parallel rank.
More details at: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/batching.html
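The relationship between micro batch size and the effective training batch can be sketched as below. This is the standard Megatron-style batching formula (see the linked page); the gradient-accumulation term is an assumption here, since this model does not expose it directly:

```python
def global_batch_size(micro_batch_size: int,
                      grad_accum_steps: int,
                      data_parallel_size: int) -> int:
    # Each data-parallel rank processes micro_batch_size examples per
    # forward/backward pass; gradient accumulation multiplies the
    # effective batch consumed per optimizer update.
    return micro_batch_size * grad_accum_steps * data_parallel_size

# e.g. micro batch 4, 2 accumulation steps, 16 data-parallel replicas
print(global_batch_size(4, 2, 16))  # 128
```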
- num_gpus: int#
None
The number of GPUs per node to use for the specified training.
- num_nodes: int | None#
None
The number of nodes to use for the specified training.
- pipeline_parallel_size: int | None#
None
Number of GPUs used to split the model across layers for pipeline model parallelism (inter-layer). Only available on NeMo 2 checkpoint models. pipeline_parallel_size * tensor_parallel_size must equal num_gpus * num_nodes.
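The constraint above can be checked before submitting a job; a minimal sketch (the helper name is illustrative, not an SDK function):

```python
def parallelism_tiles_gpus(tensor_parallel_size: int,
                           pipeline_parallel_size: int,
                           num_gpus: int,
                           num_nodes: int) -> bool:
    # The model-parallel grid must exactly tile the available GPUs:
    # pipeline_parallel_size * tensor_parallel_size == num_gpus * num_nodes
    return tensor_parallel_size * pipeline_parallel_size == num_gpus * num_nodes

# 4-way tensor x 2-way pipeline parallelism fits 8 GPUs on one node
print(parallelism_tiles_gpus(4, 2, 8, 1))  # True
```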
- tensor_parallel_size: int | None#
None
Number of GPUs used to split individual layers for tensor model parallelism (intra-layer).
- training_type: nemo_microservices.types.training_type.TrainingType#
None
- use_sequence_parallel: bool | None#
None
If set, sequences are distributed over multiple GPUs.
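Putting the fields together, a training-option payload for this model might look like the following. The values are illustrative, and the enum strings for finetuning_type and training_type ("lora", "sft") are assumptions; check the FinetuningType and TrainingType references for the accepted values:

```python
# Illustrative CustomizationTrainingOption payload as a plain dict;
# the model itself would normally be constructed via the SDK.
training_option = {
    "finetuning_type": "lora",       # assumed enum value
    "training_type": "sft",          # assumed enum value
    "num_gpus": 8,
    "num_nodes": 1,
    "micro_batch_size": 1,
    "tensor_parallel_size": 4,
    "pipeline_parallel_size": 2,     # 4 * 2 == 8 * 1 total GPUs
    "use_sequence_parallel": False,
}
```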