Hyperparameters#

class nemo_microservices.types.customization.Hyperparameters(*args: Any, **kwargs: Any)

Bases: BaseModel

finetuning_type: Literal['p_tuning', 'lora', 'all_weights']

The finetuning type for the customization job.
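A minimal sketch of constructing the model directly with keyword arguments, as its *args/**kwargs signature allows; only finetuning_type is supplied here, so every optional field keeps its None default:

```python
from nemo_microservices.types.customization import Hyperparameters

# Only the required finetuning type is set; all other fields default to None.
hp = Hyperparameters(finetuning_type="lora")

print(hp.finetuning_type)  # "lora"
print(hp.batch_size)       # None until explicitly set
```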

batch_size: int | None = None

Batch size is the number of training samples used in a single forward and backward pass.

distillation: DistillationParameters | None = None

Specific parameters for knowledge distillation.

epochs: int | None = None

Epochs is the number of complete passes through the training dataset.

learning_rate: float | None = None

How much to adjust the model parameters in response to the loss gradient.

log_every_n_steps: int | None = None

Control logging frequency for metrics tracking.

Logging on every single batch may slow down training. By default, metrics are logged every 10 training steps.

lora: LoraParameters | None = None

Specific parameters for LoRA.
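A hedged sketch of supplying LoRA-specific settings. The nested payload is written as a plain dict (which the model can coerce into LoraParameters), and the adapter_dim key is an assumption about that type's fields, which are documented separately:

```python
from nemo_microservices.types.customization import Hyperparameters

# adapter_dim is an assumed LoraParameters field name, not confirmed by this page.
hp = Hyperparameters(
    finetuning_type="lora",
    lora={"adapter_dim": 16},
)
```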

p_tuning: PTuningParameters | None = None

Specific parameters for p-tuning.

sequence_packing_enabled: bool | None = None

Sequence packing can improve training speed by letting a single training step work on multiple rows at the same time. It is experimental and not supported by all models. If a model is not supported, a warning will be returned in the response body and training will proceed with sequence packing disabled. Not recommended for production use. This flag may be removed in the future. See https://docs.nvidia.com/nemo-framework/user-guide/latest/sft_peft/packed_sequence.html for more details.

sft: SftParameters | None = None

Specific parameters for SFT.

training_type: Literal['sft', 'distillation'] | None = None

The training type for the customization job.

val_check_interval: float | None = None

Control how often to check the validation set: either after a fixed number of training batches, or pass a float in the range [0.1, 1.0] to check after that fraction of the training epoch. Note that early stopping monitors the validation loss and stops training when no improvement is observed after 10 epochs with a minimum delta of 0.001. If val_check_interval is greater than the number of training batches, validation runs every epoch.

weight_decay: float | None = None

An additional penalty term added to the loss during gradient descent to keep weights small and mitigate overfitting.
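Pulling the fields above together, a sketch of a plausible SFT configuration; the specific values are illustrative only:

```python
from nemo_microservices.types.customization import Hyperparameters

hp = Hyperparameters(
    finetuning_type="all_weights",
    training_type="sft",
    epochs=5,
    batch_size=32,
    learning_rate=1e-4,
    weight_decay=0.01,
    log_every_n_steps=10,
    val_check_interval=0.25,  # validate after every quarter of a training epoch
)
```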