Hyperparameters#

class nemo_microservices.types.customization.Hyperparameters(*args: Any, **kwargs: Any)

Bases: BaseModel

finetuning_type: Literal['p_tuning', 'lora', 'all_weights']

The finetuning type for the customization job.
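A minimal sketch of constructing the model directly with keyword arguments, as its *args/**kwargs signature allows; only finetuning_type is supplied here, so every optional field keeps its None default:

```python
from nemo_microservices.types.customization import Hyperparameters

# Only the required finetuning type is set; all other fields default to None.
hp = Hyperparameters(finetuning_type="lora")

print(hp.finetuning_type)  # "lora"
print(hp.batch_size)       # None until explicitly set
```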

batch_size: int | None = None

Batch size is the number of training samples used in a single forward and backward pass.

distillation: DistillationParameters | None = None

Specific parameters for knowledge distillation.

epochs: int | None = None

Epochs is the number of complete passes through the training dataset.

learning_rate: float | None = None

How much to adjust the model parameters in response to the loss gradient.

log_every_n_steps: int | None = None

Control logging frequency for metrics tracking.

Logging on every single batch may slow down training. By default, metrics are logged every 10 training steps.

lora: LoraParameters | None = None

Specific parameters for LoRA.
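A hedged sketch of supplying LoRA-specific settings. The nested payload is written as a plain dict (which the model can coerce into LoraParameters), and the adapter_dim key is an assumption about that type's fields, which are documented separately:

```python
from nemo_microservices.types.customization import Hyperparameters

# adapter_dim is an assumed LoraParameters field name, not confirmed by this page.
hp = Hyperparameters(
    finetuning_type="lora",
    lora={"adapter_dim": 16},
)
```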

p_tuning: PTuningParameters | None = None

Specific parameters for p-tuning.

sequence_packing_enabled: bool | None = None

Sequence packing can improve training speed by letting a single training step work on multiple rows at the same time. It is experimental and not supported by all models. If a model is not supported, a warning will be returned in the response body and training will proceed with sequence packing disabled. Not recommended for production use. This flag may be removed in the future. See https://docs.nvidia.com/nemo-framework/user-guide/latest/sft_peft/packed_sequence.html for more details.

sft: SftParameters | None = None

Specific parameters for SFT.

training_type: Literal['sft', 'distillation'] | None = None

The training type for the customization job.

val_check_interval: float | None = None

Control how often to check the validation set: either after a fixed number of training batches, or pass a float in the range [0.1, 1.0] to check after that fraction of the training epoch. Note that early stopping monitors the validation loss and stops training when no improvement is observed after 10 epochs with a minimum delta of 0.001. If val_check_interval is greater than the number of training batches, validation runs every epoch.

weight_decay: float | None = None

An additional penalty term added to the loss during gradient descent to keep weights small and mitigate overfitting.
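Pulling the fields above together, a sketch of a plausible SFT configuration; the specific values are illustrative only:

```python
from nemo_microservices.types.customization import Hyperparameters

hp = Hyperparameters(
    finetuning_type="all_weights",
    training_type="sft",
    epochs=5,
    batch_size=32,
    learning_rate=1e-4,
    weight_decay=0.01,
    log_every_n_steps=10,
    val_check_interval=0.25,  # validate after every quarter of a training epoch
)
```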