Hyperparameters#
- class nemo_microservices.types.customization.Hyperparameters(*args: Any, **kwargs: Any)
Bases:
BaseModel
- finetuning_type: Literal['p_tuning', 'lora', 'all_weights']
The finetuning type for the customization job.
- batch_size: int | None = None
Batch size is the number of training samples used in a single forward and backward pass.
- distillation: DistillationParameters | None = None
Specific parameters for knowledge distillation.
- epochs: int | None = None
Epochs is the number of complete passes through the training dataset.
- learning_rate: float | None = None
How much to adjust the model parameters in response to the loss gradient.
- log_every_n_steps: int | None = None
Controls how often metrics are logged.
Logging on every single batch may slow down training. By default, metrics are logged every 10 training steps.
- lora: LoraParameters | None = None
Specific parameters for LoRA.
- p_tuning: PTuningParameters | None = None
Specific parameters for p-tuning.
- sequence_packing_enabled: bool | None = None
Sequence packing can improve training speed by letting training operate on multiple rows at the same time. It is experimental and not supported by all models; if a model is not supported, a warning is returned in the response body and training proceeds with sequence packing disabled. Not recommended for production use. This flag may be removed in the future. See https://docs.nvidia.com/nemo-framework/user-guide/latest/sft_peft/packed_sequence.html for more details.
- sft: SftParameters | None = None
Specific parameters for SFT.
- training_type: Literal['sft', 'distillation'] | None = None
The training type for the customization job.
- val_check_interval: float | None = None
Controls how often to check the validation set: either after a fixed number of training batches, or, when a float in the range [0.1, 1.0] is passed, after that fraction of the training epoch. Note that early stopping monitors the validation loss and stops training when no improvement is observed after 10 epochs with a minimum delta of 0.001. If val_check_interval is greater than the number of training batches, validation runs every epoch.
- weight_decay: float | None = None
An additional penalty term applied during gradient descent to keep weights small and mitigate overfitting.
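For orientation, a minimal sketch of constructing this model for a typical LoRA fine-tuning configuration, using only the fields documented above. The values are illustrative, not recommended defaults, and the job-creation call that would consume this object is not shown here.

```python
from nemo_microservices.types.customization import Hyperparameters

# Illustrative values only; tune batch_size, epochs, learning_rate, etc.
# for your dataset and model.
hp = Hyperparameters(
    finetuning_type="lora",
    training_type="sft",
    batch_size=16,
    epochs=3,
    learning_rate=1e-4,
    log_every_n_steps=10,
    val_check_interval=0.25,  # validate after 25% of each training epoch
    weight_decay=0.01,
)

print(hp.finetuning_type, hp.learning_rate)
```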