nemo_microservices.types.beta.safe_synthesizer.training_hyperparams#
Module Contents#
Classes#
API#
- class nemo_microservices.types.beta.safe_synthesizer.training_hyperparams.TrainingHyperparams(/, **data: Any)#
Bases: nemo_microservices._models.BaseModel
- batch_size: int | None#
None
The batch size per device for training
- gradient_accumulation_steps: int | None#
None
Number of update steps to accumulate the gradients for, before performing a backward/update pass. This technique increases the effective batch size that will fit into GPU memory.
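The relationship between the two fields above can be sketched as follows (a minimal illustration; the variable names mirror the fields, but the arithmetic is generic and the numbers are hypothetical):

```python
# Effective batch size = per-device batch size x accumulation steps
# (x number of devices, when training on more than one GPU).
batch_size = 4                    # per-device batch size
gradient_accumulation_steps = 8   # gradients summed over 8 micro-batches
num_devices = 1

effective_batch_size = batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)  # 32
```

Only `batch_size` samples need to fit in GPU memory at once, while the optimizer sees updates computed from the larger effective batch.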
- learning_rate: float | None#
None
The initial learning rate for the AdamW optimizer.
- lora_alpha_over_r: float | None#
None
The ratio of the LoRA scaling factor (alpha) to the LoRA rank.
Empirically, this parameter works well when set to 0.5, 1, or 2.
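Because the field stores the ratio alpha/r rather than alpha itself, the implied LoRA alpha can be recovered from the rank (a sketch under the standard LoRA convention, where the low-rank update is scaled by alpha/r):

```python
lora_r = 16               # rank of the LoRA update matrices
lora_alpha_over_r = 2.0   # ratio alpha / r, as stored in this model

# Implied alpha for the chosen rank.
lora_alpha = lora_alpha_over_r * lora_r

# Under the standard LoRA convention, the update (B @ A) is scaled by
# alpha / r -- which is exactly lora_alpha_over_r, independent of the rank.
scaling = lora_alpha / lora_r
print(lora_alpha, scaling)  # 32.0 2.0
```

Storing the ratio keeps the scaling of the update constant when the rank is changed, which is why the empirical guidance (0.5, 1, or 2) is independent of `lora_r`.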
- lora_r: int | None#
None
The rank of the LoRA update matrices.
Lower rank results in smaller update matrices with fewer trainable parameters.
- lora_target_modules: List[str] | None#
None
The list of transformer modules to apply LoRA to.
Possible modules: 'q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'.
- lr_scheduler: str | None#
None
The scheduler type to use.
See the HuggingFace documentation of SchedulerType for all possible values.
- num_input_records_to_sample: typing_extensions.Literal[auto] | int | None#
None
Number of records the model will see during training.
This parameter is a proxy for training time. For example, if its value equals the size of the input dataset, training runs for roughly one epoch; if larger, for multiple (possibly fractional) epochs; if smaller, for a fraction of an epoch. Supports 'auto', where a reasonable value is chosen based on other config params and the data.
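The epoch analogy above can be made concrete (illustrative numbers only; the 'auto' heuristic itself is not reproduced here):

```python
dataset_size = 10_000  # hypothetical number of input records

# Effective epochs = records sampled / dataset size.
# Equal to the dataset -> ~1 epoch; larger -> multiple epochs;
# smaller -> a fraction of an epoch.
effective_epochs = {
    n: n / dataset_size for n in (10_000, 25_000, 2_500)
}
print(effective_epochs)  # {10000: 1.0, 25000: 2.5, 2500: 0.25}
```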
- pretrained_model: str | None#
None
Pretrained model to use for fine-tuning. Defaults to TinyLlama.
- rope_scaling_factor: typing_extensions.Literal[auto] | int | None#
None
Scale the base LLM’s context length by this factor using RoPE scaling.
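As an illustration of what the factor means (hypothetical numbers; the native context length varies by base model):

```python
base_context_length = 2048   # hypothetical native context window of the base LLM
rope_scaling_factor = 4      # stretch the context window 4x via RoPE scaling

scaled_context_length = base_context_length * rope_scaling_factor
print(scaled_context_length)  # 8192
```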
- use_unsloth: typing_extensions.Literal[auto] | bool | None#
None
Whether to use Unsloth.
- validation_ratio: float | None#
None
The fraction of the training data that will be used for validation. The range should be 0 to 1. If set to 0, no validation will be performed. If set larger than 0, validation loss will be computed and reported throughout training.
- validation_steps: int | None#
None
The number of steps between validation checks, passed through to the HF Trainer arguments.
- warmup_ratio: float | None#
None
Ratio of total training steps used for a linear warmup from 0 to the learning rate.
- weight_decay: float | None#
None
The weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in the AdamW optimizer.
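Putting the fields together, a minimal hyperparameter configuration might look like the following. This is a sketch: the values are illustrative, not recommended defaults, and any field left out falls back to its `None`/default behavior described above.

```python
# Illustrative values only; in practice these are tuned per model and dataset.
training_hyperparams = {
    "batch_size": 4,
    "gradient_accumulation_steps": 8,
    "learning_rate": 2e-4,
    "lora_r": 16,
    "lora_alpha_over_r": 2.0,
    "lora_target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "lr_scheduler": "cosine",
    "num_input_records_to_sample": "auto",
    "validation_ratio": 0.1,
    "validation_steps": 50,
    "warmup_ratio": 0.03,
    "weight_decay": 0.01,
}

# The dict can then be validated into the model documented above, e.g.:
# from nemo_microservices.types.beta.safe_synthesizer.training_hyperparams import (
#     TrainingHyperparams,
# )
# hp = TrainingHyperparams(**training_hyperparams)
print(sorted(training_hyperparams))
```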