nemo_microservices.types.beta.safe_synthesizer.training_hyperparams_param#

Module Contents#

Classes#

API#

class nemo_microservices.types.beta.safe_synthesizer.training_hyperparams_param.TrainingHyperparamsParam#

Bases: typing_extensions.TypedDict

batch_size: int#

The batch size per device for training.

gradient_accumulation_steps: int#

Number of update steps to accumulate gradients for before performing a backward/update pass. This technique increases the effective batch size beyond what fits into GPU memory at once.
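The effective batch size this produces can be sketched with a small helper (illustrative only, not part of the SDK):

```python
def effective_batch_size(batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    # Gradients from this many records are averaged before each optimizer
    # update, without all of them residing in GPU memory at the same time.
    return batch_size * gradient_accumulation_steps * num_devices

# batch_size=4 with 8 accumulation steps on a single device behaves,
# from the optimizer's point of view, like a batch of 32.
print(effective_batch_size(4, 8))  # -> 32
```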

learning_rate: float#

The initial learning rate for the AdamW optimizer.

lora_alpha_over_r: float#

The ratio of the LoRA scaling factor (alpha) to the LoRA rank.

Empirically, this parameter works well when set to 0.5, 1, or 2.
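In other words, the LoRA alpha implied by a given rank is `lora_alpha_over_r * lora_r`; a rough sketch (the helper name is hypothetical):

```python
def implied_lora_alpha(lora_r: int, lora_alpha_over_r: float) -> float:
    # LoRA scales its low-rank update by alpha / r, so fixing the
    # alpha-to-rank ratio keeps that scaling stable as the rank changes.
    return lora_alpha_over_r * lora_r

print(implied_lora_alpha(16, 2.0))  # rank 16 at ratio 2 implies alpha 32
```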

lora_r: int#

The rank of the LoRA update matrices.

A lower rank results in smaller update matrices with fewer trainable parameters.

lora_target_modules: nemo_microservices._types.SequenceNotStr[str]#

The list of transformer modules to apply LoRA to.

Possible modules: 'q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'.

lr_scheduler: str#

The scheduler type to use.

See the HuggingFace documentation of SchedulerType for all possible values.

num_input_records_to_sample: typing_extensions.Literal[auto] | int#

Number of records the model will see during training.

This parameter is a proxy for training time. If its value matches the size of the input dataset, this is like training for a single epoch; if larger, like training for multiple (possibly fractional) epochs; if smaller, like training for a fraction of an epoch. Supports 'auto', where a reasonable value is chosen based on other config params and the data.
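The epoch analogy can be made concrete with a small illustrative helper (not part of the SDK):

```python
def approx_epochs(num_input_records_to_sample: int, dataset_size: int) -> float:
    # Records seen during training, relative to the dataset size,
    # correspond to a (possibly fractional) number of epochs.
    return num_input_records_to_sample / dataset_size

print(approx_epochs(10_000, 10_000))  # -> 1.0, one full epoch
print(approx_epochs(25_000, 10_000))  # -> 2.5 epochs
print(approx_epochs(5_000, 10_000))   # -> 0.5, half an epoch
```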

pretrained_model: str#

Pretrained model to use for fine-tuning. Defaults to TinyLlama.

rope_scaling_factor: typing_extensions.Literal[auto] | int#

Scale the base LLM’s context length by this factor using RoPE scaling.

use_unsloth: typing_extensions.Literal[auto] | bool#

Whether to use Unsloth.

validation_ratio: float#

The fraction of the training data that will be used for validation. The value must be in the range 0 to 1. If set to 0, no validation is performed. If set larger than 0, validation loss is computed and reported throughout training.

validation_steps: int#

The number of training steps between validation checks, passed to the HF Trainer arguments.

warmup_ratio: float#

Ratio of total training steps used for a linear warmup from 0 to the learning rate.
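As a sketch of the arithmetic (the helper is illustrative; the HF Trainer computes the actual schedule internally):

```python
def warmup_steps(warmup_ratio: float, total_training_steps: int) -> int:
    # Steps spent ramping the learning rate linearly from 0 up to
    # learning_rate before the lr_scheduler takes over.
    return int(warmup_ratio * total_training_steps)

print(warmup_steps(0.25, 2_000))  # 500 of 2000 steps are warmup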

weight_decay: float#

The weight decay applied (if not zero) to all layers except bias and LayerNorm weights in the AdamW optimizer.
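Putting it together: a TypedDict is a plain dict at runtime, so a hyperparameter payload can be sketched as below. All values are illustrative, and with the SDK installed the literal could be annotated as TrainingHyperparamsParam:

```python
# Illustrative values only; annotate as TrainingHyperparamsParam when
# the nemo_microservices SDK is available.
hyperparams = {
    "batch_size": 4,
    "gradient_accumulation_steps": 8,
    "learning_rate": 2e-4,
    "lora_r": 16,
    "lora_alpha_over_r": 2.0,
    "lora_target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "lr_scheduler": "cosine",
    "num_input_records_to_sample": "auto",
    "validation_ratio": 0.1,
    "warmup_ratio": 0.05,
    "weight_decay": 0.01,
}

print(sorted(hyperparams))
```

Because TypedDict keys are checked statically rather than at runtime, misspelled or out-of-range values are caught by a type checker (or by the service on submission), not by dict construction itself.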