nemo_microservices.types.beta.safe_synthesizer.training_hyperparams_param#

Module Contents#

Classes#

API#

class nemo_microservices.types.beta.safe_synthesizer.training_hyperparams_param.TrainingHyperparamsParam#

Bases: typing_extensions.TypedDict

batch_size: int#

None

The batch size per device for training.

gradient_accumulation_steps: int#

None

Number of update steps to accumulate the gradients for, before performing a backward/update pass. This technique increases the effective batch size that will fit into GPU memory.
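The relationship between these two parameters can be illustrated with simple arithmetic (names here are illustrative, not part of the API): the effective batch size the optimizer sees per update is the per-device batch size times the accumulation steps times the device count.

```python
# Illustrative arithmetic only; variable names are not part of the API.
batch_size = 8                   # per-device batch size
gradient_accumulation_steps = 4  # gradients accumulated before each update
num_devices = 2                  # e.g. two GPUs

# Effective batch size seen by the optimizer at each backward/update pass.
effective_batch_size = batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)  # 64
```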

learning_rate: float#

None

The initial learning rate for AdamW optimizer.

lora_alpha_over_r: float#

None

The ratio of the LoRA scaling factor (alpha) to the LoRA rank.

Empirically, this parameter works well when set to 0.5, 1, or 2.
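Since standard LoRA scales its update matrices by alpha / r, fixing the ratio directly (rather than alpha itself) keeps the scaling constant as the rank changes. A minimal sketch of that relationship, assuming the conventional LoRA scaling formula:

```python
# Illustrative only: conventional LoRA scaling, alpha / r.
lora_r = 16
lora_alpha_over_r = 2.0

lora_alpha = lora_alpha_over_r * lora_r  # implied alpha
scaling = lora_alpha / lora_r            # equals lora_alpha_over_r by construction
```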

lora_r: int#

None

The rank of the LoRA update matrices.

Lower rank results in smaller update matrices with fewer trainable parameters.

lora_target_modules: nemo_microservices._types.SequenceNotStr[str]#

None

The list of transformer modules to apply LoRA to.

Possible modules: ‘q_proj’, ‘k_proj’, ‘v_proj’, ‘o_proj’, ‘gate_proj’, ‘up_proj’, ‘down_proj’

lr_scheduler: str#

None

The scheduler type to use.

See the HuggingFace documentation of SchedulerType for all possible values.

max_vram_fraction: float#

None

The fraction of the total VRAM to use for training.

Default is 0.9. Modify this to allow longer sequences to be used.

num_input_records_to_sample: Union[typing_extensions.Literal[auto], int]#

None

Number of records the model will see during training.

This parameter is a proxy for training time. For example, if its value is the same size as the input dataset, this is like training for a single epoch. If its value is larger, this is like training for multiple (possibly fractional) epochs. If its value is smaller, this is like training for a fraction of an epoch. Supports ‘auto’ where a reasonable value is chosen based on other config params and data.
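The epoch analogy above amounts to dividing the record count by the dataset size (illustrative arithmetic, not library code):

```python
# Illustrative only: how num_input_records_to_sample maps to epochs.
dataset_size = 10_000
num_input_records_to_sample = 25_000

epochs_equivalent = num_input_records_to_sample / dataset_size
print(epochs_equivalent)  # 2.5, i.e. 2.5 (fractional) epochs
```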

peft_implementation: str#

None

The PEFT (Parameter-Efficient Fine-Tuning) implementation to use.

Options include ‘lora’ for Low-Rank Adaptation and QLoRA for Quantized LoRA. Each method has its own trade-offs in terms of performance and resource requirements.

pretrained_model: str#

None

Pretrained model to use for fine-tuning. Defaults to TinyLlama.

quantization_bits: typing_extensions.Literal[4, 8]#

None

The number of bits to use for quantization if quantize_model is True.

Common values are 8 or 4 bits.

quantize_model: bool#

None

Whether to quantize the model during training.

This can reduce memory usage and potentially speed up training, but may also impact model accuracy.

rope_scaling_factor: Union[typing_extensions.Literal[auto], int]#

None

Scale the base LLM’s context length by this factor using RoPE scaling.

use_unsloth: Union[typing_extensions.Literal[auto], bool]#

None

Whether to use the Unsloth library for faster fine-tuning. Supports ‘auto’.

validation_ratio: float#

None

The fraction of the training data that will be used for validation. The range should be 0 to 1. If set to 0, no validation will be performed. If set larger than 0, validation loss will be computed and reported throughout training.
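A simple sketch of the split this ratio implies (illustrative arithmetic; the service's exact splitting logic is not documented here):

```python
# Illustrative only: holding out a validation_ratio fraction of records.
n_records = 5_000
validation_ratio = 0.1

n_val = int(n_records * validation_ratio)  # records held out for validation
n_train = n_records - n_val                # records used for training
```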

validation_steps: int#

None

The number of steps between validation checks for the HF Trainer arguments.

warmup_ratio: float#

None

Ratio of total training steps used for a linear warmup from 0 to the learning rate.

weight_decay: float#

None

The weight decay to apply (if not zero) to all layers except bias and LayerNorm weights in the AdamW optimizer.
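Because TrainingHyperparamsParam is a TypedDict, a plain dict with the documented keys is an acceptable value at runtime. A minimal sketch of such a payload (the specific values are illustrative assumptions, not recommended defaults):

```python
# Illustrative payload using fields documented above; values are
# example choices, not recommendations. A plain dict satisfies a
# TypedDict parameter at runtime.
params = {
    "batch_size": 8,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "lora_r": 16,
    "lora_alpha_over_r": 1.0,
    "lora_target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "num_input_records_to_sample": "auto",
    "quantize_model": True,
    "quantization_bits": 4,       # must be 4 or 8 per the Literal type
    "validation_ratio": 0.1,      # 0 disables validation
}
```

Type checkers will verify the keys and value types against the TypedDict; at runtime the object is an ordinary dict.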