nemo_microservices.types.beta.safe_synthesizer.training_hyperparams#

Module Contents#

Classes#

API#

class nemo_microservices.types.beta.safe_synthesizer.training_hyperparams.TrainingHyperparams(/, **data: typing.Any)#

Bases: nemo_microservices._models.BaseModel

batch_size: Optional[int]#

None

The batch size per device for training

gradient_accumulation_steps: Optional[int]#

None

Number of update steps to accumulate the gradients for, before performing a backward/update pass. This technique increases the effective batch size that will fit into GPU memory.

learning_rate: Optional[float]#

None

The initial learning rate for AdamW optimizer.

lora_alpha_over_r: Optional[float]#

None

The ratio of the LoRA scaling factor (alpha) to the LoRA rank.

Empirically, this parameter works well when set to 0.5, 1, or 2.

lora_r: Optional[int]#

None

The rank of the LoRA update matrices, expressed in int.

Lower rank results in smaller update matrices with fewer trainable parameters.

lora_target_modules: Optional[List[str]]#

None

The list of transformer modules to apply LoRA to.

Possible modules: ‘q_proj’, ‘k_proj’, ‘v_proj’, ‘o_proj’, ‘gate_proj’, ‘up_proj’, ‘down_proj’

lr_scheduler: Optional[str]#

None

The scheduler type to use.

See the HuggingFace documentation of SchedulerType for all possible values.

max_vram_fraction: Optional[float]#

None

The fraction of the total VRAM to use for training.

Default is 0.9. Modify this to allow longer sequences to be used.

num_input_records_to_sample: Union[typing_extensions.Literal[auto], int, None]#

None

Number of records the model will see during training.

This parameter is a proxy for training time. For example, if its value is the same size as the input dataset, this is like training for a single epoch. If its value is larger, this is like training for multiple (possibly fractional) epochs. If its value is smaller, this is like training for a fraction of an epoch. Supports ‘auto’ where a reasonable value is chosen based on other config params and data.

peft_implementation: Optional[str]#

None

The PEFT (Parameter-Efficient Fine-Tuning) implementation to use.

Options include ‘lora’ for Low-Rank Adaptation or QLoRA for Quantized LoRA. Each method has its own trade-offs in terms of performance and resource requirements.

pretrained_model: Optional[str]#

None

Pretrained model to use for fine tuning. Uses default of TinyLlama.

quantization_bits: Optional[typing_extensions.Literal[4, 8]]#

None

The number of bits to use for quantization if quantize_model is True.

Common values are 8 or 4 bits.

quantize_model: Optional[bool]#

None

Whether to quantize the model during training.

This can reduce memory usage and potentially speed up training, but may also impact model accuracy.

rope_scaling_factor: Union[typing_extensions.Literal[auto], int, None]#

None

Scale the base LLM’s context length by this factor using RoPE scaling.

use_unsloth: Union[typing_extensions.Literal[auto], bool, None]#

None

Whether to use unsloth.

validation_ratio: Optional[float]#

None

The fraction of the training data that will be used for validation.The range should be 0 to 1. If set to 0, no validation will be performed.If set larger than 0, validation loss will be computed and reported throughout training.

validation_steps: Optional[int]#

None

The number of steps between validation checks for the HF Trainer arguments.

warmup_ratio: Optional[float]#

None

Ratio of total training steps used for a linear warmup from 0 to the learning rate.

weight_decay: Optional[float]#

None

The weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in the AdamW optimizer.