nemo_microservices.types.customization.sft_parameters
Module Contents
Classes
API
- class nemo_microservices.types.customization.sft_parameters.SftParameters(/, **data: typing.Any)
Bases: nemo_microservices._models.BaseModel
- attention_dropout: Optional[float]
None
Dropout probability applied to attention weights in the self-attention mechanism.
Randomly zeros a fraction of attention scores during training to improve generalization. Typical values range from 0.0 (no dropout) to 0.1. Set to None to use model defaults. Higher values can help prevent the model from over-relying on specific token relationships.
- hidden_dropout: Optional[float]
None
Dropout probability applied to the hidden states in transformer layers.
Randomly zeros a fraction of hidden state activations during training to prevent overfitting. Typical values range from 0.0 (no dropout) to 0.1. Set to None to use model defaults. Higher values increase regularization but may slow convergence.
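The two dropout fields described above can be illustrated with a minimal sketch. It uses a hypothetical standard-library stand-in, `SftDropoutSketch`, rather than the SDK class, so it runs without `nemo_microservices` installed. The field name `hidden_dropout`, the range check, and the `to_payload` helper are assumptions made for illustration and are not confirmed SDK behavior.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SftDropoutSketch:
    """Hypothetical stand-in for the dropout fields; not the SDK class."""

    attention_dropout: Optional[float] = None
    hidden_dropout: Optional[float] = None  # assumed field name

    def __post_init__(self) -> None:
        # A dropout value is a probability, so reject anything outside [0, 1].
        for name, value in asdict(self).items():
            if value is not None and not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0.0, 1.0], got {value}")

    def to_payload(self) -> dict:
        # Assumption: fields left as None are omitted so model defaults apply.
        return {k: v for k, v in asdict(self).items() if v is not None}


# Set only attention dropout; hidden dropout stays at the model default.
params = SftDropoutSketch(attention_dropout=0.1)
print(params.to_payload())
```

With the real class, the equivalent would presumably be constructing `SftParameters` with the same keyword arguments, leaving any field at `None` to keep the model default.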