bridge.training.mixed_precision#
Module Contents#
Classes#

| MixedPrecisionConfig | Mixed precision configuration for models. |

Functions#

| update_config_with_precision_overrides | Update a config object with precision settings from mixed_precision_config. |
| register | Decorator that registers a mixed-precision recipe factory by its function name. |
| bf16_mixed | Create a MixedPrecisionConfig for mixed precision training using BF16. |
| fp16_mixed | Create a MixedPrecisionConfig for mixed precision training using FP16. |
| bf16_with_fp8_mixed | Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8. |
| fp16_with_fp8_mixed | Create a MixedPrecisionConfig for mixed precision training using FP16 with FP8. |
| bf16_with_mxfp8_mixed | Create a MixedPrecisionConfig for mixed precision training using BF16 with MXFP8. |
| fp16_with_mxfp8_mixed | Create a MixedPrecisionConfig for mixed precision training using FP16 with MXFP8. |
| bf16_with_fp8_current_scaling_mixed | Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 per-tensor current scaling. |
| nemotron_h_bf16_with_fp8_current_scaling_mixed | Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 per-tensor current scaling (Nemotron-H variant). |
| nanov2_bf16_with_fp8_current_scaling_mixed | Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 per-tensor current scaling (NanoV2 variant). |
| fp16_with_fp8_current_scaling_mixed | Create a MixedPrecisionConfig for mixed precision training using FP16 with FP8 per-tensor current scaling. |
| bf16_with_fp8_subchannel_scaling_mixed | Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 NV subchannel scaling (128x128 blockwise quantization for weights, 1x128 for activations). |
| fp16_with_fp8_subchannel_scaling_mixed | Create a MixedPrecisionConfig for mixed precision training using FP16 with FP8 NV subchannel scaling (128x128 blockwise quantization for weights, 1x128 for activations). |
| get_mixed_precision_config | Return a MixedPrecisionConfig for name. |
Data#
API#
- class bridge.training.mixed_precision.MixedPrecisionConfig#
Mixed precision configuration for models.
Handles conversion of model parameters and inputs/outputs between different precisions, and manages mixed precision training settings.
- fp32: bool = False#
- fp16: bool = False#
- bf16: bool = False#
- params_dtype: Optional[torch.dtype] = None#
- pipeline_dtype: Optional[torch.dtype] = None#
- autocast_dtype: Optional[torch.dtype] = None#
- autocast_enabled: bool = False#
- grad_reduce_in_fp32: bool = True#
- fp8: Optional[str] = None#
- fp8_recipe: str = 'delayed'#
- first_last_layers_bf16: bool = False#
- fp8_margin: int = 0#
- fp8_amax_history_len: int = 1#
- fp8_amax_compute_algo: str = 'most_recent'#
- fp8_wgrad: bool = True#
- fp8_dot_product_attention: bool = False#
- fp8_multi_head_attention: bool = False#
- fp8_param: Optional[bool] = None#
- fp8_param_gather: bool = False#
- loss_scale: Optional[float] = None#
- initial_loss_scale: Optional[float] = 4294967296#
- min_loss_scale: float = 1.0#
- loss_scale_window: float = 1000#
- hysteresis: int = 2#
- num_layers_at_start_in_bf16: int = 0#
- num_layers_at_end_in_bf16: int = 0#
- __setattr__(name: str, value) → None#
- __post_init__()#
- setup(model_config: megatron.bridge.models.GPTModelProvider | megatron.bridge.models.T5ModelProvider, optimizer_config: Optional[megatron.core.optimizer.OptimizerConfig] = None, ddp_config: Optional[megatron.core.distributed.DistributedDataParallelConfig] = None)#
Apply mixed precision configs to model, optimizer, and DDP configs.
- Parameters:
model_config – Model configuration to update with dtype settings
optimizer_config – Optional optimizer configuration to update
ddp_config – Optional DDP configuration to update
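A minimal sketch of building a config by hand and applying it with setup(). The import path follows the megatron.bridge package layout referenced in the signatures above, and model_config, optimizer_config, and ddp_config are assumed to already exist in the training script (a GPTModelProvider or T5ModelProvider, an OptimizerConfig, and a DistributedDataParallelConfig); they are stand-ins, not values defined by this module.

```python
import torch

from megatron.bridge.training.mixed_precision import MixedPrecisionConfig

# A typical BF16 setup: BF16 compute and parameters, gradients reduced in FP32.
mp_config = MixedPrecisionConfig(
    bf16=True,
    params_dtype=torch.bfloat16,
    pipeline_dtype=torch.bfloat16,
    grad_reduce_in_fp32=True,
)

# `model_config`, `optimizer_config`, and `ddp_config` are assumed to be built
# elsewhere in the training script (see the setup() parameters above).
mp_config.setup(
    model_config=model_config,
    optimizer_config=optimizer_config,
    ddp_config=ddp_config,
)
```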
- bridge.training.mixed_precision.update_config_with_precision_overrides(mixed_precision_config: bridge.training.mixed_precision.MixedPrecisionConfig, config)#
Update a config object with precision settings from mixed_precision_config.
- Parameters:
mixed_precision_config – Source of precision settings
config – Config object to update
- Returns:
Updated config object
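A usage sketch, assuming a model_config object (e.g. a GPTModelProvider) that exposes matching precision attributes such as bf16 and params_dtype; model_config itself is a stand-in built elsewhere.

```python
import torch

from megatron.bridge.training.mixed_precision import (
    MixedPrecisionConfig,
    update_config_with_precision_overrides,
)

mp_config = MixedPrecisionConfig(bf16=True, params_dtype=torch.bfloat16)

# `model_config` stands in for any config object whose precision fields should be
# overridden from mp_config.
model_config = update_config_with_precision_overrides(mp_config, model_config)
```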
- bridge.training.mixed_precision.MIXED_PRECISION_RECIPES: dict[str, Callable[[], bridge.training.mixed_precision.MixedPrecisionConfig]]#
None
- bridge.training.mixed_precision.register(func: Callable[[], bridge.training.mixed_precision.MixedPrecisionConfig])#
Decorator that registers a mixed-precision recipe factory by its function name.
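A sketch of registering a custom recipe. The recipe name and field values below are illustrative; per the description above, the factory is registered under its own function name, so it should afterwards be reachable through MIXED_PRECISION_RECIPES or get_mixed_precision_config.

```python
import torch

from megatron.bridge.training.mixed_precision import MixedPrecisionConfig, register

@register
def bf16_no_fp32_grad_reduce() -> MixedPrecisionConfig:
    # Hypothetical recipe: BF16 compute with gradients also reduced in BF16.
    return MixedPrecisionConfig(
        bf16=True,
        params_dtype=torch.bfloat16,
        pipeline_dtype=torch.bfloat16,
        grad_reduce_in_fp32=False,
    )

# The recipe should now be retrievable by name:
# get_mixed_precision_config("bf16_no_fp32_grad_reduce")
```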
- bridge.training.mixed_precision.bf16_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#
Create a MixedPrecisionConfig for mixed precision training using BF16.
- Returns:
Configuration for BF16 mixed precision training
- Return type:
MixedPrecisionConfig
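A minimal usage sketch; the exact defaults baked into the recipe are best checked by inspecting the returned object.

```python
from megatron.bridge.training.mixed_precision import bf16_mixed

mp_config = bf16_mixed()
print(mp_config.bf16, mp_config.params_dtype, mp_config.grad_reduce_in_fp32)
# The returned MixedPrecisionConfig can then be applied via mp_config.setup(...)
# or update_config_with_precision_overrides(mp_config, some_config).
```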
- bridge.training.mixed_precision.fp16_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#
Create a MixedPrecisionConfig for mixed precision training using FP16.
- Returns:
Configuration for FP16 mixed precision training
- Return type:
MixedPrecisionConfig
- bridge.training.mixed_precision.bf16_with_fp8_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#
Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8.
Note: FP8 recipes are experimental and have not been tested for training convergence.
- Returns:
Configuration for BF16 with FP8 mixed precision training
- Return type:
MixedPrecisionConfig
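A usage sketch; the FP8 values assigned below are illustrative per-run tweaks of the MixedPrecisionConfig fields documented above, not the defaults this recipe sets.

```python
from megatron.bridge.training.mixed_precision import bf16_with_fp8_mixed

mp_config = bf16_with_fp8_mixed()
print(mp_config.fp8, mp_config.fp8_recipe)  # inspect the FP8 format/recipe the factory picked

# Optional per-run adjustments (illustrative values):
mp_config.fp8_amax_history_len = 1024
mp_config.fp8_amax_compute_algo = "max"
```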
- bridge.training.mixed_precision.fp16_with_fp8_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#
Create a MixedPrecisionConfig for mixed precision training using FP16 with FP8.
Note: FP8 recipes are experimental and have not been tested for training convergence.
- Returns:
Configuration for FP16 with FP8 mixed precision training
- Return type:
MixedPrecisionConfig
- bridge.training.mixed_precision.bf16_with_mxfp8_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#
Create a MixedPrecisionConfig for mixed precision training using BF16 with MXFP8.
- Returns:
Configuration for BF16 with MXFP8 mixed precision training
- Return type:
MixedPrecisionConfig
- bridge.training.mixed_precision.fp16_with_mxfp8_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#
Create a MixedPrecisionConfig for mixed precision training using FP16 with MXFP8.
- Returns:
Configuration for FP16 with MXFP8 mixed precision training
- Return type:
MixedPrecisionConfig
- bridge.training.mixed_precision.bf16_with_fp8_current_scaling_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#
Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 per-tensor current scaling.
Note: The baseline current scaling recipe uses BF16 in the first and last Transformer layers. The user can choose to disable the BF16 layers or apply BF16 to more Transformer layers, as sketched below.
- Returns:
Configuration for BF16 with FP8 per-tensor current scaling mixed precision training
- Return type:
MixedPrecisionConfig
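A sketch of adjusting the BF16 boundary layers mentioned in the note, assuming the boundary-layer fields documented on MixedPrecisionConfig (first_last_layers_bf16, num_layers_at_start_in_bf16, num_layers_at_end_in_bf16) govern this recipe; the numbers are illustrative.

```python
from megatron.bridge.training.mixed_precision import bf16_with_fp8_current_scaling_mixed

mp_config = bf16_with_fp8_current_scaling_mixed()

# Keep more Transformer layers in BF16 at each end of the network (illustrative values):
mp_config.num_layers_at_start_in_bf16 = 2
mp_config.num_layers_at_end_in_bf16 = 2

# Or disable the BF16 boundary layers entirely:
# mp_config.first_last_layers_bf16 = False
```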
- bridge.training.mixed_precision.nemotron_h_bf16_with_fp8_current_scaling_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#
Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 per-tensor current scaling (Nemotron-H variant).
Note: The baseline current scaling recipe uses BF16 in the first and last Transformer layers. The user can choose to disable the BF16 layers or apply BF16 to more Transformer layers.
- Returns:
Configuration for BF16 with FP8 per-tensor current scaling mixed precision training
- Return type:
MixedPrecisionConfig
- bridge.training.mixed_precision.nanov2_bf16_with_fp8_current_scaling_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#
Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 per-tensor current scaling (NanoV2 variant).
Note: The baseline current scaling recipe uses BF16 in the first and last Transformer layers. The user can choose to disable the BF16 layers or apply BF16 to more Transformer layers.
- Returns:
Configuration for BF16 with FP8 per-tensor current scaling mixed precision training
- Return type:
MixedPrecisionConfig
- bridge.training.mixed_precision.fp16_with_fp8_current_scaling_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#
Create a MixedPrecisionConfig for mixed precision training using FP16 with FP8 per-tensor current scaling.
Note: The baseline current scaling recipe uses FP16 in the first and last Transformer layers. The user can choose to disable the FP16 layers or apply FP16 to more Transformer layers.
- Returns:
Configuration for FP16 with FP8 per-tensor current scaling mixed precision training
- Return type:
MixedPrecisionConfig
- bridge.training.mixed_precision.bf16_with_fp8_subchannel_scaling_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#
Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 NV subchannel scaling. This recipe uses 128x128 blockwise quantization for weights and 1x128 blockwise quantization for activations.
- Returns:
Configuration for BF16 with FP8 subchannel scaling mixed precision training
- Return type:
MixedPrecisionConfig
- bridge.training.mixed_precision.fp16_with_fp8_subchannel_scaling_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#
Create a MixedPrecisionConfig for mixed precision training using FP16 with FP8 NV subchannel scaling. This recipe uses 128x128 blockwise quantization for weights and 1x128 blockwise quantization for activations.
- Returns:
Configuration for FP16 with FP8 subchannel scaling mixed precision training
- Return type:
MixedPrecisionConfig
- bridge.training.mixed_precision.get_mixed_precision_config(name: str) → bridge.training.mixed_precision.MixedPrecisionConfig#
Return a MixedPrecisionConfig for name.
- Parameters:
name – Key of the recipe in MIXED_PRECISION_RECIPES.
- Raises:
ValueError – If name is not a known recipe.
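A usage sketch; recipe keys are assumed to match the factory function names listed above, since register() stores each factory under its function name.

```python
from megatron.bridge.training.mixed_precision import (
    MIXED_PRECISION_RECIPES,
    get_mixed_precision_config,
)

print(sorted(MIXED_PRECISION_RECIPES))          # list the available recipe names
mp_config = get_mixed_precision_config("bf16_mixed")

try:
    get_mixed_precision_config("not_a_recipe")  # unknown names raise ValueError
except ValueError as err:
    print(err)
```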