bridge.training.mixed_precision#

Module Contents#

Classes#

MixedPrecisionConfig

Mixed precision configuration for models.

Functions#

update_config_with_precision_overrides

Update a config object with precision settings from mixed_precision_config.

register

Decorator that registers a mixed-precision recipe factory by its function name.

bf16_mixed

Create a MixedPrecisionConfig for mixed precision training using BF16.

fp16_mixed

Create a MixedPrecisionConfig for mixed precision training using FP16.

bf16_with_fp8_mixed

Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8.

fp16_with_fp8_mixed

Create a MixedPrecisionConfig for mixed precision training using FP16 with FP8.

bf16_with_mxfp8_mixed

Create a MixedPrecisionConfig for mixed precision training using BF16 with MXFP8.

fp16_with_mxfp8_mixed

Create a MixedPrecisionConfig for mixed precision training using FP16 with MXFP8.

bf16_with_fp8_current_scaling_mixed

Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 per-tensor current scaling.

nemotron_h_bf16_with_fp8_current_scaling_mixed

Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 per-tensor current scaling.

nanov2_bf16_with_fp8_current_scaling_mixed

Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 per-tensor current scaling.

fp16_with_fp8_current_scaling_mixed

Create a MixedPrecisionConfig for mixed precision training using FP16 with FP8 per-tensor current scaling.

bf16_with_fp8_subchannel_scaling_mixed

Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 NV Subchannel scaling. This recipe uses 128x128 blockwise quantization for weights and 1x128 blockwise quantization for activations.

fp16_with_fp8_subchannel_scaling_mixed

Create a MixedPrecisionConfig for mixed precision training using FP16 with FP8 NV Subchannel scaling. This recipe uses 128x128 blockwise quantization for weights and 1x128 blockwise quantization for activations.

get_mixed_precision_config

Return a MixedPrecisionConfig for the given name.

Data#

MIXED_PRECISION_RECIPES

Registry of mixed-precision recipe factories, keyed by function name.

API#

class bridge.training.mixed_precision.MixedPrecisionConfig#

Mixed precision configuration for models.

Handles conversion of model parameters and inputs/outputs between different precisions, and manages mixed precision training settings.

fp32: bool#

False

fp16: bool#

False

bf16: bool#

False

params_dtype: Optional[torch.dtype]#

None

pipeline_dtype: Optional[torch.dtype]#

None

autocast_dtype: Optional[torch.dtype]#

None

autocast_enabled: bool#

False

grad_reduce_in_fp32: bool#

True

fp8: Optional[str]#

None

fp8_recipe: str#

'delayed'

first_last_layers_bf16: bool#

False

fp8_margin: int#

0

fp8_amax_history_len: int#

1

fp8_amax_compute_algo: str#

'most_recent'

fp8_wgrad: bool#

True

fp8_dot_product_attention: bool#

False

fp8_multi_head_attention: bool#

False

fp8_param: Optional[bool]#

None

fp8_param_gather: bool#

False

loss_scale: Optional[float]#

None

initial_loss_scale: Optional[float]#

4294967296

min_loss_scale: float#

1.0

loss_scale_window: float#

1000

hysteresis: int#

2

num_layers_at_start_in_bf16: int#

0

num_layers_at_end_in_bf16: int#

0

__setattr__(name: str, value) → None#

__post_init__()#

setup(
model_config: megatron.bridge.models.GPTModelProvider | megatron.bridge.models.T5ModelProvider,
optimizer_config: Optional[megatron.core.optimizer.OptimizerConfig] = None,
ddp_config: Optional[megatron.core.distributed.DistributedDataParallelConfig] = None,
) → None#

Apply mixed precision configs to model, optimizer, and DDP configs.

Parameters:
  • model_config – Model configuration to update with dtype settings

  • optimizer_config – Optional optimizer configuration to update

  • ddp_config – Optional DDP configuration to update
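
A minimal sketch of building a config by hand and applying it, assuming model_config, optimizer_config, and ddp_config are pre-built provider/config objects from your training script:

```python
import torch

from bridge.training.mixed_precision import MixedPrecisionConfig

# Build a BF16 configuration by hand (the recipe factories below do this for you).
mp_config = MixedPrecisionConfig(
    bf16=True,
    params_dtype=torch.bfloat16,
    pipeline_dtype=torch.bfloat16,
    grad_reduce_in_fp32=True,
)

# Propagate the dtype settings onto the model, optimizer, and DDP configs.
mp_config.setup(model_config, optimizer_config=optimizer_config, ddp_config=ddp_config)
```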

bridge.training.mixed_precision.update_config_with_precision_overrides(
mixed_precision_config: bridge.training.mixed_precision.MixedPrecisionConfig,
config,
)#

Update a config object with precision settings from mixed_precision_config.

Parameters:
  • mixed_precision_config – Source of precision settings

  • config – Config object to update

Returns:

Updated config object
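
A usage sketch, assuming optimizer_config is an existing OptimizerConfig instance and that the function copies the precision attributes the two objects share (e.g. bf16, fp16):

```python
from bridge.training.mixed_precision import bf16_mixed, update_config_with_precision_overrides

mp_config = bf16_mixed()

# Overwrite the precision-related attributes that optimizer_config shares with mp_config.
optimizer_config = update_config_with_precision_overrides(mp_config, optimizer_config)
```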

bridge.training.mixed_precision.MIXED_PRECISION_RECIPES: dict[str, Callable[[], bridge.training.mixed_precision.MixedPrecisionConfig]]#

Registry mapping recipe names to their factory functions; populated by the register() decorator.

bridge.training.mixed_precision.register(
func: Callable[[], bridge.training.mixed_precision.MixedPrecisionConfig],
)#

Decorator that registers a mixed-precision recipe factory by its function name.
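
A sketch of registering a custom recipe; the recipe name and field choices here are hypothetical:

```python
import torch

from bridge.training.mixed_precision import (
    MixedPrecisionConfig,
    get_mixed_precision_config,
    register,
)

@register
def bf16_no_fp32_grad_mixed() -> MixedPrecisionConfig:
    # Hypothetical recipe: BF16 training with gradient reduction kept in BF16.
    return MixedPrecisionConfig(
        bf16=True,
        params_dtype=torch.bfloat16,
        pipeline_dtype=torch.bfloat16,
        grad_reduce_in_fp32=False,
    )

# The factory is now retrievable by its function name.
config = get_mixed_precision_config("bf16_no_fp32_grad_mixed")
```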

bridge.training.mixed_precision.bf16_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#

Create a MixedPrecisionConfig for mixed precision training using BF16.

Returns:

Configuration for BF16 mixed precision training

Return type:

MixedPrecisionConfig
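
Usage is the same for every recipe factory in this module: call it with no arguments and pass the result to setup. A sketch, assuming model_config is a pre-built GPTModelProvider:

```python
from bridge.training.mixed_precision import bf16_mixed

config = bf16_mixed()
config.setup(model_config)  # model_config is assumed to be a pre-built provider
```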

bridge.training.mixed_precision.fp16_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#

Create a MixedPrecisionConfig for mixed precision training using FP16.

Returns:

Configuration for FP16 mixed precision training

Return type:

MixedPrecisionConfig

bridge.training.mixed_precision.bf16_with_fp8_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#

Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8.

Note: FP8 recipes are experimental and have not been tested for training convergence.

Returns:

Configuration for BF16 with FP8 mixed precision training

Return type:

MixedPrecisionConfig

bridge.training.mixed_precision.fp16_with_fp8_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#

Create a MixedPrecisionConfig for mixed precision training using FP16 with FP8.

Note: FP8 recipes are experimental and have not been tested for training convergence.

Returns:

Configuration for FP16 with FP8 mixed precision training

Return type:

MixedPrecisionConfig

bridge.training.mixed_precision.bf16_with_mxfp8_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#

Create a MixedPrecisionConfig for mixed precision training using BF16 with MXFP8.

Returns:

Configuration for BF16 with MXFP8 mixed precision training

Return type:

MixedPrecisionConfig

bridge.training.mixed_precision.fp16_with_mxfp8_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#

Create a MixedPrecisionConfig for mixed precision training using FP16 with MXFP8.

Returns:

Configuration for FP16 with MXFP8 mixed precision training

Return type:

MixedPrecisionConfig

bridge.training.mixed_precision.bf16_with_fp8_current_scaling_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#

Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 per-tensor current scaling.

Note: The baseline current scaling recipe uses BF16 in the first and last Transformer layers. The user can choose to disable the BF16 layers or apply BF16 to more Transformer layers.

Returns:

Configuration for BF16 with FP8 per-tensor current scaling mixed precision training

Return type:

MixedPrecisionConfig
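
For instance, the BF16 boundary layers can be tuned after creating the config. A sketch using the fields documented above:

```python
from bridge.training.mixed_precision import bf16_with_fp8_current_scaling_mixed

config = bf16_with_fp8_current_scaling_mixed()

# Widen the BF16 boundary to two Transformer layers at each end...
config.num_layers_at_start_in_bf16 = 2
config.num_layers_at_end_in_bf16 = 2

# ...or disable the BF16 boundary layers entirely.
# config.first_last_layers_bf16 = False
```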

bridge.training.mixed_precision.nemotron_h_bf16_with_fp8_current_scaling_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#

Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 per-tensor current scaling.

Note: The baseline current scaling recipe uses BF16 in the first and last Transformer layers. The user can choose to disable the BF16 layers or apply BF16 to more Transformer layers.

Returns:

Configuration for BF16 with FP8 per-tensor current scaling mixed precision training

Return type:

MixedPrecisionConfig

bridge.training.mixed_precision.nanov2_bf16_with_fp8_current_scaling_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#

Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 per-tensor current scaling.

Note: The baseline current scaling recipe uses BF16 in the first and last Transformer layers. The user can choose to disable the BF16 layers or apply BF16 to more Transformer layers.

Returns:

Configuration for BF16 with FP8 per-tensor current scaling mixed precision training

Return type:

MixedPrecisionConfig

bridge.training.mixed_precision.fp16_with_fp8_current_scaling_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#

Create a MixedPrecisionConfig for mixed precision training using FP16 with FP8 per-tensor current scaling.

Note: The baseline current scaling recipe uses FP16 in the first and last Transformer layers. The user can choose to disable the FP16 layers or apply FP16 to more Transformer layers.

Returns:

Configuration for FP16 with FP8 per-tensor current scaling mixed precision training

Return type:

MixedPrecisionConfig

bridge.training.mixed_precision.bf16_with_fp8_subchannel_scaling_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#

Create a MixedPrecisionConfig for mixed precision training using BF16 with FP8 NV Subchannel scaling. This recipe uses 128x128 blockwise quantization for weights and 1x128 blockwise quantization for activations.

Returns:

Configuration for BF16 with FP8 subchannel scaling mixed precision training

Return type:

MixedPrecisionConfig

bridge.training.mixed_precision.fp16_with_fp8_subchannel_scaling_mixed() → bridge.training.mixed_precision.MixedPrecisionConfig#

Create a MixedPrecisionConfig for mixed precision training using FP16 with FP8 NV Subchannel scaling. This recipe uses 128x128 blockwise quantization for weights and 1x128 blockwise quantization for activations.

Returns:

Configuration for FP16 with FP8 subchannel scaling mixed precision training

Return type:

MixedPrecisionConfig

bridge.training.mixed_precision.get_mixed_precision_config(
name: str,
) → bridge.training.mixed_precision.MixedPrecisionConfig#

Return a MixedPrecisionConfig for the given name.

Parameters:

name – Key of the recipe in MIXED_PRECISION_RECIPES.

Raises:

ValueError – If name is not a known recipe.
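
A lookup sketch; valid names are the keys of MIXED_PRECISION_RECIPES:

```python
from bridge.training.mixed_precision import MIXED_PRECISION_RECIPES, get_mixed_precision_config

config = get_mixed_precision_config("bf16_mixed")

# Unknown names raise ValueError; inspect the registry for the valid options.
print(sorted(MIXED_PRECISION_RECIPES))
```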