core.post_training.modelopt.layers#

Module Contents#

Classes#

Norm

A conditional wrapper that initializes an instance of Transformer Engine’s LayerNorm or RMSNorm, depending on the input configuration. If an additional _extra_state is present, a _state_dict_hook and a _load_state_dict_pre_hook are inserted to handle the resulting state_dict mismatch.

Linear

Local Linear implementation used as a replacement for ParallelLinear.

RealQuantTransformerLayer

Real quantization transformer layer base class.

FP8WeightTransformerLayer

FP8 weight transformer layer.

BlockwiseFP8WeightTransformerLayer

Blockwise FP8 weight transformer layer.

Data#

API#

core.post_training.modelopt.layers.logger#

‘getLogger(…)’

core.post_training.modelopt.layers.FP8_PER_TENSOR_REAL_QUANT_CFG#

None

core.post_training.modelopt.layers.FP8_2D_BLOCKWISE_REAL_QUANT_CFG#

None
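
Both constants render as None on this page. A hedged usage sketch, assuming they hold ModelOpt real-quantization config dicts at runtime and that they can be fed to Model Optimizer’s standard mtq.quantize entry point (whether this module calls it internally is an implementation detail):

import torch
import modelopt.torch.quantization as mtq

from megatron.core.post_training.modelopt import layers

# Any module containing torch.nn.Linear weights will do for the sketch.
model = torch.nn.Sequential(torch.nn.Linear(1024, 4096), torch.nn.Linear(4096, 1024))

if layers.FP8_PER_TENSOR_REAL_QUANT_CFG is not None:
    # Weight-only real quantization: linear weights are stored in a low-bit format.
    model = mtq.quantize(model, layers.FP8_PER_TENSOR_REAL_QUANT_CFG)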

class core.post_training.modelopt.layers.Norm#

A conditional wrapper that initializes an instance of Transformer Engine’s LayerNorm or RMSNorm, depending on the input configuration. If an additional _extra_state is present, a _state_dict_hook and a _load_state_dict_pre_hook are inserted to handle the resulting state_dict mismatch.

__new__(
config: megatron.core.transformer.transformer_config.TransformerConfig,
hidden_size: int,
eps: float = 1e-05,
)#
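
A minimal usage sketch, assuming the standard megatron.core package layout and that TransformerConfig’s normalization field (“LayerNorm” or “RMSNorm”) is what selects the Transformer Engine class; the sizes below are illustrative:

from megatron.core.post_training.modelopt.layers import Norm
from megatron.core.transformer.transformer_config import TransformerConfig

# normalization="RMSNorm" yields a Transformer Engine RMSNorm instance;
# normalization="LayerNorm" would yield a Transformer Engine LayerNorm instead.
config = TransformerConfig(
    num_layers=1,
    hidden_size=1024,
    num_attention_heads=8,
    normalization="RMSNorm",
)
norm = Norm(config=config, hidden_size=1024, eps=1e-5)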
class core.post_training.modelopt.layers.Linear(
input_size: int,
output_size: int,
*,
config: megatron.core.model_parallel_config.ModelParallelConfig,
init_method: Callable,
bias: bool = True,
gather_output: bool = False,
stride: int = 1,
keep_master_weight_for_test: bool = False,
skip_bias_add: bool = False,
skip_weight_param_allocation: bool = False,
embedding_activation_buffer: Optional[List[torch.Tensor]] = None,
grad_output_buffer: Optional[List[torch.Tensor]] = None,
is_expert: bool = False,
tp_comm_buffer_name: str = None,
disable_grad_reduce: bool = False,
tp_group: Optional[torch.distributed.ProcessGroup] = None,
)#

Bases: torch.nn.Linear

Local Linear implementation used as a replacement for ParallelLinear.

Initialization

sharded_state_dict(prefix='', sharded_offsets=(), metadata=None)#

Shards the weight along axis 0; the bias is sharded as well.

forward(x)#

Forward.
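
A brief usage sketch with illustrative sizes. init_method is any callable that initializes a weight tensor in place; whether forward returns a bare tensor or a ParallelLinear-style (output, bias) tuple is left to the implementation:

import torch
from megatron.core.model_parallel_config import ModelParallelConfig
from megatron.core.post_training.modelopt.layers import Linear

linear = Linear(
    input_size=1024,
    output_size=4096,
    config=ModelParallelConfig(),
    init_method=torch.nn.init.xavier_uniform_,
    bias=True,
    skip_bias_add=False,
)
out = linear(torch.randn(2, 16, 1024))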

class core.post_training.modelopt.layers.RealQuantTransformerLayer(*args, **kwargs)#

Bases: megatron.core.transformer.transformer_layer.TransformerLayer

Real quantization transformer layer base class.

This base class initializes the default TransformerLayer and immediately performs weight-only real quantization via Model Optimizer. All linear weights that are picked up (Linear, ColumnParallelLinear, RowParallelLinear) are replaced with a low-bit data type (torch.uint8 by default). If a sub-byte real_quant_cfg is used, the weight shape is additionally halved.

This module cannot be trained (all parameters are frozen).

Initialization

verbose: bool#

False

real_quant_cfg: str#

‘None’

_collect_original_tensor_info()#
_report_quantize_tensor_info()#
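
The two concrete subclasses below differ only in the real_quant_cfg recipe they select; the base class applies the quantization during initialization. A minimal sketch of that pattern (the subclass name is illustrative, and ‘fp8_real_quant’ is the value used by FP8WeightTransformerLayer below):

from megatron.core.post_training.modelopt.layers import RealQuantTransformerLayer

class VerboseFP8WeightTransformerLayer(RealQuantTransformerLayer):
    """Illustrative subclass selecting the FP8 per-tensor recipe."""

    # verbose presumably enables the _report_quantize_tensor_info() summary.
    verbose: bool = True
    real_quant_cfg: str = "fp8_real_quant"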
class core.post_training.modelopt.layers.FP8WeightTransformerLayer(*args, **kwargs)#

Bases: core.post_training.modelopt.layers.RealQuantTransformerLayer

FP8 weight transformer layer.

Initialization

real_quant_cfg: str#

‘fp8_real_quant’

class core.post_training.modelopt.layers.BlockwiseFP8WeightTransformerLayer(*args, **kwargs)#

Bases: core.post_training.modelopt.layers.RealQuantTransformerLayer

Blockwise FP8 weight transformer layer.

Initialization

real_quant_cfg: str#

‘fp8_blockwise_real_quant’