core.post_training.modelopt.layers#

Module Contents#

Classes#

Norm

A conditional wrapper that initializes an instance of Transformer Engine’s LayerNorm or RMSNorm, depending on the input configuration. If an additional _extra_state is present, a _state_dict_hook and a _load_state_dict_pre_hook are inserted to handle the resulting state_dict mismatch.

Linear

Local Linear implementation used as a replacement for ParallelLinear.

RealQuantTransformerLayer

Real quantization transformer layer base class.

FP8WeightTransformerLayer

FP8 weight transformer layer.

BlockwiseFP8WeightTransformerLayer

Blockwise FP8 weight transformer layer.

Data#

API#

core.post_training.modelopt.layers.logger#

‘getLogger(…)’

core.post_training.modelopt.layers.FP8_PER_TENSOR_REAL_QUANT_CFG#

None

core.post_training.modelopt.layers.FP8_2D_BLOCKWISE_REAL_QUANT_CFG#

None
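
Both constants render as None on this page. A hedged usage sketch, assuming they hold ModelOpt real-quantization config dicts at runtime and that they can be fed to Model Optimizer’s standard mtq.quantize entry point (whether this module calls it internally is an implementation detail):

import torch
import modelopt.torch.quantization as mtq

from megatron.core.post_training.modelopt import layers

# Any module containing torch.nn.Linear weights will do for the sketch.
model = torch.nn.Sequential(torch.nn.Linear(1024, 4096), torch.nn.Linear(4096, 1024))

if layers.FP8_PER_TENSOR_REAL_QUANT_CFG is not None:
    # Weight-only real quantization: linear weights are stored in a low-bit format.
    model = mtq.quantize(model, layers.FP8_PER_TENSOR_REAL_QUANT_CFG)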

class core.post_training.modelopt.layers.Norm#

A conditional wrapper that initializes an instance of Transformer Engine’s LayerNorm or RMSNorm, depending on the input configuration. If an additional _extra_state is present, a _state_dict_hook and a _load_state_dict_pre_hook are inserted to handle the resulting state_dict mismatch.

__new__(
config: megatron.core.transformer.transformer_config.TransformerConfig,
hidden_size: int,
eps: float = 1e-05,
)#
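
A minimal usage sketch, assuming the standard megatron.core package layout and that TransformerConfig’s normalization field (“LayerNorm” or “RMSNorm”) is what selects the Transformer Engine class; the sizes below are illustrative:

from megatron.core.post_training.modelopt.layers import Norm
from megatron.core.transformer.transformer_config import TransformerConfig

# normalization="RMSNorm" yields a Transformer Engine RMSNorm instance;
# normalization="LayerNorm" would yield a Transformer Engine LayerNorm instead.
config = TransformerConfig(
    num_layers=1,
    hidden_size=1024,
    num_attention_heads=8,
    normalization="RMSNorm",
)
norm = Norm(config=config, hidden_size=1024, eps=1e-5)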
class core.post_training.modelopt.layers.Linear(
input_size: int,
output_size: int,
*,
config: megatron.core.model_parallel_config.ModelParallelConfig,
init_method: Callable,
bias: bool = True,
gather_output: bool = False,
stride: int = 1,
keep_master_weight_for_test: bool = False,
skip_bias_add: bool = False,
skip_weight_param_allocation: bool = False,
embedding_activation_buffer: Optional[List[torch.Tensor]] = None,
grad_output_buffer: Optional[List[torch.Tensor]] = None,
is_expert: bool = False,
tp_comm_buffer_name: str = None,
disable_grad_reduce: bool = False,
tp_group: Optional[torch.distributed.ProcessGroup] = None,
)#

Bases: torch.nn.Linear

Local Linear implementation used as a replacement for ParallelLinear.

Initialization

sharded_state_dict(prefix='', sharded_offsets=(), metadata=None)#

Shards the weight along axis 0; the bias is sharded as well.

forward(x)#

Forward.
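
A brief usage sketch with illustrative sizes. init_method is any callable that initializes a weight tensor in place; whether forward returns a bare tensor or a ParallelLinear-style (output, bias) tuple is left to the implementation:

import torch
from megatron.core.model_parallel_config import ModelParallelConfig
from megatron.core.post_training.modelopt.layers import Linear

linear = Linear(
    input_size=1024,
    output_size=4096,
    config=ModelParallelConfig(),
    init_method=torch.nn.init.xavier_uniform_,
    bias=True,
    skip_bias_add=False,
)
out = linear(torch.randn(2, 16, 1024))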

class core.post_training.modelopt.layers.RealQuantTransformerLayer(*args, **kwargs)#

Bases: megatron.core.transformer.transformer_layer.TransformerLayer

Real quantization transformer layer base class.

This base class initializes the default TransformerLayer and immediately performs weight-only real quantization via Model Optimizer. All linear weights that are picked up (Linear, ColumnParallelLinear, RowParallelLinear) are replaced with a low-bit data type (torch.uint8 by default). If a sub-byte real_quant_cfg is used, the weight shape is additionally halved.

This module cannot be trained (all parameters are frozen).

Initialization

verbose: bool#

False

real_quant_cfg: str#

‘None’

_collect_original_tensor_info()#
_report_quantize_tensor_info()#
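
The two concrete subclasses below differ only in the real_quant_cfg recipe they select; the base class applies the quantization during initialization. A minimal sketch of that pattern (the subclass name is illustrative, and ‘fp8_real_quant’ is the value used by FP8WeightTransformerLayer below):

from megatron.core.post_training.modelopt.layers import RealQuantTransformerLayer

class VerboseFP8WeightTransformerLayer(RealQuantTransformerLayer):
    """Illustrative subclass selecting the FP8 per-tensor recipe."""

    # verbose presumably enables the _report_quantize_tensor_info() summary.
    verbose: bool = True
    real_quant_cfg: str = "fp8_real_quant"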
class core.post_training.modelopt.layers.FP8WeightTransformerLayer(*args, **kwargs)#

Bases: core.post_training.modelopt.layers.RealQuantTransformerLayer

FP8 weight transformer layer.

Initialization

real_quant_cfg: str#

‘fp8_real_quant’

class core.post_training.modelopt.layers.BlockwiseFP8WeightTransformerLayer(*args, **kwargs)#

Bases: core.post_training.modelopt.layers.RealQuantTransformerLayer

Blockwise FP8 weight transformer layer.

Initialization

real_quant_cfg: str#

‘fp8_blockwise_real_quant’