core.post_training.modelopt.layers#
Module Contents#
Classes#
- `Norm`: A conditional wrapper to initialize an instance of Transformer-Engine's LayerNorm or RMSNorm based on input.
- `Linear`: Local Linear impl as a replacement of ParallelLinear.
- `RealQuantTransformerLayer`: Real quantization transformer layer base class.
- `FP8WeightTransformerLayer`: FP8 weight transformer layer.
- `BlockwiseFP8WeightTransformerLayer`: Blockwise FP8 weight transformer layer.
Data#
API#
- core.post_training.modelopt.layers.logger#
‘getLogger(…)’
- core.post_training.modelopt.layers.FP8_PER_TENSOR_REAL_QUANT_CFG#
None
- core.post_training.modelopt.layers.FP8_2D_BLOCKWISE_REAL_QUANT_CFG#
None
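The two config values are not rendered above, but their names indicate the scaling granularity: per-tensor real quantization computes one FP8 scale for an entire weight, while 2D blockwise quantization computes one scale per tile of the weight. The sketch below is a hypothetical illustration of that difference in plain Python; it is not the actual Model Optimizer config contents, and the tile size is an assumption for demonstration.

```python
# Hypothetical sketch of per-tensor vs. 2D blockwise FP8 scaling.
# FP8 E4M3 has a maximum representable magnitude of 448.
FP8_E4M3_MAX = 448.0

def per_tensor_scale(weight):
    """One scale for the entire 2D weight: amax(weight) / fp8_max."""
    amax = max(abs(v) for row in weight for v in row)
    return amax / FP8_E4M3_MAX

def blockwise_scales(weight, block=2):
    """One scale per (block x block) tile of the weight."""
    rows, cols = len(weight), len(weight[0])
    scales = []
    for r in range(0, rows, block):
        scale_row = []
        for c in range(0, cols, block):
            tile = [abs(weight[i][j])
                    for i in range(r, min(r + block, rows))
                    for j in range(c, min(c + block, cols))]
            scale_row.append(max(tile) / FP8_E4M3_MAX)
        scales.append(scale_row)
    return scales

w = [[0.5, -1.0, 8.0, 2.0],
     [0.25, 0.75, 4.0, 1.0]]
print(per_tensor_scale(w))           # one scalar: the outlier 8.0 sets it
print(blockwise_scales(w, block=2))  # a 1x2 grid; the outlier only affects its own tile
```

Blockwise scaling isolates outliers to their own tile, so the rest of the weight keeps a finer quantization step.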
- class core.post_training.modelopt.layers.Norm#
A conditional wrapper to initialize an instance of Transformer-Engine's LayerNorm or RMSNorm based on input. If there is an additional _extra_state, insert _state_dict_hook and _load_state_dict_pre_hook to handle the state_dict mismatch issue.
- __new__(
- config: megatron.core.transformer.transformer_config.TransformerConfig,
- hidden_size: int,
- eps: float = 1e-05,
)#
- class core.post_training.modelopt.layers.Linear(
- input_size: int,
- output_size: int,
- *,
- config: megatron.core.model_parallel_config.ModelParallelConfig,
- init_method: Callable,
- bias: bool = True,
- gather_output: bool = False,
- stride: int = 1,
- keep_master_weight_for_test: bool = False,
- skip_bias_add: bool = False,
- skip_weight_param_allocation: bool = False,
- embedding_activation_buffer: Optional[List[torch.Tensor]] = None,
- grad_output_buffer: Optional[List[torch.Tensor]] = None,
- is_expert: bool = False,
- tp_comm_buffer_name: str = None,
- disable_grad_reduce: bool = False,
- tp_group: Optional[torch.distributed.ProcessGroup] = None,
)#
Bases: torch.nn.Linear
Local Linear impl as a replacement of ParallelLinear.
Initialization
- sharded_state_dict(prefix='', sharded_offsets=(), metadata=None)#
Sharding along axis 0, bias sharded
- forward(x)#
Forward.
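Several parameters in the signature above follow ParallelLinear conventions. One worth illustrating is `skip_bias_add`: when set, the layer returns the bias unapplied alongside the output so a downstream kernel can fuse the bias addition with another op. The plain-Python matmul below is a hedged sketch of that convention, not the real torch implementation.

```python
# Sketch of the skip_bias_add convention (weight is [out_features, in_features]).
def linear_forward(x, weight, bias, skip_bias_add=False):
    # output[i][j] = sum_k x[i][k] * weight[j][k]
    out = [[sum(xk * wk for xk, wk in zip(row, w_row)) for w_row in weight]
           for row in x]
    if skip_bias_add:
        return out, bias  # bias NOT added; caller fuses it later
    out = [[v + b for v, b in zip(row, bias)] for row in out]
    return out, None

x = [[1.0, 2.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 identity
bias = [10.0, 20.0]
print(linear_forward(x, weight, bias))                      # bias applied
print(linear_forward(x, weight, bias, skip_bias_add=True))  # bias deferred
```

Returning `(output, bias)` in both branches keeps the call site uniform whether or not the fusion path is taken.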
- class core.post_training.modelopt.layers.RealQuantTransformerLayer(*args, **kwargs)#
Bases: megatron.core.transformer.transformer_layer.TransformerLayer
Real quantization transformer layer base class.
This base class initializes the default TransformerLayer and immediately performs weight-only real quantization via Model Optimizer. All linear weights (Linear, ColumnParallelLinear, RowParallelLinear) picked up will be replaced with a low-bit data type (default torch.uint8). If a sub-byte real_quant_cfg is used, the weight shape will further be halved.
This module cannot be trained (all parameters are frozen).
Initialization
- verbose: bool#
False
- real_quant_cfg: str#
‘None’
- _collect_original_tensor_info()#
- _report_quantize_tensor_info()#
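The shape halving mentioned above follows from storage: a sub-byte format such as int4 packs two quantized values into each uint8 byte, so the stored tensor has half as many elements along the packed dimension. The sketch below demonstrates that packing in plain Python; the layout (low nibble first) is illustrative, not Model Optimizer's actual format.

```python
def pack_int4(values):
    """Pack a flat list of 4-bit unsigned codes (0..15) into bytes."""
    assert len(values) % 2 == 0 and all(0 <= v <= 15 for v in values)
    return [values[i] | (values[i + 1] << 4) for i in range(0, len(values), 2)]

def unpack_int4(packed):
    out = []
    for byte in packed:
        out.append(byte & 0x0F)         # low nibble
        out.append((byte >> 4) & 0x0F)  # high nibble
    return out

q = [1, 2, 3, 4, 15, 0, 7, 8]  # 8 int4 codes
packed = pack_int4(q)          # 4 uint8 bytes: the stored shape is halved
print(len(q), "->", len(packed))
assert unpack_int4(packed) == q  # the packing itself is lossless
```

This is why `_collect_original_tensor_info` / `_report_quantize_tensor_info` are useful: the quantized module's tensor shapes no longer match the original module's.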
- class core.post_training.modelopt.layers.FP8WeightTransformerLayer(*args, **kwargs)#
Bases: core.post_training.modelopt.layers.RealQuantTransformerLayer
FP8 weight transformer layer.
Initialization
- real_quant_cfg: str#
‘fp8_real_quant’
- class core.post_training.modelopt.layers.BlockwiseFP8WeightTransformerLayer(*args, **kwargs)#
Bases: core.post_training.modelopt.layers.RealQuantTransformerLayer
Blockwise FP8 weight transformer layer.
Initialization
- real_quant_cfg: str#
‘fp8_blockwise_real_quant’