bridge.models.gpt_full_te_layer_autocast_spec
Module Contents
Classes

- `AutocastTransformerLayer`: a wrapper of te.pytorch.TransformerLayer; a single transformer layer that takes input of size [s, b, h] and returns an output of the same size.
- `TETransformerLayerAutocast`: a MegatronModule that wraps the AutocastTransformerLayer.
Functions

- `get_gpt_full_te_layer_autocast_spec`: get the ModuleSpec for a full Transformer layer from Transformer Engine.
- `torch_dtype_from_precision`: map a precision type to the corresponding PyTorch parameter dtype.
API
- class bridge.models.gpt_full_te_layer_autocast_spec.AutocastTransformerLayer(
- hidden_size: int,
- ffn_hidden_size: int,
- layernorm_epsilon: float,
- num_attention_heads: int,
- init_method: Callable,
- output_layer_init_method: Callable,
- hidden_dropout: float,
- attention_dropout: float,
- layer_number: Optional[int] = None,
- kv_channels: Optional[int] = None,
- self_attn_mask_type: str = 'causal',
- tp_group: Optional[Any] = None,
- tp_size: int = 1,
- params_dtype: torch.dtype = torch.float32,
- get_rng_state_tracker: Optional[Callable] = None,
- fuse_wgrad_accumulation: bool = False,
- seq_length: Optional[int] = None,
- micro_batch_size: Optional[int] = None,
- sequence_parallel: bool = False,
- apply_residual_connection_post_layernorm: bool = False,
- output_layernorm: bool = False,
- layer_type: str = 'encoder',
- drop_path_rate: float = 0,
- use_emha: bool = False,
- ub_tp_comm_overlap: bool = False,
- ub_bulk_wgrad: bool = True,
- ub_bulk_dgrad: bool = True,
- autocast_dtype: Any = 16,
- zero_centered_gamma: bool = False,
- device: str = 'cuda',
- **kwargs,
)
Bases: transformer_engine.pytorch.TransformerLayer
Wrapper of te.pytorch.TransformerLayer: a single transformer layer that takes input of size [s, b, h] and returns an output of the same size.
Initialization
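To make the signature concrete, here is a minimal construction sketch. It assumes Transformer Engine and a CUDA device are available and that the module is importable under the path in the title; the init helper and all hyperparameter values are illustrative, not recommended defaults.

```python
import torch

from bridge.models.gpt_full_te_layer_autocast_spec import AutocastTransformerLayer

# Megatron-style init callables receive the weight tensor and initialize it in place.
def init_method(weight: torch.Tensor) -> None:
    torch.nn.init.normal_(weight, mean=0.0, std=0.02)

layer = AutocastTransformerLayer(
    hidden_size=1024,
    ffn_hidden_size=4096,  # commonly 4 * hidden_size
    layernorm_epsilon=1e-5,
    num_attention_heads=16,
    init_method=init_method,
    output_layer_init_method=init_method,
    hidden_dropout=0.1,
    attention_dropout=0.1,
    autocast_dtype=16,  # the default; see torch_dtype_from_precision below
)
```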
- forward(
- hidden_states: torch.Tensor,
- attention_mask: torch.Tensor = None,
- encoder_output: Optional[torch.Tensor] = None,
- enc_dec_attn_mask: Optional[torch.Tensor] = None,
- inference_params: Optional[Any] = None,
- is_first_microbatch: Optional[bool] = None,
- checkpoint_core_attention: Optional[bool] = False,
)
Perform a forward pass through the transformer layer.
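Continuing the sketch above, the input follows the [s, b, h] layout from the class description; the shapes are illustrative:

```python
# [sequence, batch, hidden]; the last dim must match hidden_size from construction.
hidden_states = torch.randn(128, 2, 1024, device="cuda")
output = layer(hidden_states)  # output has the same [s, b, h] size
```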
- class bridge.models.gpt_full_te_layer_autocast_spec.TETransformerLayerAutocast(
- config,
- layer_number=1,
- hidden_dropout=None,
- **kwargs,
)
Bases: megatron.core.transformer.module.MegatronModule, megatron.core.transformer.transformer_layer.BaseTransformerLayer
A MegatronModule that wraps the AutocastTransformerLayer.
Initialization
- forward(
- hidden_states,
- is_first_microbatch=None,
- attention_mask=None,
- context=None,
- context_mask=None,
- inference_params=None,
- **kwargs,
)
Forward function of TETransformerLayerAutocast. Called by MCore’s TransformerBlock.forward.
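Although this layer is normally constructed and called by TransformerBlock, a hedged sketch of driving it directly is below. It assumes Megatron's model-parallel state is already initialized; the TransformerConfig values are illustrative, and the (hidden_states, context) return pair follows the usual MCore layer convention rather than anything stated above:

```python
import torch
from megatron.core.transformer.transformer_config import TransformerConfig

from bridge.models.gpt_full_te_layer_autocast_spec import TETransformerLayerAutocast

config = TransformerConfig(  # illustrative values
    num_layers=2,
    hidden_size=1024,
    num_attention_heads=16,
)
layer = TETransformerLayerAutocast(config, layer_number=1)

hidden_states = torch.randn(128, 2, 1024, device="cuda")
# Assumed MCore convention: transformer layers return (hidden_states, context).
output, context = layer(hidden_states)
```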
- _get_layer_offset()
- sharded_state_dict(
- prefix: str = '',
- sharded_offsets: tuple = (),
- metadata=None,
)
Get the sharded state dict for the transformer layer.
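A short sketch of pulling a shardable state dict for distributed checkpointing; the prefix string is only an example of where the layer might sit inside a larger model:

```python
# Keys in the result carry the given prefix, e.g. "decoder.layers.0.<param>".
sharded_sd = layer.sharded_state_dict(prefix="decoder.layers.0.")
```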
- __call__(*args, **kwargs)
- bridge.models.gpt_full_te_layer_autocast_spec.get_gpt_full_te_layer_autocast_spec(
- transformer_config,
)
Get the ModuleSpec for a full Transformer layer from Transformer Engine.
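Typical usage is to build the spec from an existing TransformerConfig and pass it to an MCore model. A sketch reusing the config from the earlier example; the GPTModel arguments are illustrative:

```python
from megatron.core.models.gpt.gpt_model import GPTModel

from bridge.models.gpt_full_te_layer_autocast_spec import get_gpt_full_te_layer_autocast_spec

layer_spec = get_gpt_full_te_layer_autocast_spec(config)
model = GPTModel(
    config=config,
    transformer_layer_spec=layer_spec,
    vocab_size=50304,          # illustrative
    max_sequence_length=2048,  # illustrative
)
```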
- bridge.models.gpt_full_te_layer_autocast_spec.torch_dtype_from_precision(
- precision: Union[int, str],
)
Map a precision type to the corresponding PyTorch parameter dtype.
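A few illustrative lookups; the exact set of accepted precision values is an assumption based on common Megatron/NeMo precision flags:

```python
import torch

from bridge.models.gpt_full_te_layer_autocast_spec import torch_dtype_from_precision

assert torch_dtype_from_precision(32) is torch.float32
assert torch_dtype_from_precision(16) is torch.float16
assert torch_dtype_from_precision("bf16") is torch.bfloat16
```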