bridge.models.gpt_full_te_layer_autocast_spec#

Module Contents#

Classes#

AutocastTransformerLayer

Wrapper of te.pytorch.TransformerLayer: a single transformer layer that takes input of size [s, b, h] and returns an output of the same size.

TETransformerLayerAutocast

A MegatronModule that wraps the AutocastTransformerLayer.

Functions#

get_gpt_full_te_layer_autocast_spec

Get the ModuleSpec for the full Transformer layer from Transformer Engine.

torch_dtype_from_precision

Map a precision type to the corresponding PyTorch parameter dtype.

API#

class bridge.models.gpt_full_te_layer_autocast_spec.AutocastTransformerLayer(
hidden_size: int,
ffn_hidden_size: int,
layernorm_epsilon: float,
num_attention_heads: int,
init_method: Callable,
output_layer_init_method: Callable,
hidden_dropout: float,
attention_dropout: float,
layer_number: Optional[int] = None,
kv_channels: Optional[int] = None,
self_attn_mask_type: str = 'causal',
tp_group: Optional[Any] = None,
tp_size: int = 1,
params_dtype: torch.dtype = torch.float32,
get_rng_state_tracker: Optional[Callable] = None,
fuse_wgrad_accumulation: bool = False,
seq_length: Optional[int] = None,
micro_batch_size: Optional[int] = None,
sequence_parallel: bool = False,
apply_residual_connection_post_layernorm: bool = False,
output_layernorm: bool = False,
layer_type: str = 'encoder',
drop_path_rate: float = 0,
use_emha: bool = False,
ub_tp_comm_overlap: bool = False,
ub_bulk_wgrad: bool = True,
ub_bulk_dgrad: bool = True,
autocast_dtype: Any = 16,
zero_centered_gamma: bool = False,
device: str = 'cuda',
**kwargs,
)#

Bases: transformer_engine.pytorch.TransformerLayer

Wrapper of te.pytorch.TransformerLayer: a single transformer layer that takes input of size [s, b, h] and returns an output of the same size.

Initialization
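
A minimal construction sketch, assuming Transformer Engine is installed and a CUDA device is available (the default `device='cuda'`). All sizes are hypothetical placeholders, and the `'bf16'` precision token is an assumption based on `torch_dtype_from_precision` accepting ints and strings:

```python
import torch

from bridge.models.gpt_full_te_layer_autocast_spec import AutocastTransformerLayer

# Hypothetical sizes for illustration; real values come from your model config.
layer = AutocastTransformerLayer(
    hidden_size=1024,
    ffn_hidden_size=4096,
    layernorm_epsilon=1e-5,
    num_attention_heads=16,
    init_method=torch.nn.init.xavier_uniform_,
    output_layer_init_method=torch.nn.init.xavier_uniform_,
    hidden_dropout=0.1,
    attention_dropout=0.1,
    autocast_dtype="bf16",  # assumed to be mapped via torch_dtype_from_precision
)
```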

forward(
hidden_states: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
encoder_output: Optional[torch.Tensor] = None,
enc_dec_attn_mask: Optional[torch.Tensor] = None,
inference_params: Optional[Any] = None,
is_first_microbatch: Optional[bool] = None,
checkpoint_core_attention: Optional[bool] = False,
) → torch.Tensor#

Perform a forward pass through the transformer layer.
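
A sketch of a direct forward call, following the documented [s, b, h] input layout; passing `attention_mask=None` relies on the default `'causal'` `self_attn_mask_type`:

```python
# Inputs follow the documented [s, b, h] layout: sequence, batch, hidden.
seq_len, batch_size, hidden = 128, 2, 1024
x = torch.randn(seq_len, batch_size, hidden, device="cuda")
with torch.no_grad():
    y = layer(x, attention_mask=None)  # default causal masking, no explicit mask needed
assert y.shape == x.shape  # output keeps the [s, b, h] shape
```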

class bridge.models.gpt_full_te_layer_autocast_spec.TETransformerLayerAutocast(
config,
layer_number=1,
hidden_dropout=None,
**kwargs,
)#

Bases: megatron.core.transformer.module.MegatronModule, megatron.core.transformer.transformer_layer.BaseTransformerLayer

A MegatronModule that wraps the AutocastTransformerLayer.

Initialization
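
A hypothetical construction sketch. A real run initializes Megatron's model-parallel state first and sets many more config fields; the field values below are placeholders:

```python
from megatron.core.transformer import TransformerConfig

from bridge.models.gpt_full_te_layer_autocast_spec import TETransformerLayerAutocast

# Minimal illustrative config; real training configs carry many more fields.
config = TransformerConfig(
    num_layers=2,
    hidden_size=1024,
    num_attention_heads=16,
)
layer = TETransformerLayerAutocast(config, layer_number=1)
```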

forward(
hidden_states,
is_first_microbatch=None,
attention_mask=None,
context=None,
context_mask=None,
inference_params=None,
**kwargs,
)#

Forward function of TETransformerLayerAutocast. Called by MCore’s TransformerBlock.forward.
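
This forward is normally invoked by MCore's TransformerBlock rather than called directly. A direct call might look like the sketch below; the return convention is assumed to follow MCore's BaseTransformerLayer protocol:

```python
# Sketch of a direct invocation; `x` is an [s, b, h] tensor as above.
output = layer(hidden_states=x, attention_mask=None)
```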

_get_layer_offset()#

sharded_state_dict(
prefix: str = '',
sharded_offsets: tuple = (),
metadata=None,
)#

Get the sharded state dict for the transformer layer.
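
A sketch of how Megatron's distributed checkpointing consumes this method; the prefix string is a hypothetical example matching a layer's position in a model:

```python
# Produce the sharded state dict with a prefix reflecting the layer's
# position, as distributed checkpointing would.
sharded_sd = layer.sharded_state_dict(prefix="decoder.layers.0.")
```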

__call__(*args, **kwargs)#

bridge.models.gpt_full_te_layer_autocast_spec.get_gpt_full_te_layer_autocast_spec(
transformer_config,
) → megatron.core.transformer.spec_utils.ModuleSpec#

Get the ModuleSpec for the full Transformer layer from Transformer Engine.
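
A sketch of plugging the returned spec into MCore's GPTModel, assuming `config` is a populated TransformerConfig; the vocab and sequence-length values are hypothetical:

```python
from megatron.core.models.gpt import GPTModel

from bridge.models.gpt_full_te_layer_autocast_spec import get_gpt_full_te_layer_autocast_spec

# The returned ModuleSpec stands in for the usual per-submodule layer spec,
# so each decoder layer is built as a TETransformerLayerAutocast.
spec = get_gpt_full_te_layer_autocast_spec(config)
model = GPTModel(
    config=config,
    transformer_layer_spec=spec,
    vocab_size=50304,          # hypothetical values
    max_sequence_length=2048,
)
```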

bridge.models.gpt_full_te_layer_autocast_spec.torch_dtype_from_precision(
precision: Union[int, str],
) → torch.dtype#

Map a precision type to the corresponding PyTorch parameter dtype.
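
The mappings below follow common precision-flag conventions; the exact set of accepted tokens is defined by the implementation:

```python
import torch

from bridge.models.gpt_full_te_layer_autocast_spec import torch_dtype_from_precision

# Conventional precision-token mappings (assumed).
assert torch_dtype_from_precision(32) is torch.float32
assert torch_dtype_from_precision(16) is torch.float16
assert torch_dtype_from_precision("bf16") is torch.bfloat16
```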