bridge.models.gpt_full_te_layer_autocast_spec#

Module Contents#

Classes#

AutocastTransformerLayer

Wrapper of te.pytorch.TransformerLayer: a single transformer layer that takes input of size [s, b, h] and returns an output of the same size.

TETransformerLayerAutocast

A MegatronModule that wraps the AutocastTransformerLayer.

Functions#

get_gpt_full_te_layer_autocast_spec

Get the ModuleSpec for the full Transformer layer from Transformer Engine.

torch_dtype_from_precision

Map a precision type to the corresponding PyTorch parameter dtype.

API#

class bridge.models.gpt_full_te_layer_autocast_spec.AutocastTransformerLayer(
hidden_size: int,
ffn_hidden_size: int,
layernorm_epsilon: float,
num_attention_heads: int,
init_method: Callable,
output_layer_init_method: Callable,
hidden_dropout: float,
attention_dropout: float,
layer_number: Optional[int] = None,
kv_channels: Optional[int] = None,
self_attn_mask_type: str = 'causal',
tp_group: Optional[Any] = None,
tp_size: int = 1,
params_dtype: torch.dtype = torch.float32,
get_rng_state_tracker: Optional[Callable] = None,
fuse_wgrad_accumulation: bool = False,
seq_length: Optional[int] = None,
micro_batch_size: Optional[int] = None,
sequence_parallel: bool = False,
apply_residual_connection_post_layernorm: bool = False,
output_layernorm: bool = False,
layer_type: str = 'encoder',
drop_path_rate: float = 0,
use_emha: bool = False,
ub_tp_comm_overlap: bool = False,
ub_bulk_wgrad: bool = True,
ub_bulk_dgrad: bool = True,
autocast_dtype: Any = 16,
zero_centered_gamma: bool = False,
device: str = 'cuda',
**kwargs,
)#

Bases: transformer_engine.pytorch.TransformerLayer

Wrapper of te.pytorch.TransformerLayer: a single transformer layer that takes input of size [s, b, h] and returns an output of the same size.

Initialization
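
A minimal construction sketch, assuming Transformer Engine is installed and a CUDA device is available (the default `device='cuda'`). All sizes are hypothetical placeholders, and the `'bf16'` precision token is an assumption based on `torch_dtype_from_precision` accepting ints and strings:

```python
import torch

from bridge.models.gpt_full_te_layer_autocast_spec import AutocastTransformerLayer

# Hypothetical sizes for illustration; real values come from your model config.
layer = AutocastTransformerLayer(
    hidden_size=1024,
    ffn_hidden_size=4096,
    layernorm_epsilon=1e-5,
    num_attention_heads=16,
    init_method=torch.nn.init.xavier_uniform_,
    output_layer_init_method=torch.nn.init.xavier_uniform_,
    hidden_dropout=0.1,
    attention_dropout=0.1,
    autocast_dtype="bf16",  # assumed to be mapped via torch_dtype_from_precision
)
```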

forward(
hidden_states: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
encoder_output: Optional[torch.Tensor] = None,
enc_dec_attn_mask: Optional[torch.Tensor] = None,
inference_params: Optional[Any] = None,
is_first_microbatch: Optional[bool] = None,
checkpoint_core_attention: Optional[bool] = False,
) → torch.Tensor#

Perform a forward pass through the transformer layer.
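
A sketch of a direct forward call, following the documented [s, b, h] input layout; passing `attention_mask=None` relies on the default `'causal'` `self_attn_mask_type`:

```python
# Inputs follow the documented [s, b, h] layout: sequence, batch, hidden.
seq_len, batch_size, hidden = 128, 2, 1024
x = torch.randn(seq_len, batch_size, hidden, device="cuda")
with torch.no_grad():
    y = layer(x, attention_mask=None)  # default causal masking, no explicit mask needed
assert y.shape == x.shape  # output keeps the [s, b, h] shape
```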

class bridge.models.gpt_full_te_layer_autocast_spec.TETransformerLayerAutocast(
config,
layer_number=1,
hidden_dropout=None,
**kwargs,
)#

Bases: megatron.core.transformer.module.MegatronModule, megatron.core.transformer.transformer_layer.BaseTransformerLayer

A MegatronModule that wraps the AutocastTransformerLayer.

Initialization
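
A hypothetical construction sketch. A real run initializes Megatron's model-parallel state first and sets many more config fields; the field values below are placeholders:

```python
from megatron.core.transformer import TransformerConfig

from bridge.models.gpt_full_te_layer_autocast_spec import TETransformerLayerAutocast

# Minimal illustrative config; real training configs carry many more fields.
config = TransformerConfig(
    num_layers=2,
    hidden_size=1024,
    num_attention_heads=16,
)
layer = TETransformerLayerAutocast(config, layer_number=1)
```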

forward(
hidden_states,
is_first_microbatch=None,
attention_mask=None,
context=None,
context_mask=None,
inference_params=None,
**kwargs,
)#

Forward function of TETransformerLayerAutocast. Called by MCore’s TransformerBlock.forward.
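
This forward is normally invoked by MCore's TransformerBlock rather than called directly. A direct call might look like the sketch below; the return convention is assumed to follow MCore's BaseTransformerLayer protocol:

```python
# Sketch of a direct invocation; `x` is an [s, b, h] tensor as above.
output = layer(hidden_states=x, attention_mask=None)
```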

_get_layer_offset()#

sharded_state_dict(
prefix: str = '',
sharded_offsets: tuple = (),
metadata=None,
)#

Get the sharded state dict for the transformer layer.
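
A sketch of how Megatron's distributed checkpointing consumes this method; the prefix string is a hypothetical example matching a layer's position in a model:

```python
# Produce the sharded state dict with a prefix reflecting the layer's
# position, as distributed checkpointing would.
sharded_sd = layer.sharded_state_dict(prefix="decoder.layers.0.")
```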

__call__(*args, **kwargs)#

bridge.models.gpt_full_te_layer_autocast_spec.get_gpt_full_te_layer_autocast_spec(
transformer_config,
) → megatron.core.transformer.spec_utils.ModuleSpec#

Get the ModuleSpec for the full Transformer layer from Transformer Engine.
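
A sketch of plugging the returned spec into MCore's GPTModel, assuming `config` is a populated TransformerConfig; the vocab and sequence-length values are hypothetical:

```python
from megatron.core.models.gpt import GPTModel

from bridge.models.gpt_full_te_layer_autocast_spec import get_gpt_full_te_layer_autocast_spec

# The returned ModuleSpec stands in for the usual per-submodule layer spec,
# so each decoder layer is built as a TETransformerLayerAutocast.
spec = get_gpt_full_te_layer_autocast_spec(config)
model = GPTModel(
    config=config,
    transformer_layer_spec=spec,
    vocab_size=50304,          # hypothetical values
    max_sequence_length=2048,
)
```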

bridge.models.gpt_full_te_layer_autocast_spec.torch_dtype_from_precision(
precision: Union[int, str],
) → torch.dtype#

Map a precision type to the corresponding PyTorch parameter dtype.
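
The mappings below follow common precision-flag conventions; the exact set of accepted tokens is defined by the implementation:

```python
import torch

from bridge.models.gpt_full_te_layer_autocast_spec import torch_dtype_from_precision

# Conventional precision-token mappings (assumed).
assert torch_dtype_from_precision(32) is torch.float32
assert torch_dtype_from_precision(16) is torch.float16
assert torch_dtype_from_precision("bf16") is torch.bfloat16
```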