core.fusions.fused_layer_norm#

Module Contents#

Classes#

FusedLayerNorm

Layer Norm, fused into a single CUDA kernel.

API#

class core.fusions.fused_layer_norm.FusedLayerNorm(
config: megatron.core.transformer.TransformerConfig,
hidden_size: int,
eps: float = 1e-05,
persist_layer_norm: bool = True,
zero_centered_gamma: bool = False,
normalization: str = 'LayerNorm',
)#

Bases: torch.nn.Module

Layer Norm, fused into a single CUDA kernel.

Parameters:
  • hidden_size (int) – Transformer hidden dimension.

  • eps (float) – Epsilon added to denominator, for numerical stability.

  • persist_layer_norm (bool) – Use persistent fused layer norm kernel. This kernel supports only a set of hidden sizes. Please check persist_ln_hidden_sizes if your hidden size is supported.

  • zero_centered_gamma (bool) – Adjust LayerNorm weights such that they are centered around zero. This improves numerical stability.

  • config (TransformerConfig) – Transformer config. Include to match custom layer norm interfaces.

  • normalization (str) – Normalization type, used for Transformer Engine. Must equal 'LayerNorm' here.

Initialization

reset_parameters()#
forward(input: torch.Tensor) → torch.Tensor#
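
A minimal usage sketch follows. It assumes a CUDA-capable PyTorch install plus the fused layer norm kernels this class wraps (e.g. from NVIDIA Apex); the TransformerConfig fields and tensor shapes shown are illustrative, not prescriptive.

```python
import torch

from megatron.core.transformer import TransformerConfig
from megatron.core.fusions.fused_layer_norm import FusedLayerNorm

# Illustrative config; only the fields relevant to this sketch are shown.
config = TransformerConfig(
    num_layers=2,
    hidden_size=1024,
    num_attention_heads=8,
)

# Construct the fused layer norm and move its parameters to the GPU,
# where the fused CUDA kernel runs.
layer_norm = FusedLayerNorm(
    config=config,
    hidden_size=1024,
    eps=1e-5,
    persist_layer_norm=True,     # 1024 is among the persistent-kernel sizes
    zero_centered_gamma=False,
    normalization='LayerNorm',   # must be 'LayerNorm' for this class
).cuda()

layer_norm.reset_parameters()    # re-initialize weight and bias

# Normalize a [sequence, batch, hidden] activation tensor.
x = torch.randn(128, 4, 1024, device='cuda')
y = layer_norm(x)                # output has the same shape as the input
```

If your hidden size is not covered by the persistent kernel (check persist_ln_hidden_sizes), pass persist_layer_norm=False.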