core.models.bert.pooler#

Module Contents#

Classes#

Pooler

Pooler layer.

API#

class core.models.bert.pooler.Pooler(
hidden_size: int,
init_method: callable,
config: megatron.core.transformer.transformer_config.TransformerConfig,
sequence_parallel: bool = False,
)#

Bases: megatron.core.transformer.module.MegatronModule

Pooler layer.

Pool the hidden state of a specific token (for example, the start of the sequence) and apply a linear transformation followed by a tanh activation.
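A minimal standalone sketch of this operation (a hypothetical `SimplePooler`, not the Megatron implementation), assuming Megatron's [sequence, batch, hidden] activation layout:

```python
import torch
import torch.nn as nn

class SimplePooler(nn.Module):
    """Hypothetical sketch: select one token's hidden state, then dense + tanh."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor, sequence_index: int = 0):
        # hidden_states: [sequence_length, batch, hidden_size]
        pooled = hidden_states[sequence_index, :, :]  # [batch, hidden_size]
        return torch.tanh(self.dense(pooled))         # [batch, hidden_size]
```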

Parameters:
  • hidden_size (int) – The hidden size.

  • init_method (callable) – Weight initialization method for the linear layer; the bias is initialized to zero.

  • config (TransformerConfig) – The transformer configuration.

  • sequence_parallel (bool) – Whether to use sequence parallelism. Defaults to False.

Initialization
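A hedged construction sketch. The import path is assumed from this page's module name, the config values are illustrative, and a real Megatron run may require model-parallel state to be initialized first:

```python
import torch
from megatron.core.transformer.transformer_config import TransformerConfig
# Import path assumed from this page's module name (core.models.bert.pooler).
from megatron.core.models.bert.pooler import Pooler

# Illustrative sizes; any consistent values would do.
config = TransformerConfig(num_layers=2, hidden_size=64, num_attention_heads=4)

pooler = Pooler(
    hidden_size=config.hidden_size,
    init_method=torch.nn.init.xavier_normal_,  # any callable weight initializer
    config=config,
    sequence_parallel=False,
)
```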

forward(hidden_states: torch.Tensor, sequence_index=0)#
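The forward pass selects the hidden state at sequence_index and applies the dense + tanh transform described above. A usage sketch, reusing the pooler and config from the construction example and assuming the [sequence, batch, hidden] layout:

```python
# Random activations in Megatron's [sequence, batch, hidden] layout.
hidden_states = torch.randn(128, 8, config.hidden_size)

# Pool the first token (sequence_index=0); output shape is [batch, hidden].
pooled = pooler(hidden_states, sequence_index=0)
print(pooled.shape)  # torch.Size([8, 64])
```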