core.models.bert.pooler#
Module Contents#
Classes#
| Pooler | Pooler layer. |
API#
- class core.models.bert.pooler.Pooler(
- hidden_size: int,
- init_method: callable,
- config: megatron.core.transformer.transformer_config.TransformerConfig,
- sequence_parallel: bool = False,
- )

Bases: megatron.core.transformer.module.MegatronModule

Pooler layer.
Pool the hidden states of a specific token (for example, the start of the sequence) and apply a linear transformation followed by a tanh.
- Parameters:
hidden_size (int) – The hidden size.
init_method (callable) – Weight initialization method for the linear layer; the bias is initialized to zero.
config (TransformerConfig) – The transformer configuration.
sequence_parallel (bool) – Whether to use sequence parallelism. Defaults to False.
Initialization
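The following is a minimal sketch of the computation this class performs, assuming Megatron's [sequence, batch, hidden] activation layout. The class name PoolerSketch is illustrative, and the sketch omits the tensor- and sequence-parallel handling of the real module:

```python
import torch

class PoolerSketch(torch.nn.Module):
    """Minimal sketch of the pooling computation (not the Megatron
    implementation, which adds tensor/sequence-parallel handling)."""

    def __init__(self, hidden_size: int, init_method):
        super().__init__()
        self.dense = torch.nn.Linear(hidden_size, hidden_size)
        init_method(self.dense.weight)          # caller-supplied weight init
        torch.nn.init.zeros_(self.dense.bias)   # bias is set to zero

    def forward(self, hidden_states: torch.Tensor, sequence_index: int = 0):
        # hidden_states: [sequence_length, batch, hidden_size]
        pooled = hidden_states[sequence_index, :, :]  # select one token position
        return torch.tanh(self.dense(pooled))         # linear + tanh
```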
- forward(hidden_states: torch.Tensor, sequence_index=0)#
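A hypothetical usage of the sketch above, showing the forward call pooling the first token (e.g. [CLS]) and the resulting output shape; the shapes and init method are illustrative:

```python
import torch

seq_len, batch, hidden = 128, 4, 1024
hidden_states = torch.randn(seq_len, batch, hidden)

pooler = PoolerSketch(hidden, torch.nn.init.xavier_normal_)
pooled = pooler(hidden_states, sequence_index=0)  # shape: [batch, hidden]
```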