core.models.common.embeddings.language_model_embedding#

Module Contents#

Classes#

LanguageModelEmbedding

Language model embeddings.

API#

class core.models.common.embeddings.language_model_embedding.LanguageModelEmbedding(
config: megatron.core.transformer.transformer_config.TransformerConfig,
vocab_size: int,
max_sequence_length: int,
position_embedding_type: Literal['learned_absolute', 'rope', 'none'] = 'learned_absolute',
num_tokentypes: int = 0,
scatter_to_sequence_parallel: bool = True,
tp_group: Optional[torch.distributed.ProcessGroup] = None,
)#

Bases: megatron.core.transformer.module.MegatronModule

Language model embeddings.

Parameters:
  • config (TransformerConfig) – config object with all necessary configs for TransformerBlock

  • vocab_size (int) – vocabulary size

  • max_sequence_length (int) – maximum sequence length. This is used for the positional embeddings.

  • position_embedding_type (Literal['learned_absolute', 'rope', 'none']) – which positional embedding variant to use. Defaults to 'learned_absolute'.

  • num_tokentypes (int) – Set to 0 without binary head, and 2 with a binary head. Defaults to 0.

  • scatter_to_sequence_parallel (bool) – Set to False to disable scattering the embedding across the sequence parallel region. Defaults to True.

  • tp_group (Optional[torch.distributed.ProcessGroup]) – process group to use for tensor model parallelism. Defaults to None.

Initialization
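
A minimal construction sketch, assuming the standard megatron.core package layout and that torch.distributed plus Megatron's model-parallel state are already initialized; the config values below are illustrative only, not recommended settings.

```python
from megatron.core.transformer.transformer_config import TransformerConfig
from megatron.core.models.common.embeddings.language_model_embedding import (
    LanguageModelEmbedding,
)

# Assumption: torch.distributed and Megatron's model-parallel state have
# already been set up, e.g. via
# megatron.core.parallel_state.initialize_model_parallel().

# Illustrative config values only; real models use much larger sizes.
config = TransformerConfig(
    num_layers=2,
    hidden_size=64,
    num_attention_heads=4,
)

embedding = LanguageModelEmbedding(
    config=config,
    vocab_size=32000,
    max_sequence_length=512,
    position_embedding_type='learned_absolute',
    num_tokentypes=0,
)
```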

zero_parameters()#

Zero out all parameters in embedding.
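
A brief usage sketch, reusing the embedding object constructed above; the assertion is only an illustrative check, not part of the API.

```python
import torch

# Zero out the word, position, and (if present) tokentype embedding tables in place.
embedding.zero_parameters()

# Illustrative check: every embedding parameter is now exactly zero.
assert all(bool(torch.all(p == 0)) for p in embedding.parameters())
```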

forward(
input_ids: torch.Tensor,
position_ids: torch.Tensor,
tokentype_ids: int = None,
) → torch.Tensor#

Forward pass of the embedding module.

Parameters:
  • input_ids (Tensor) – The input tokens

  • position_ids (Tensor) – The position IDs used to calculate the position embeddings

  • tokentype_ids (int) – The token type IDs. Used when args.bert_binary_head is set to True. Defaults to None.

Returns:

The output embeddings

Return type:

Tensor
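
A hedged call sketch, reusing the embedding object constructed earlier. The output-shape comment reflects Megatron's sequence-first layout and is an expectation under those assumptions, not something stated on this page.

```python
import torch

# Hypothetical token and position IDs with shape [batch, sequence].
batch_size, seq_len = 2, 16
input_ids = torch.randint(0, 32000, (batch_size, seq_len))
position_ids = torch.arange(seq_len).unsqueeze(0).expand(batch_size, seq_len)

# tokentype_ids is only needed for BERT-style models with a binary head.
out = embedding(input_ids=input_ids, position_ids=position_ids)

# Expected layout (assumption): [seq_len, batch_size, hidden_size], i.e.
# sequence-first, when sequence parallelism is not splitting the sequence dim.
print(out.shape)
```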