core.models.common.embeddings.language_model_embedding#
Module Contents#
Classes#
LanguageModelEmbedding | Language model embeddings. |
API#
- class core.models.common.embeddings.language_model_embedding.LanguageModelEmbedding(
- config: megatron.core.transformer.transformer_config.TransformerConfig,
- vocab_size: int,
- max_sequence_length: int,
- position_embedding_type: Literal['learned_absolute', 'rope', 'none'] = 'learned_absolute',
- num_tokentypes: int = 0,
- scatter_to_sequence_parallel: bool = True,
- tp_group: Optional[torch.distributed.ProcessGroup] = None,
- )
Bases:
megatron.core.transformer.module.MegatronModule
Language model embeddings.
- Parameters:
config (TransformerConfig) – config object with all necessary configs for TransformerBlock
vocab_size (int) – vocabulary size
max_sequence_length (int) – maximum sequence length; used for positional embeddings
add_position_embedding (bool) – Add a position embedding.
embedding_dropout_prob (float) – dropout probability for embeddings
num_tokentypes (int) – Set to 0 without a binary head, and 2 with a binary head. Defaults to 0.
scatter_to_sequence_parallel (bool) – Set to False to disable scatter of embedding across sequence parallel region. Defaults to True.
Initialization
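The snippet below is a minimal construction sketch, not taken from the Megatron-LM docs: it assumes a single-process setup in which Megatron's model-parallel state has already been initialized, and the `TransformerConfig` values are illustrative placeholders.

```python
import torch
from megatron.core.transformer.transformer_config import TransformerConfig
from megatron.core.models.common.embeddings.language_model_embedding import (
    LanguageModelEmbedding,
)

# Illustrative config values; real models use far larger sizes.
config = TransformerConfig(
    num_layers=2,
    hidden_size=64,
    num_attention_heads=4,
    use_cpu_initialization=True,
)

# Assumes megatron.core.parallel_state has already been initialized
# (e.g. via initialize_model_parallel()) so the vocab-parallel word
# embedding can be constructed.
embedding = LanguageModelEmbedding(
    config=config,
    vocab_size=1024,
    max_sequence_length=512,
    position_embedding_type="learned_absolute",
    num_tokentypes=0,
)
```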
- zero_parameters()#
Zero out all parameters in embedding.
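A brief usage sketch; `embedding` refers to the instance built in the construction example above.

```python
# Reset all embedding weights (word, position, tokentype) to zero in place.
embedding.zero_parameters()
```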
- forward(
- input_ids: torch.Tensor,
- position_ids: torch.Tensor,
- tokentype_ids: int = None,
- )
Forward pass of the embedding module.
- Parameters:
input_ids (Tensor) – The input tokens
position_ids (Tensor) – The position ids used to calculate position embeddings
tokentype_ids (int) – The token type ids. Used when args.bert_binary_head is set to True. Defaults to None.
- Returns:
The output embeddings
- Return type:
Tensor
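A minimal forward-pass sketch, assuming the `embedding` instance from the construction example above; the batch size, sequence length, and vocabulary size are arbitrary choices, not values from the source docs.

```python
import torch

batch_size, seq_length = 2, 16

# Token ids drawn from the (assumed) vocab_size of 1024 used at construction.
input_ids = torch.randint(0, 1024, (batch_size, seq_length))
# Simple 0..seq_length-1 positions for every sequence in the batch.
position_ids = torch.arange(seq_length).unsqueeze(0).expand(batch_size, -1)

embeddings = embedding(input_ids=input_ids, position_ids=position_ids)
# With sequence parallelism disabled, the output is laid out as
# [sequence_length, batch_size, hidden_size] for the transformer block.
print(embeddings.shape)  # e.g. torch.Size([16, 2, 64]) under these assumptions
```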