models.bert package

A package for training BERT and BERT-like encoder-only models. It optionally includes a binary head that can be used for classification tasks.

Submodules

models.bert.bert_model module

class core.models.bert.bert_model.BertModel(*args: Any, **kwargs: Any)

Bases: megatron.core.models.common.language_module.language_module.LanguageModule

Transformer language model.

Parameters

config (TransformerConfig) – transformer config
num_tokentypes (int) – Set to 2 when args.bert_binary_head is True, and 0 otherwise. Defaults to 0.
transformer_layer_spec (ModuleSpec) – Specifies the module to use for transformer layers
vocab_size (int) – vocabulary size
max_sequence_length (int) – Maximum sequence length. Used for the positional embedding
pre_process (bool) – Include embedding layer (used with pipeline parallelism)
post_process (bool) – Include an output layer (used with pipeline parallelism)
parallel_output (bool) – Do not gather the outputs, keep them split across tensor parallel ranks
share_embeddings_and_output_weights (bool) – When True, input embeddings and output logit weights are shared. Defaults to False.
position_embedding_type (string) – Position embedding type. Options: ‘learned_absolute’, ‘rope’. Defaults to ‘learned_absolute’.
rotary_percent (float) – Percent of rotary dimension to use for rotary position embeddings. Defaults to 1.0 (100%). Ignored unless position_embedding_type is ‘rope’.
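The parameter rules above can be collected into a small helper. This is a minimal sketch and not part of Megatron Core: `build_bert_kwargs` and its `bert_binary_head` argument are hypothetical names, and the returned dictionary only mirrors the documented keyword names. `config` and `transformer_layer_spec` are left out because they require Megatron objects.

```python
from typing import Any, Dict

# The two options listed in the parameter description above.
VALID_POSITION_EMBEDDINGS = ("learned_absolute", "rope")


def build_bert_kwargs(
    vocab_size: int,
    max_sequence_length: int,
    bert_binary_head: bool = False,
    position_embedding_type: str = "learned_absolute",
    rotary_percent: float = 1.0,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments using the documented names."""
    if position_embedding_type not in VALID_POSITION_EMBEDDINGS:
        raise ValueError(
            f"unsupported position_embedding_type: {position_embedding_type!r}"
        )
    return {
        "vocab_size": vocab_size,
        "max_sequence_length": max_sequence_length,
        # Documented rule: 2 token types with the binary head, 0 otherwise.
        "num_tokentypes": 2 if bert_binary_head else 0,
        # Documented default.
        "share_embeddings_and_output_weights": False,
        "position_embedding_type": position_embedding_type,
        # Documented as ignored unless position_embedding_type is 'rope'.
        "rotary_percent": rotary_percent,
    }
```

The remaining constructor arguments would be supplied by the caller alongside these.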

bert_extended_attention_mask(attention_mask: torch.Tensor) → torch.Tensor

Creates the extended attention mask

Converts an attention mask of dimension [batch size, 1, seq len] to [batch size, 1, seq len, seq len] or [batch size, 1, 1, seq len] and makes it binary.

Parameters: attention_mask (Tensor) – The input attention mask
Returns: The extended binary attention mask
Return type: Tensor
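The shape change can be illustrated without PyTorch. The sketch below uses plain nested lists in place of tensors and is not the library's implementation. The function name, the `full` switch, and the convention that 1 marks a real token and `True` in the result means "masked out" are assumptions made for illustration.

```python
from typing import List

Mask3D = List[List[List[float]]]        # [batch, 1, seq_len]
Mask4D = List[List[List[List[bool]]]]   # [batch, 1, seq_len or 1, seq_len]


def extended_attention_mask(mask: Mask3D, full: bool = True) -> Mask4D:
    """Expand a [batch, 1, seq_len] mask to 4 dimensions and binarize it.

    Assumed convention (not stated in the docs): 1 marks a real token,
    0 marks padding, and True in the result means "do not attend".
    """
    out: Mask4D = []
    for sample in mask:
        row = sample[0]  # the single [seq_len] row of this sample
        if full:
            # Outer product: pair (i, j) is kept only if both tokens are real.
            grid = [[qi * kj for kj in row] for qi in row]  # [seq_len, seq_len]
        else:
            grid = [list(row)]                              # [1, seq_len]
        # Binarize: values below 0.5 become True (masked out).
        out.append([[[value < 0.5 for value in line] for line in grid]])
    return out
```

With `full=True` the result has shape [batch size, 1, seq len, seq len]; with `full=False` it has shape [batch size, 1, 1, seq len].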

bert_position_ids(token_ids): Position ids for the BERT model.
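Position ids for learned absolute embeddings are typically the index of each token within its sequence. The sketch below assumes that behaviour and uses nested lists rather than tensors. `position_ids_like` is a hypothetical name and the token ids are placeholders.

```python
from typing import List


def position_ids_like(token_ids: List[List[int]]) -> List[List[int]]:
    """Return 0..seq_len-1 for every row of a [batch, seq_len] id matrix."""
    return [list(range(len(row))) for row in token_ids]


# Two sequences of three placeholder token ids each.
batch = [[101, 7592, 102], [101, 2088, 102]]
ids = position_ids_like(batch)
```

Every row receives the same positions, so `ids` has the same shape as `batch`.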

forward(input_ids: torch.Tensor, attention_mask: torch.Tensor, tokentype_ids: Optional[torch.Tensor] = None, lm_labels: Optional[torch.Tensor] = None, inference_params=None)

Forward function of the BERT model.

This function passes the input tensors through the embedding layer, then the encoder, and finally the optional post-processing layer.

It returns the loss values if labels are given; otherwise it returns the final hidden units.
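The control flow described above can be sketched with stand-in callables. This is an illustration under stated assumptions and not the actual method body: `forward_sketch`, `embed`, `encode`, `post`, and `loss_fn` are hypothetical names, and `input_tensor` stands for the activations a later pipeline stage would receive in place of embeddings.

```python
from typing import Any, Callable, Optional


def forward_sketch(
    input_ids: Any,
    embed: Callable[[Any], Any],
    encode: Callable[[Any], Any],
    post: Callable[[Any], Any],
    loss_fn: Callable[[Any, Any], Any],
    lm_labels: Optional[Any] = None,
    pre_process: bool = True,
    post_process: bool = True,
    input_tensor: Optional[Any] = None,
) -> Any:
    """Hypothetical outline: embedding -> encoder -> optional post-processing."""
    # Stages with pre_process embed the tokens; other stages use the
    # tensor handed over from the previous pipeline stage.
    hidden = embed(input_ids) if pre_process else input_tensor
    hidden = encode(hidden)
    if not post_process:
        # Intermediate pipeline stage: pass activations onward.
        return hidden
    output = post(hidden)
    if lm_labels is None:
        return output
    return loss_fn(output, lm_labels)


# Toy stand-ins operating on lists of floats.
embed = lambda ids: [float(i) for i in ids]
encode = lambda h: [2 * x for x in h]
post = lambda h: h
loss_fn = lambda out, labels: sum(abs(o - l) for o, l in zip(out, labels))
```

Calling `forward_sketch` without `lm_labels` returns the post-processed output, and passing labels returns the value of `loss_fn` instead.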

set_input_tensor(input_tensor: torch.Tensor) → None

Sets the input tensor for the model.

See megatron.model.transformer.set_input_tensor()

Parameters: input_tensor (Tensor) – The input tensor for the model.

core.models.bert.bert_model.get_te_version(): Included for backwards compatibility.

Module contents
