core.models.common.embeddings.relative_pos_embedding#

Module Contents#

Classes#

RelativePositionEmbedding

Relative Position Embedding for language model.

Data#

API#

core.models.common.embeddings.relative_pos_embedding.logger#

'getLogger(…)'

core.models.common.embeddings.relative_pos_embedding.__all__#

['RelativePositionEmbedding']

class core.models.common.embeddings.relative_pos_embedding.RelativePositionEmbedding(
bidirectional: bool,
init_method: Callable,
num_attention_heads: int,
relative_attention_num_buckets: int = 32,
relative_attention_max_distance: int = 128,
)#

Bases: torch.nn.Module

Relative Position Embedding for language model.

Args:

  • bidirectional (bool) – Whether the relative attention is bidirectional; if False, only non-positive relative positions (causal attention) are used.

  • init_method (Callable) – Callable used to initialize the relative attention bias weights.

  • num_attention_heads (int) – Number of attention heads; one bias value is learned per head and bucket.

  • relative_attention_num_buckets (int) – Number of buckets used to bin relative positions. Defaults to 32.

  • relative_attention_max_distance (int) – Relative positions at or beyond this distance all map to the same (last) bucket. Defaults to 128.

Initialization

_relative_position_bucket(
relative_position,
bidirectional=True,
num_buckets=32,
max_distance=128,
)#

Adapted from HuggingFace T5 Model: https://github.com/huggingface/transformers/blob/329f5dbf97a5cb2473914c88c05aa3dcb242e19a/src/transformers/models/t5/modeling_t5.py#L397

Translate relative position to a bucket number for relative attention. The relative position is defined as memory_position - query_position, i.e. the distance in tokens from the attending position to the attended-to position. If bidirectional=False, then positive relative positions are invalid. We use smaller buckets for small absolute relative positions and larger buckets for larger absolute relative positions. All relative positions >= max_distance map to the same bucket. All relative positions <= -max_distance map to the same bucket. This should allow for more graceful generalization to longer sequences than the model has been trained on.

Parameters:
  • relative_position – an int32 Tensor

  • bidirectional – a boolean - whether the attention is bidirectional

  • num_buckets – an integer

  • max_distance – an integer

Returns:

a Tensor with the same shape as relative_position, containing int32 values in the range [0, num_buckets)
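For reference, the bucketing scheme described above can be sketched as a standalone function following the linked HuggingFace T5 code. This is an illustrative reimplementation, not the exact body of _relative_position_bucket; the function name is hypothetical.

```python
import math
import torch

def relative_position_bucket(relative_position: torch.Tensor,
                             bidirectional: bool = True,
                             num_buckets: int = 32,
                             max_distance: int = 128) -> torch.Tensor:
    """Map relative positions (memory_position - query_position) to bucket indices in [0, num_buckets)."""
    relative_buckets = torch.zeros_like(relative_position)
    if bidirectional:
        # Split the buckets: the upper half is used for positions to the right of the query.
        num_buckets //= 2
        relative_buckets += (relative_position > 0).to(torch.long) * num_buckets
        relative_position = torch.abs(relative_position)
    else:
        # Causal attention: positive relative positions are invalid, clamp them to 0.
        relative_position = -torch.min(relative_position, torch.zeros_like(relative_position))

    # Exact buckets for small distances.
    max_exact = num_buckets // 2
    is_small = relative_position < max_exact

    # Logarithmically spaced buckets for distances in [max_exact, max_distance);
    # everything >= max_distance falls into the last bucket.
    relative_position_if_large = max_exact + (
        torch.log(relative_position.float() / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).to(torch.long)
    relative_position_if_large = torch.min(
        relative_position_if_large,
        torch.full_like(relative_position_if_large, num_buckets - 1),
    )

    return relative_buckets + torch.where(is_small, relative_position, relative_position_if_large)
```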

_compute_bias(query_length, key_length)#

Adapted from HuggingFace T5 Model: https://github.com/huggingface/transformers/blob/329f5dbf97a5cb2473914c88c05aa3dcb242e19a/src/transformers/models/t5/modeling_t5.py#L444C9-L444C21

Compute the binned relative position bias.

Parameters:
  • query_length (int) – The length of the query sequence (e.g. the sequence to compare against in attention).

  • key_length (int) – The length of the key sequence (e.g. the sequence to compare against in attention).

Returns:

A tensor representing the relative position bias, with shape (1, num_heads, query_length, key_length).

Return type:

torch.Tensor
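A minimal sketch of how such a binned bias can be computed, again following the referenced T5 code rather than the exact body of _compute_bias. It assumes a learned lookup table relative_attention_bias of type nn.Embedding(num_buckets, num_attention_heads) and reuses the relative_position_bucket sketch shown above.

```python
import torch
import torch.nn as nn

def compute_bias_sketch(relative_attention_bias: nn.Embedding,
                        query_length: int,
                        key_length: int,
                        bidirectional: bool = True,
                        num_buckets: int = 32,
                        max_distance: int = 128) -> torch.Tensor:
    """Return a bias tensor of shape (1, num_heads, query_length, key_length)."""
    device = relative_attention_bias.weight.device
    context_position = torch.arange(query_length, dtype=torch.long, device=device)[:, None]
    memory_position = torch.arange(key_length, dtype=torch.long, device=device)[None, :]
    # relative_position[i, j] = memory_position - query_position.
    relative_position = memory_position - context_position          # (query_length, key_length)
    # relative_position_bucket is the bucketing sketch shown earlier on this page.
    buckets = relative_position_bucket(relative_position, bidirectional, num_buckets, max_distance)
    values = relative_attention_bias(buckets)                        # (query_length, key_length, num_heads)
    return values.permute(2, 0, 1).unsqueeze(0)                      # (1, num_heads, query_length, key_length)
```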

static get_relative_seq_len(
inference_context: megatron.core.inference.contexts.BaseInferenceContext,
transformer: megatron.core.transformer.transformer_block.TransformerBlock,
transformer_input: torch.Tensor,
transformer_config: megatron.core.transformer.transformer_config.TransformerConfig,
*,
inference_params: Optional[megatron.core.inference.contexts.BaseInferenceContext] = None,
) → float#

Function to get the sequence length used for the relative position embedding.

Parameters:
  • inference_context (BaseInferenceContext) – Used during inference time

  • transformer (TransformerBlock) – The transformer block (decoder/encoder) used by the model

  • transformer_input (Tensor) – Input tensor to the transformer

  • transformer_config (TransformerConfig) – Transformer config used by the model

Returns:

The relative position sequence length

Return type:

float

forward(query_seq_length, key_seq_length)#

Compute the binned relative position bias for the given query and key sequence lengths.

Parameters:
  • query_seq_length (int) – The length of the query sequence.

  • key_seq_length (int) – The length of the key sequence.

Returns:

The relative position bias tensor with shape (1, num_heads, query_seq_length, key_seq_length).
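A hedged usage sketch based only on the signatures documented above. The choice of torch.nn.init.normal_ as init_method and the way the returned bias is added to attention scores are illustrative assumptions, not prescribed by this module.

```python
import torch
from core.models.common.embeddings.relative_pos_embedding import RelativePositionEmbedding

rel_pos = RelativePositionEmbedding(
    bidirectional=True,
    init_method=torch.nn.init.normal_,   # assumption: any weight-init callable
    num_attention_heads=8,
    relative_attention_num_buckets=32,
    relative_attention_max_distance=128,
)

q_len, k_len = 16, 16
bias = rel_pos(q_len, k_len)              # documented shape: (1, num_heads, q_len, k_len)

# Relative position bias is typically added to the raw attention scores before softmax.
scores = torch.randn(1, 8, q_len, k_len)
probs = torch.softmax(scores + bias, dim=-1)
```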