core.models.common.embeddings.relative_pos_embedding#
Module Contents#
Classes#
RelativePositionEmbedding – Relative Position Embedding for language model.
Data#
API#
- core.models.common.embeddings.relative_pos_embedding.logger#
'getLogger(…)'
- core.models.common.embeddings.relative_pos_embedding.__all__#
['RelativePositionEmbedding']
- class core.models.common.embeddings.relative_pos_embedding.RelativePositionEmbedding(
- bidirectional: bool,
- init_method: Callable,
- num_attention_heads: int,
- relative_attention_num_buckets: int = 32,
- relative_attention_max_distance: int = 128,
)#
Bases: torch.nn.Module

Relative Position Embedding for language model.
Args:
bidirectional (bool) – whether the attention using this embedding is bidirectional
init_method (Callable) – callable used to initialize the relative attention bias weights
num_attention_heads (int) – number of attention heads
relative_attention_num_buckets (int) – number of buckets used when binning relative positions (default: 32)
relative_attention_max_distance (int) – relative distance beyond which all positions map to the same bucket (default: 128)

Initialization
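A minimal instantiation sketch. The import path mirrors the module name above; the choice of `torch.nn.init.xavier_uniform_` for `init_method` is an assumption (the signature only requires a `Callable`):

```python
import torch

from core.models.common.embeddings.relative_pos_embedding import RelativePositionEmbedding

# Sketch only: any in-place weight-initialization callable should work as init_method.
relative_pos_emb = RelativePositionEmbedding(
    bidirectional=False,                        # causal, decoder-style attention
    init_method=torch.nn.init.xavier_uniform_,  # assumed weight-init callable
    num_attention_heads=12,
    relative_attention_num_buckets=32,          # defaults shown in the signature above
    relative_attention_max_distance=128,
)
```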
- _relative_position_bucket(
- relative_position,
- bidirectional=True,
- num_buckets=32,
- max_distance=128,
)#
Adapted from the HuggingFace T5 model: https://github.com/huggingface/transformers/blob/329f5dbf97a5cb2473914c88c05aa3dcb242e19a/src/transformers/models/t5/modeling_t5.py#L397
Translate a relative position to a bucket number for relative attention. The relative position is defined as memory_position - query_position, i.e. the distance in tokens from the attending position to the attended-to position. If bidirectional=False, positive relative positions are invalid. Smaller buckets are used for small absolute values of relative_position and larger buckets for larger absolute values; all relative positions >= max_distance map to the same bucket, and all relative positions <= -max_distance map to the same bucket. This should allow more graceful generalization to sequences longer than those the model was trained on. A standalone sketch of this bucketing follows the parameter and return descriptions below.
- Parameters:
relative_position – an int32 Tensor
bidirectional – a boolean indicating whether the attention is bidirectional
num_buckets – an integer
max_distance – an integer
- Returns:
a Tensor with the same shape as relative_position, containing int32 values in the range [0, num_buckets)
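The linked T5 source implements this bucketing roughly as follows; the sketch below reproduces that reference logic in standalone form (the in-repo `_relative_position_bucket` may differ in minor details):

```python
import math

import torch


def relative_position_bucket(relative_position, bidirectional=True, num_buckets=32, max_distance=128):
    # Mirrors the T5-style bucketing scheme described above.
    relative_buckets = 0
    if bidirectional:
        # Half of the buckets encode positive offsets, half encode negative offsets.
        num_buckets //= 2
        relative_buckets += (relative_position > 0).to(torch.long) * num_buckets
        relative_position = torch.abs(relative_position)
    else:
        # Causal attention: future (positive) offsets are clamped to zero.
        relative_position = -torch.min(relative_position, torch.zeros_like(relative_position))
    # Offsets smaller than max_exact get one bucket each;
    # larger offsets are binned logarithmically up to max_distance.
    max_exact = num_buckets // 2
    is_small = relative_position < max_exact
    relative_position_if_large = max_exact + (
        torch.log(relative_position.float() / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).to(torch.long)
    relative_position_if_large = torch.min(
        relative_position_if_large,
        torch.full_like(relative_position_if_large, num_buckets - 1),
    )
    relative_buckets += torch.where(is_small, relative_position, relative_position_if_large)
    return relative_buckets


# Bucket the relative positions of a 4-token query attending over 6 key positions.
relative_position = torch.arange(6)[None, :] - torch.arange(4)[:, None]
print(relative_position_bucket(relative_position, bidirectional=True))
```

In the bidirectional case, one half of the buckets carries the sign of the offset; within each half, small offsets get one bucket each and larger offsets share logarithmically spaced buckets up to max_distance.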
- _compute_bias(query_length, key_length)#
Adapted from the HuggingFace T5 model: https://github.com/huggingface/transformers/blob/329f5dbf97a5cb2473914c88c05aa3dcb242e19a/src/transformers/models/t5/modeling_t5.py#L444C9-L444C21
Compute the binned relative position bias. A minimal sketch of this computation follows the return type below.
- Parameters:
query_length (int) – The length of the query sequence (i.e., the attending sequence)
key_length (int) – The length of the key sequence (e.g., the sequence to compare against in attention)
- Returns:
A tensor representing the relative position bias, with shape (1, num_heads, query_length, key_length).
- Return type:
torch.Tensor
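A minimal sketch of this computation, assuming (as in the T5 reference) a learned `nn.Embedding` table that maps each bucket to one bias value per head; the attribute name and the random bucket stand-in are illustrative only:

```python
import torch

num_heads, num_buckets = 12, 32
# Assumed internal state: a learned table mapping each bucket to one bias per attention head.
relative_attention_bias = torch.nn.Embedding(num_buckets, num_heads)

query_length, key_length = 4, 6
context_position = torch.arange(query_length)[:, None]   # (query_length, 1)
memory_position = torch.arange(key_length)[None, :]      # (1, key_length)
relative_position = memory_position - context_position   # memory_position - query_position

# Stand-in for the _relative_position_bucket(...) mapping documented above.
buckets = torch.randint(0, num_buckets, (query_length, key_length))

values = relative_attention_bias(buckets)                 # (query_length, key_length, num_heads)
bias = values.permute(2, 0, 1).unsqueeze(0)               # (1, num_heads, query_length, key_length)
print(bias.shape)
```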
- static get_relative_seq_len(
- inference_context: megatron.core.inference.contexts.BaseInferenceContext,
- transformer: megatron.core.transformer.transformer_block.TransformerBlock,
- transformer_input: torch.Tensor,
- transformer_config: megatron.core.transformer.transformer_config.TransformerConfig,
- *,
- inference_params: Optional[megatron.core.inference.contexts.BaseInferenceContext] = None,
)#
Function to get the relative position embedding sequence length.
- Parameters:
inference_context (BaseInferenceContext) – Used during inference time
transformer (TransformerBlock) – The transformer block (decoder/encoder) used by the model
transformer_input (Tensor) – Input tensor to the transformer
transformer_config (TransformerConfig) – Transformer config used by the model
- Returns:
The relative position embedding sequence length
- Return type:
float
- forward(query_seq_length, key_seq_length)#
Compute the relative position bias for the given query and key sequence lengths.
- Parameters:
query_seq_length (int) – The length of the query sequence
key_seq_length (int) – The length of the key sequence
- Returns:
The relative position bias, with shape (1, num_heads, query_seq_length, key_seq_length)
- Return type:
torch.Tensor
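A hedged usage sketch, assuming `forward` returns the bias documented for `_compute_bias` and that the bias is added to the raw attention scores before the softmax (standard relative-bias usage; not stated explicitly on this page):

```python
import torch

from core.models.common.embeddings.relative_pos_embedding import RelativePositionEmbedding

rel_pos_emb = RelativePositionEmbedding(
    bidirectional=True,
    init_method=torch.nn.init.normal_,  # assumed weight-init callable
    num_attention_heads=12,
)

query_len, key_len = 4, 6
attn_scores = torch.randn(1, 12, query_len, key_len)  # (batch, num_heads, query_len, key_len)
bias = rel_pos_emb(query_seq_length=query_len, key_seq_length=key_len)  # expected (1, 12, 4, 6)
attn_probs = torch.softmax(attn_scores + bias, dim=-1)
```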