core.models.common.embeddings.rotary_pos_embedding#
Module Contents#
Classes#
| Class | Description |
|---|---|
| RotaryEmbedding | Rotary Embedding for language model. |
| MultimodalRotaryEmbedding | Multimodal Rotary Embedding for language model. Based on https://github.com/alibaba/Pai-Megatron-Patch/blob/efa5a752e845267936db9ae7df1b6aba92e9ff9a/megatron_patch/model/qwen2_vl/rotary_pos_embedding.py. Copyright (c) 2025 alibaba/Pai-Megatron-Patch. Apache 2.0 license. |
Data#
API#
- core.models.common.embeddings.rotary_pos_embedding.logger#
'getLogger(...)'
- core.models.common.embeddings.rotary_pos_embedding.__all__#
['RotaryEmbedding', 'MultimodalRotaryEmbedding']
- class core.models.common.embeddings.rotary_pos_embedding.RotaryEmbedding(
- kv_channels: int,
- rotary_percent: float,
- rotary_interleaved: bool = False,
- seq_len_interpolation_factor: float = None,
- rotary_base: int = 10000,
- rope_scaling: bool = False,
- rope_scaling_factor: float = 8.0,
- use_cpu_initialization: bool = False,
- cp_group: Optional[torch.distributed.ProcessGroup] = None,
Bases: torch.nn.Module

Rotary Embedding for language model.
- Parameters:
kv_channels (int) – Projection weights dimension in multi-head attention. Obtained from the transformer config.
rotary_percent (float) – Percent of the rotary dimension to use for rotary position embeddings.
rotary_interleaved (bool, optional) – If True, use interleaved rotary position embeddings. Defaults to False.
seq_len_interpolation_factor (float, optional) – Scale for linearly interpolating RoPE to longer sequences. The value must be a float larger than 1.0. Defaults to None.
rotary_base (int, optional) – Base period for rotary position embeddings. Defaults to 10000.
rope_scaling (bool, optional) – If True, apply RoPE scaling as used in Llama 3.x. Defaults to False.
rope_scaling_factor (float, optional) – RoPE scaling factor as used in Llama 3.x. Defaults to 8.0.
use_cpu_initialization (bool, optional) – If False, initialize inv_freq directly on the GPU. Defaults to False.
cp_group (torch.distributed.ProcessGroup, optional) – Process group for context parallel. Defaults to None.
Initialization
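A minimal usage sketch, assuming Megatron Core is importable as `megatron.core` and that the process runs without any model or context parallelism initialized; the head dimension and sequence length below are illustrative values rather than defaults of any particular model.

```python
from megatron.core.models.common.embeddings.rotary_pos_embedding import RotaryEmbedding

# Illustrative head dimension; in practice kv_channels comes from the transformer config.
rope = RotaryEmbedding(
    kv_channels=128,              # per-head projection dimension
    rotary_percent=1.0,           # apply RoPE to the full head dimension
    rotary_base=10000,            # default base period
    use_cpu_initialization=True,  # keep inv_freq on the CPU for this single-process sketch
)

# Precompute rotation angles for a 2048-token sequence.
rotary_pos_emb = rope(max_seq_len=2048)
print(rotary_pos_emb.shape)       # expected to be [2048, 1, 1, rotary_dim]
```

The returned tensor is typically passed to the attention layers, which use it to rotate queries and keys (see forward() below).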
- _apply_scaling(
- freqs,
- factor=8,
- low_freq_factor=1,
- high_freq_factor=4,
- original_max_position_embeddings=8192,
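The defaults above mirror the Llama 3.x long-context scheme. Below is a standalone sketch of that style of inverse-frequency scaling, written for intuition; it illustrates the published Llama 3.x approach rather than reproducing this method's exact implementation.

```python
import math

import torch


def apply_llama3_style_scaling(
    freqs: torch.Tensor,
    factor: float = 8.0,
    low_freq_factor: float = 1.0,
    high_freq_factor: float = 4.0,
    original_max_position_embeddings: int = 8192,
) -> torch.Tensor:
    """Scale RoPE inverse frequencies in the Llama 3.x style (illustrative)."""
    low_freq_wavelen = original_max_position_embeddings / low_freq_factor
    high_freq_wavelen = original_max_position_embeddings / high_freq_factor

    wavelen = 2 * math.pi / freqs

    # High-frequency components (short wavelengths) are kept as-is and
    # low-frequency components (long wavelengths) are divided by `factor`.
    scaled = torch.where(wavelen > low_freq_wavelen, freqs / factor, freqs)

    # The band in between is smoothly interpolated between the two regimes.
    smooth = (original_max_position_embeddings / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    smoothed = (1 - smooth) * freqs / factor + smooth * freqs
    is_medium = (wavelen <= low_freq_wavelen) & (wavelen >= high_freq_wavelen)
    return torch.where(is_medium, smoothed, scaled)
```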
- get_freqs_non_repeated(
- max_seq_len: int,
- offset: int = 0,
Generates a matrix of frequencies based on positions in the sequence, used to create positional encodings.
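For intuition, a minimal sketch of the underlying computation: an outer product of integer positions with inverse frequencies derived from the rotary base. The helper name and arguments below are illustrative, not this method's signature.

```python
import torch


def rope_freqs(
    max_seq_len: int, rotary_dim: int, rotary_base: float = 10000.0, offset: int = 0
) -> torch.Tensor:
    """Return a [max_seq_len, rotary_dim // 2] matrix of angles theta(p, i) = p / base^(2i / d)."""
    inv_freq = 1.0 / (
        rotary_base ** (torch.arange(0, rotary_dim, 2, dtype=torch.float32) / rotary_dim)
    )
    positions = torch.arange(max_seq_len, dtype=torch.float32) + offset
    # Each row holds the rotation angles for one position in the sequence.
    return torch.outer(positions, inv_freq)
```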
- get_cos_sin(
- max_seq_len: int,
- offset: int = 0,
Cosine and sine values for RoPE are precomputed for all positions up to the maximum sequence length.
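Building on the frequency sketch above, a hedged sketch of how such cosine/sine tables could be derived; one common layout duplicates the half-dimension angles so that each rotated pair of channels shares an angle, though the exact layout used here may differ.

```python
import torch


def rope_cos_sin(freqs: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Given [seq_len, dim // 2] angles, return cos/sin tables covering the full rotary dim."""
    # Duplicate the half-dimension angles so each rotated channel pair shares an angle.
    emb = torch.cat((freqs, freqs), dim=-1)
    return emb.cos(), emb.sin()
```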
- forward(
- max_seq_len: int,
- offset: int = 0,
- packed_seq: bool = False,
Forward pass of RoPE embedding.
- Parameters:
max_seq_len (int) – Maximum sequence length.
offset (int, optional) – RoPE offset. Defaults to 0.
packed_seq (bool, optional) – Whether to use packed sequence. Defaults to False.
- Returns:
Embeddings after applying RoPE.
- Return type:
Tensor
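A sketch of how the returned embedding is typically consumed, applying the rotation to a query tensor by hand rather than through Megatron Core's own helpers; the [seq, batch, head, dim] layout and the rotate-half formulation are assumptions made for illustration.

```python
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    """Rotate the two halves of the last dimension: (x1, x2) -> (-x2, x1)."""
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rope(t: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
    """Rotate a [seq, batch, head, dim] tensor with a broadcastable [seq, 1, 1, dim] angle tensor."""
    return (t * emb.cos()) + (rotate_half(t) * emb.sin())


# Illustrative shapes: 16 tokens, batch of 2, 8 heads, head dimension 64.
angles = torch.outer(torch.arange(16.0), 1.0 / (10000.0 ** (torch.arange(0, 64, 2) / 64)))
emb = torch.cat((angles, angles), dim=-1)[:, None, None, :]  # stand-in for RotaryEmbedding(...)(max_seq_len=16)
q = torch.randn(16, 2, 8, 64)
q_rotated = apply_rope(q, emb)
print(q_rotated.shape)  # torch.Size([16, 2, 8, 64])
```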
- _load_from_state_dict(state_dict, prefix, *args, **kwargs)#
- get_rotary_seq_len(
- inference_context: megatron.core.inference.contexts.BaseInferenceContext,
- transformer: megatron.core.transformer.transformer_block.TransformerBlock,
- transformer_input: torch.Tensor,
- transformer_config: megatron.core.transformer.transformer_config.TransformerConfig,
- packed_seq_params: Optional[megatron.core.packed_seq_params.PackedSeqParams] = None,
- *,
- inference_params: Optional[megatron.core.inference.contexts.BaseInferenceContext] = None,
Function to get the rotary sequence length.
- Parameters:
inference_context (BaseInferenceContext, optional) – Inference context, used during inference.
transformer (TransformerBlock) – The transformer block (decoder/encoder) used by the model
transformer_input (Tensor) – Input tensor to the transformer
transformer_config (TransformerConfig) – Transformer config used by the model
packed_seq_params (PackedSeqParams) – Packed sequence params
- Returns:
The rotary sequence length
- Return type:
int
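For intuition, a hedged sketch of how a rotary sequence length is typically derived: take the maximum sequence length from the inference context when one is provided, otherwise the sequence dimension of the transformer input, and account for sequence/context parallelism by scaling the sharded dimension back up. The helper below is illustrative and does not reproduce this method's exact logic.

```python
from typing import Optional

import torch


def rotary_seq_len_sketch(
    transformer_input: torch.Tensor,           # [seq_per_rank, batch, hidden]
    max_inference_seq_len: Optional[int] = None,
    sequence_parallel_size: int = 1,           # stand-in for the sequence-parallel (TP) size
    context_parallel_size: int = 1,            # stand-in for the context-parallel world size
) -> int:
    """Illustrative derivation of a rotary sequence length."""
    if max_inference_seq_len is not None:
        # During inference the precomputed table must cover the full generation length.
        seq_len = max_inference_seq_len
    else:
        # Each rank only sees a shard of the sequence, so scale back to the global length.
        seq_len = transformer_input.size(0) * sequence_parallel_size
    return seq_len * context_parallel_size
```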
- class core.models.common.embeddings.rotary_pos_embedding.MultimodalRotaryEmbedding(
- kv_channels: int,
- rotary_percent: float,
- rotary_interleaved: bool = False,
- seq_len_interpolation_factor: Optional[float] = None,
- rotary_base: int = 10000,
- cp_group: Optional[torch.distributed.ProcessGroup] = None,
Bases: torch.nn.Module

Multimodal Rotary Embedding for language model. Based on https://github.com/alibaba/Pai-Megatron-Patch/blob/efa5a752e845267936db9ae7df1b6aba92e9ff9a/megatron_patch/model/qwen2_vl/rotary_pos_embedding.py. Copyright (c) 2025 alibaba/Pai-Megatron-Patch. Apache 2.0 license.
- Parameters:
kv_channels (int) – Projection weights dimension in multi-head attention. Obtained from the transformer config.
rotary_percent (float) – Percent of the rotary dimension to use for rotary position embeddings.
rotary_interleaved (bool, optional) – If True, use interleaved rotary position embeddings. Defaults to False.
seq_len_interpolation_factor (float, optional) – Scale for linearly interpolating RoPE to longer sequences. The value must be a float larger than 1.0. Defaults to None.
rotary_base (int, optional) – Base period for rotary position embeddings. Defaults to 10000.
Initialization
- forward(
- position_ids: torch.Tensor,
- mrope_section: List[int],
Forward pass of multimodal RoPE embedding.
- Parameters:
position_ids (torch.Tensor) – A position_id tensor of shape [3, batch_size, seq_len].
mrope_section (list[int]) – Multimodal RoPE section, giving the number of channels allocated to the temporal, height, and width components in the RoPE calculation.
- Returns:
Embeddings after applying RoPE.
- Return type:
Tensor
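A hedged sketch of the mRoPE idea for intuition: three sets of rotation angles are computed from the temporal, height, and width position ids, and `mrope_section` says how many frequency channels each of the three components contributes when they are stitched back together. This illustrates the Qwen2-VL-style scheme rather than copying this method's implementation; the helper name and arguments are hypothetical.

```python
from typing import List

import torch


def mrope_freqs_sketch(
    position_ids: torch.Tensor,  # [3, batch, seq]: temporal, height, width position ids
    mrope_section: List[int],    # e.g. [8, 12, 12]; assumed to sum to rotary_dim // 2
    rotary_base: float = 10000.0,
) -> torch.Tensor:
    """Illustrative multimodal RoPE frequency computation."""
    dim = 2 * sum(mrope_section)
    inv_freq = 1.0 / (rotary_base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

    # Angles for each of the three position-id streams: [3, batch, seq, dim // 2].
    freqs = position_ids[..., None].float() * inv_freq

    # Split the channel dimension per mrope_section and take the i-th split
    # from the i-th stream (t, h, w), then stitch the chunks back together.
    chunks = torch.split(freqs, mrope_section, dim=-1)
    return torch.cat([chunk[i] for i, chunk in enumerate(chunks)], dim=-1)  # [batch, seq, dim // 2]


# Illustrative call: batch of 1, 8 tokens, 64-channel rotary dimension.
pos = torch.arange(8).view(1, 1, 8).expand(3, 1, 8)
print(mrope_freqs_sketch(pos, mrope_section=[8, 12, 12]).shape)  # torch.Size([1, 8, 32])
```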