core.models.common.embeddings.rotary_pos_embedding#

Module Contents#

Classes#

RotaryEmbedding

Rotary Embedding for language model.

MultimodalRotaryEmbedding

Multimodal Rotary Embedding for language model. Based on https://github.com/alibaba/Pai-Megatron-Patch/blob/efa5a752e845267936db9ae7df1b6aba92e9ff9a/megatron_patch/model/qwen2_vl/rotary_pos_embedding.py. Copyright (c) 2025 alibaba/Pai-Megatron-Patch. Apache 2.0 license.

Data#

API#

core.models.common.embeddings.rotary_pos_embedding.logger#

'getLogger(…)'

core.models.common.embeddings.rotary_pos_embedding.__all__#

['RotaryEmbedding', 'MultimodalRotaryEmbedding']

class core.models.common.embeddings.rotary_pos_embedding.RotaryEmbedding(
kv_channels: int,
rotary_percent: float,
rotary_interleaved: bool = False,
seq_len_interpolation_factor: float = None,
rotary_base: int = 10000,
rope_scaling: bool = False,
rope_scaling_factor: float = 8.0,
use_cpu_initialization: bool = False,
cp_group: Optional[torch.distributed.ProcessGroup] = None,
)#

Bases: torch.nn.Module

Rotary Embedding for language model.

Parameters:
  • kv_channels (int) – Projection weights dimension in multi-head attention, obtained from the transformer config.

  • rotary_percent (float) – Percent of the rotary dimension to use for rotary position embeddings.

  • rotary_interleaved (bool, optional) – If True, use interleaved rotary position embeddings. Defaults to False.

  • seq_len_interpolation_factor (float, optional) – Scale factor for linearly interpolating RoPE to longer sequences. Must be a float larger than 1.0. Defaults to None.

  • rotary_base (int, optional) – Base period for rotary position embeddings. Defaults to 10000.

  • rope_scaling (bool, optional) – If True, apply RoPE scaling as used in Llama 3.x. Defaults to False.

  • rope_scaling_factor (float, optional) – RoPE scaling factor as used in Llama 3.x. Defaults to 8.0.

  • use_cpu_initialization (bool, optional) – If False, initialize inv_freq directly on the GPU. Defaults to False.

  • cp_group (torch.distributed.ProcessGroup, optional) – Process group for context parallelism. Defaults to None.

Initialization
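A minimal construction sketch is shown below. The `megatron.core` import path, the example `kv_channels` value, and the use of `use_cpu_initialization=True` for a CPU-only run are assumptions made for illustration, not taken from this page.

```python
# Hedged sketch: the import path assumes this module is exposed under
# megatron.core; adjust it to match your installation.
from megatron.core.models.common.embeddings.rotary_pos_embedding import RotaryEmbedding

rope = RotaryEmbedding(
    kv_channels=128,              # per-head projection dimension from the transformer config
    rotary_percent=1.0,           # apply RoPE to the full rotary dimension
    rotary_base=10000,            # default base period
    use_cpu_initialization=True,  # keep inv_freq on CPU for this small example
)
```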

_apply_scaling(
freqs,
factor=8,
low_freq_factor=1,
high_freq_factor=4,
original_max_position_embeddings=8192,
)#
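This method has no docstring, but its argument names match the Llama 3.x RoPE scaling recipe: frequencies whose wavelength exceeds original_max_position_embeddings / low_freq_factor are divided by factor, wavelengths below original_max_position_embeddings / high_freq_factor are left untouched, and the band in between is smoothly interpolated. The sketch below is a hedged re-derivation of that published recipe, not the library code.

```python
import math
import torch

def llama3_style_scaling(freqs, factor=8, low_freq_factor=1,
                         high_freq_factor=4, original_max_position_embeddings=8192):
    """Illustrative Llama 3.x-style RoPE frequency scaling (not the library implementation)."""
    low_freq_wavelen = original_max_position_embeddings / low_freq_factor
    high_freq_wavelen = original_max_position_embeddings / high_freq_factor
    wavelen = 2 * math.pi / freqs

    # Long wavelengths (low frequencies) are fully scaled down by `factor`.
    scaled = torch.where(wavelen > low_freq_wavelen, freqs / factor, freqs)

    # Wavelengths between the two thresholds are smoothly interpolated.
    smooth = (original_max_position_embeddings / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    mid_band = (wavelen <= low_freq_wavelen) & (wavelen >= high_freq_wavelen)
    return torch.where(mid_band, (1 - smooth) * freqs / factor + smooth * freqs, scaled)
```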
get_freqs_non_repeated(
max_seq_len: int,
offset: int = 0,
) torch.Tensor#

Generates a matrix of frequencies based on positions in the sequence, used to create the positional encodings.
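For reference, the frequency matrix in standard RoPE is the outer product of the position indices and the inverse frequencies. A small self-contained sketch follows; the variable names and sizes are illustrative, not the internal ones.

```python
import torch

dim, rotary_base = 128, 10000          # rotary dimension and base period (illustrative)
max_seq_len, offset = 16, 0

# inv_freq[i] = rotary_base ** (-2i / dim) for i = 0 .. dim/2 - 1
inv_freq = 1.0 / (rotary_base ** (torch.arange(0, dim, 2).float() / dim))
positions = torch.arange(offset, offset + max_seq_len, dtype=torch.float32)

# freqs[s, i] = position_s * inv_freq_i -> shape [max_seq_len, dim // 2]
freqs = torch.outer(positions, inv_freq)
```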

get_cos_sin(
max_seq_len: int,
offset: int = 0,
)#

Precomputes the cosine and sine values for RoPE for all positions up to the maximum sequence length.
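Building on the frequency matrix above, the cosine/sine tables are simply its elementwise cos/sin, which can then be used to rotate per-head activations. The following self-contained sketch assumes the non-interleaved half-split layout; it illustrates the math rather than the exact cached layout used here.

```python
import torch

dim, max_seq_len, rotary_base = 128, 16, 10000
inv_freq = 1.0 / (rotary_base ** (torch.arange(0, dim, 2).float() / dim))
freqs = torch.outer(torch.arange(max_seq_len, dtype=torch.float32), inv_freq)

cos, sin = freqs.cos(), freqs.sin()    # each [max_seq_len, dim // 2]

# Rotate a per-head activation x with a 2-D rotation per frequency pair.
x = torch.randn(max_seq_len, dim)
x1, x2 = x[..., : dim // 2], x[..., dim // 2:]
x_rotated = torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)
```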

forward(
max_seq_len: int,
offset: int = 0,
packed_seq: bool = False,
) torch.Tensor#

Forward pass of RoPE embedding.

Parameters:
  • max_seq_len (int) – Maximum sequence length.

  • offset (int, optional) – RoPE offset. Defaults to 0.

  • packed_seq (bool, optional) – Whether the input uses packed sequences. Defaults to False.

Returns:

Embeddings after applying RoPE.

Return type:

Tensor
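A hedged usage sketch, continuing with the `rope` instance from the construction sketch above: `forward` returns the rotary frequency tensor covering `max_seq_len` positions, which downstream attention layers use to rotate their query and key projections (the helper that performs the rotation is not documented on this page). Running it end to end may require a GPU and Megatron's parallel state to be initialized.

```python
# `rope` is the RotaryEmbedding instance from the earlier construction sketch.
rotary_pos_emb = rope(max_seq_len=2048, offset=0)
print(rotary_pos_emb.shape)  # leading dimension should span the 2048 positions
```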

_load_from_state_dict(state_dict, prefix, *args, **kwargs)#
get_rotary_seq_len(
inference_context: megatron.core.inference.contexts.BaseInferenceContext,
transformer: megatron.core.transformer.transformer_block.TransformerBlock,
transformer_input: torch.Tensor,
transformer_config: megatron.core.transformer.transformer_config.TransformerConfig,
packed_seq_params: Optional[megatron.core.packed_seq_params.PackedSeqParams] = None,
*,
inference_params: Optional[megatron.core.inference.contexts.BaseInferenceContext] = None,
) int#

Function to get the rotary sequence length.

Parameters:
  • inference_context (BaseInferenceContext) – Inference context, used during inference.

  • transformer (TransformerBlock) – The transformer block (decoder/encoder) used by the model

  • transformer_input (Tensor) – Input tensor to the transformer

  • transformer_config (TransformerConfig) – Transformer config used by the model

  • packed_seq_params (PackedSeqParams) – Packed sequence params

Returns:

The rotary sequence length

Return type:

int
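The exact logic lives in the implementation, but conceptually the rotary sequence length comes from the inference context when one is present, otherwise from the transformer input, and is then scaled back up for sequence and context parallelism. Below is a hedged pseudo-version; the attribute names are assumptions, not taken from this page.

```python
def rotary_seq_len_sketch(inference_context, transformer_input, transformer_config):
    """Illustrative only; not the library implementation."""
    if inference_context is not None:
        seq_len = inference_context.max_sequence_length   # full decode length
    else:
        seq_len = transformer_input.size(0)               # assumes [seq, batch, hidden] layout
    # Each rank holds only a slice of the sequence under sequence/context
    # parallelism, so scale back to the full sequence length.
    if transformer_config.sequence_parallel:
        seq_len *= transformer_config.tensor_model_parallel_size
    return seq_len * transformer_config.context_parallel_size
```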

class core.models.common.embeddings.rotary_pos_embedding.MultimodalRotaryEmbedding(
kv_channels: int,
rotary_percent: float,
rotary_interleaved: bool = False,
seq_len_interpolation_factor: Optional[float] = None,
rotary_base: int = 10000,
cp_group: Optional[torch.distributed.ProcessGroup] = None,
)#

Bases: torch.nn.Module

Multimodal Rotary Embedding for language model. Based on https://github.com/alibaba/Pai-Megatron-Patch/blob/efa5a752e845267936db9ae7df1b6aba92e9ff9a/megatron_patch/model/qwen2_vl/rotary_pos_embedding.py. Copyright (c) 2025 alibaba/Pai-Megatron-Patch. Apache 2.0 license.

Parameters:
  • kv_channels (int) – Projection weights dimension in multi-head attention, obtained from the transformer config.

  • rotary_percent (float) – Percent of the rotary dimension to use for rotary position embeddings.

  • rotary_interleaved (bool, optional) – If True, use interleaved rotary position embeddings. Defaults to False.

  • seq_len_interpolation_factor (float, optional) – Scale factor for linearly interpolating RoPE to longer sequences. Must be a float larger than 1.0. Defaults to None.

  • rotary_base (int, optional) – Base period for rotary position embeddings. Defaults to 10000.

  • cp_group (torch.distributed.ProcessGroup, optional) – Process group for context parallelism. Defaults to None.

Initialization

forward(
position_ids: torch.Tensor,
mrope_section: List[int],
) torch.Tensor#

Forward pass of multimodal RoPE embedding.

Parameters:
  • position_ids (torch.Tensor) – Position id tensor with shape [3, batch_size, seq_len], holding the temporal, height, and width position indices.

  • mrope_section (list[int]) – Split of the rotary channel dimension among the temporal, height, and width axes in the RoPE calculation.

Returns:

Embeddings after applying RoPE.

Return type:

Tensor
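A hedged usage sketch: `position_ids` stacks the temporal, height, and width index planes into shape [3, batch, seq], and `mrope_section` splits the rotary half-dimension among those three axes. The import path and the [16, 24, 24] split (which mirrors Qwen2-VL's configuration for a 128-channel head) are assumptions; a GPU and an initialized Megatron environment may be required.

```python
import torch
# Import path is an assumption; adjust to match your installation.
from megatron.core.models.common.embeddings.rotary_pos_embedding import MultimodalRotaryEmbedding

mrope = MultimodalRotaryEmbedding(kv_channels=128, rotary_percent=1.0)

batch_size, seq_len = 2, 64
# Three planes of position ids (temporal, height, width) -> shape [3, batch, seq].
# For pure text the planes are usually identical; vision tokens carry 2-D grid indices.
position_ids = torch.arange(seq_len).view(1, 1, seq_len).expand(3, batch_size, seq_len).contiguous()

# Channel split of the rotary half-dimension among (temporal, height, width);
# 16 + 24 + 24 = 64 = (128 * 1.0) // 2 for this configuration.
mrope_section = [16, 24, 24]

rotary_pos_emb = mrope(position_ids, mrope_section)
```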