core.models.common.embeddings.yarn_rotary_pos_embedding#

Module Contents#

Classes#

YarnRotaryEmbedding

Yarn Rotary Embedding for language models.

Functions#

_yarn_find_correction_dim

_yarn_find_correction_range

_yarn_linear_ramp_mask

_yarn_get_mscale

_yarn_get_concentration_factor

Get the concentration factor (the factor multiplied into the sine and cosine components of the embedding). This factor is also known as the attention factor, and is sometimes referred to as “mscale” (a homonym of, but distinct from, the mscale parameter).

_yarn_get_concentration_factor_from_config

Data#

API#

core.models.common.embeddings.yarn_rotary_pos_embedding.logger#

‘getLogger(…)’

class core.models.common.embeddings.yarn_rotary_pos_embedding.YarnRotaryEmbedding(
kv_channels: int,
rotary_percent: float = 1.0,
rotary_interleaved: bool = False,
seq_len_interpolation_factor: Optional[float] = None,
rotary_base: float = 10000.0,
use_cpu_initialization: bool = False,
scaling_factor: float = 1.0,
original_max_position_embeddings: int = 4096,
beta_fast: float = 32.0,
beta_slow: float = 1.0,
mscale: float = 1.0,
mscale_all_dim: float = 0.0,
correction_range_round_to_int: bool = True,
cp_group: Optional[torch.distributed.ProcessGroup] = None,
)#

Bases: megatron.core.models.common.embeddings.rotary_pos_embedding.RotaryEmbedding

Yarn Rotary Embedding for language models.

Parameters:
  • kv_channels (int) – Projection weights dimension in multi-head attention. Obtained from the transformer config.

  • rotary_percent (float) – Percent of rotary dimension to use for rotary position embeddings.

  • rotary_interleaved (bool, optional) – If True, interleaved rotary position embeddings. Defaults to False.

  • seq_len_interpolation_factor (float, optional) – Scale factor for linearly interpolating RoPE to longer sequences. Must be a float larger than 1.0. Defaults to None.

  • rotary_base (float, optional) – Base period for rotary position embeddings. Defaults to 10000.

  • use_cpu_initialization (bool, optional) – If False, initialize the inv_freq directly on the GPU. Defaults to False.

  • scaling_factor (float, optional) – Scaling factor for Yarn RoPE. Defaults to 1.0.

  • original_max_position_embeddings (int, optional) – Original maximum position embeddings length. Defaults to 4096.

  • beta_fast (float, optional) – Fast beta value for Yarn RoPE. Defaults to 32.

  • beta_slow (float, optional) – Slow beta value for Yarn RoPE. Defaults to 1.

  • mscale (float, optional) – Mscale value for Yarn RoPE. Defaults to 1.

  • mscale_all_dim (float, optional) – Mscale all dim value for Yarn RoPE. Defaults to 0.

  • correction_range_round_to_int (bool) – Whether to round the dim range bounds to integers. Defaults to True.

  • cp_group (torch.distributed.ProcessGroup, optional) – Process group for context parallelism. Defaults to None.

Initialization
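
A minimal construction sketch based only on the parameters documented above; the import path is assumed from the Bases line, and the values are illustrative:

```python
from megatron.core.models.common.embeddings.yarn_rotary_pos_embedding import (
    YarnRotaryEmbedding,
)

# Illustrative values: extend a model pre-trained with a 4096-token context by 8x.
yarn_rope = YarnRotaryEmbedding(
    kv_channels=128,                        # per-head dimension, from the transformer config
    rotary_base=10000.0,
    scaling_factor=8.0,                     # context-extension ratio
    original_max_position_embeddings=4096,  # pre-training context length
    beta_fast=32.0,
    beta_slow=1.0,
    mscale=1.0,
    mscale_all_dim=0.0,
)
```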

forward(
max_seq_len: int,
offset: int = 0,
packed_seq: bool = False,
) → torch.Tensor#

Forward pass of Yarn Rotary Embedding.

Parameters:
  • max_seq_len (int) – Maximum sequence length.

  • offset (int, optional) – RoPE offset. Defaults to 0.

  • packed_seq (bool, optional) – Whether to use packed sequence. Defaults to False.

Returns:

Embeddings after applying Yarn RoPE.

Return type:

Tensor
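
Continuing the construction sketch above, a hedged usage of forward; the exact shape of the returned tensor depends on rotary_percent and rotary_interleaved:

```python
# Rotary angles/frequencies for a 32768-token sequence (8x the original 4096),
# to be consumed by the attention layers. The module is called directly, which
# dispatches to forward().
freqs = yarn_rope(max_seq_len=32768, offset=0, packed_seq=False)
```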

_set_cos_sin_cache(seq_len, offset, dtype, packed_seq=False)#

get_cached_cos_sin(
seq_len,
offset=0,
dtype=torch.get_default_dtype(),
packed_seq=False,
)#

Get cached cos and sin values.
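
A hedged sketch of the cached accessor, assuming it returns the cosine and sine tables built by _set_cos_sin_cache (the exact return structure is not shown on this page):

```python
import torch

# Assumption: returns cached cos/sin tables covering up to seq_len positions.
cos_cached, sin_cached = yarn_rope.get_cached_cos_sin(
    seq_len=32768, offset=0, dtype=torch.bfloat16
)
```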

core.models.common.embeddings.yarn_rotary_pos_embedding._yarn_find_correction_dim(
num_rotations: float,
dim: int,
rotary_base: float = 10000,
max_position_embeddings: int = 2048,
) → float#
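
No body is shown here, but the standard YaRN “inverse dim formula” that a helper like this typically implements is sketched below: it returns the (fractional) rotary dimension whose frequency completes num_rotations rotations over max_position_embeddings positions. This follows the public YaRN reference implementation and is not necessarily the exact Megatron code:

```python
import math

def yarn_find_correction_dim_sketch(
    num_rotations, dim, rotary_base=10000.0, max_position_embeddings=2048
):
    # Solve rotary_base**(2d/dim) = max_position_embeddings / (2*pi*num_rotations) for d.
    return (dim * math.log(max_position_embeddings / (num_rotations * 2 * math.pi))) / (
        2 * math.log(rotary_base)
    )
```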
core.models.common.embeddings.yarn_rotary_pos_embedding._yarn_find_correction_range(
low_rot: float,
high_rot: float,
dim: int,
rotary_base: float = 10000,
max_position_embeddings: int = 2048,
round_to_int: bool = True,
) → tuple[int, int]#
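
A sketch of how such a range is usually derived in YaRN implementations: evaluate the inverse dim formula at the fast and slow rotation counts, optionally round outward to integers, and clamp to valid rotary dimensions. Again a sketch of the common pattern, not necessarily the exact Megatron code:

```python
import math

def yarn_find_correction_range_sketch(
    low_rot, high_rot, dim, rotary_base=10000.0, max_position_embeddings=2048, round_to_int=True
):
    def inverse_dim(num_rotations):
        # Same inverse-dim formula as in the previous sketch.
        return (dim * math.log(max_position_embeddings / (num_rotations * 2 * math.pi))) / (
            2 * math.log(rotary_base)
        )

    low, high = inverse_dim(low_rot), inverse_dim(high_rot)
    if round_to_int:
        low, high = math.floor(low), math.ceil(high)
    # Clamp to the valid range of rotary dimensions.
    return max(low, 0), min(high, dim - 1)
```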
core.models.common.embeddings.yarn_rotary_pos_embedding._yarn_linear_ramp_mask(
min: float,
max: float,
dim: int,
device: torch.device,
) → torch.Tensor#
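
In YaRN-style implementations this is typically a linear ramp that rises from 0 to 1 between the min and max correction dims, used to blend interpolated and extrapolated frequencies. A hedged sketch (parameter names min_val/max_val stand in for the documented min/max):

```python
import torch

def yarn_linear_ramp_mask_sketch(min_val, max_val, dim, device):
    # Avoid division by zero when the range collapses to a single point.
    if min_val == max_val:
        max_val += 0.001
    linear = (torch.arange(dim, dtype=torch.float32, device=device) - min_val) / (max_val - min_val)
    return torch.clamp(linear, 0, 1)
```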
core.models.common.embeddings.yarn_rotary_pos_embedding._yarn_get_mscale(scale: float = 1, mscale: float = 1) → float#
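
The canonical YaRN mscale correction is a logarithmic function of the scaling factor; a sketch of that standard form (not necessarily the exact Megatron code):

```python
import math

def yarn_get_mscale_sketch(scale=1.0, mscale=1.0):
    # No correction when the context is not actually extended.
    if scale <= 1.0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0
```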
core.models.common.embeddings.yarn_rotary_pos_embedding._yarn_get_concentration_factor(
scaling_factor: float,
mscale: Optional[float],
mscale_all_dim: Optional[float],
) → float#

Get the concentration factor (the factor multiplied into the sine and cosine components of the embedding). This factor is also known as the attention factor, and is sometimes referred to as “mscale” (a homonym of, but distinct from, the mscale parameter).
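
In DeepSeek-style YaRN variants the concentration factor is typically the ratio of two mscale corrections, falling back to the original YaRN attention factor when the mscale knobs are unset. A hedged sketch of that pattern, not necessarily the exact Megatron logic:

```python
import math

def yarn_get_concentration_factor_sketch(scaling_factor, mscale, mscale_all_dim):
    def get_mscale(scale, m):
        # Same form as the _yarn_get_mscale sketch above.
        return 1.0 if scale <= 1.0 else 0.1 * m * math.log(scale) + 1.0

    if mscale and mscale_all_dim:
        # Ratio of the two mscale corrections (DeepSeek-style YaRN).
        return get_mscale(scaling_factor, mscale) / get_mscale(scaling_factor, mscale_all_dim)
    # Original YaRN attention (concentration) factor.
    return 0.1 * math.log(scaling_factor) + 1.0 if scaling_factor > 1.0 else 1.0
```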

core.models.common.embeddings.yarn_rotary_pos_embedding._yarn_get_concentration_factor_from_config(
config: megatron.core.transformer.TransformerConfig,
) → float#