core.models.common.embeddings.yarn_rotary_pos_embedding#
Module Contents#
Classes#
YarnRotaryEmbedding – Yarn Rotary Embedding for language model.
Functions#
_yarn_get_concentration_factor – Get the concentration factor (the factor multiplied into the sine and cosine components of the embedding). This factor is also known as the attention factor and is sometimes referred to as “mscale” (the same name as the mscale parameter).
Data#
API#
- core.models.common.embeddings.yarn_rotary_pos_embedding.logger#
‘getLogger(…)’
- class core.models.common.embeddings.yarn_rotary_pos_embedding.YarnRotaryEmbedding(
- kv_channels: int,
- rotary_percent: float = 1.0,
- rotary_interleaved: bool = False,
- seq_len_interpolation_factor: Optional[float] = None,
- rotary_base: float = 10000.0,
- use_cpu_initialization: bool = False,
- scaling_factor: float = 1.0,
- original_max_position_embeddings: int = 4096,
- beta_fast: float = 32.0,
- beta_slow: float = 1.0,
- mscale: float = 1.0,
- mscale_all_dim: float = 0.0,
- correction_range_round_to_int: bool = True,
- cp_group: Optional[torch.distributed.ProcessGroup] = None,
Bases:
megatron.core.models.common.embeddings.rotary_pos_embedding.RotaryEmbedding

Yarn Rotary Embedding for language model.
- Parameters:
kv_channels (int) – Projection weights dimension in multi-head attention. Obtained from transformer config.
rotary_percent (float) – Percent of rotary dimension to use for rotary position embeddings.
rotary_interleaved (bool, optional) – If True, interleaved rotary position embeddings. Defaults to False.
seq_len_interpolation_factor (float, optional) – Scale for linearly interpolating RoPE to longer sequences. The value must be a float larger than 1.0. Defaults to None.
rotary_base (float, optional) – Base period for rotary position embeddings. Defaults to 10000.
use_cpu_initialization (bool, optional) – If False, initialize the inv_freq directly on the GPU. Defaults to False.
scaling_factor (float, optional) – Scaling factor for Yarn RoPE. Defaults to 1.0.
original_max_position_embeddings (int, optional) – Original maximum position embeddings length. Defaults to 4096.
beta_fast (float, optional) – Fast beta value for Yarn RoPE. Defaults to 32.
beta_slow (float, optional) – Slow beta value for Yarn RoPE. Defaults to 1.
mscale (float, optional) – Mscale value for Yarn RoPE. Defaults to 1.
mscale_all_dim (float, optional) – Mscale all dim value for Yarn RoPE. Defaults to 0.
correction_range_round_to_int (bool, optional) – Whether to round the correction dim range bounds to integers. Defaults to True.
cp_group (torch.distributed.ProcessGroup, optional) – Process group for context parallel. Defaults to None.
Initialization
- forward(
- max_seq_len: int,
- offset: int = 0,
- packed_seq: bool = False,
Forward pass of Yarn Rotary Embedding.
- Parameters:
max_seq_len (int) – Maximum size of sequence
offset (int, optional) – RoPE offset. Defaults to 0.
packed_seq (bool, optional) – Whether to use packed sequence. Defaults to False.
- Returns:
Embeddings after applying Yarn RoPE.
- Return type:
Tensor
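For orientation, a minimal usage sketch. The import path (inferred from the Bases line above), the kv_channels value, and the scaling/sequence lengths are illustrative assumptions, not values taken from this page:

```python
from megatron.core.models.common.embeddings.yarn_rotary_pos_embedding import (
    YarnRotaryEmbedding,
)

# Illustrative configuration: extend a 4096-token RoPE window by 8x.
rope = YarnRotaryEmbedding(
    kv_channels=128,                        # per-head dim, normally taken from the transformer config
    rotary_base=10000.0,
    scaling_factor=8.0,
    original_max_position_embeddings=4096,
)

# Build rotary embeddings for the extended sequence length. Per the forward
# docstring above, the result is a Tensor that the attention layers consume
# when applying Yarn RoPE.
rotary_pos_emb = rope(max_seq_len=32768, offset=0, packed_seq=False)
```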
- _set_cos_sin_cache(seq_len, offset, dtype, packed_seq=False)#
- get_cached_cos_sin(
- seq_len,
- offset=0,
- dtype=torch.get_default_dtype(),
- packed_seq=False,
Get cached cos and sin values.
- core.models.common.embeddings.yarn_rotary_pos_embedding._yarn_find_correction_dim(
- num_rotations: float,
- dim: int,
- rotary_base: float = 10000,
- max_position_embeddings: int = 2048,
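The two correction helpers implement the NTK-by-parts construction from the YaRN paper. As a hedged sketch (based on the public YaRN reference implementation, not necessarily the exact Megatron source), _yarn_find_correction_dim inverts the RoPE wavelength formula:

```python
import math

def yarn_find_correction_dim(num_rotations, dim, rotary_base=10000, max_position_embeddings=2048):
    # Find the (possibly fractional) dimension index whose rotary frequency completes
    # `num_rotations` full rotations over the original context window.
    return (dim * math.log(max_position_embeddings / (num_rotations * 2 * math.pi))) / (
        2 * math.log(rotary_base)
    )
```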
- core.models.common.embeddings.yarn_rotary_pos_embedding._yarn_find_correction_range(
- low_rot: float,
- high_rot: float,
- dim: int,
- rotary_base: float = 10000,
- max_position_embeddings: int = 2048,
- round_to_int: bool = True,
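Continuing the same sketch, _yarn_find_correction_range evaluates the formula above at beta_fast and beta_slow and clamps the result to valid dimension indices; rounding is controlled by round_to_int, mirroring the correction_range_round_to_int flag on the class:

```python
import math

def yarn_find_correction_range(low_rot, high_rot, dim, rotary_base=10000,
                               max_position_embeddings=2048, round_to_int=True):
    # Reuses yarn_find_correction_dim from the previous sketch.
    low = yarn_find_correction_dim(low_rot, dim, rotary_base, max_position_embeddings)
    high = yarn_find_correction_dim(high_rot, dim, rotary_base, max_position_embeddings)
    if round_to_int:
        low, high = math.floor(low), math.ceil(high)
    return max(low, 0), min(high, dim - 1)  # clamp to [0, dim - 1]
```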
- core.models.common.embeddings.yarn_rotary_pos_embedding._yarn_linear_ramp_mask(
- min: float,
- max: float,
- dim: int,
- device: torch.device,
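_yarn_linear_ramp_mask produces the blend weights between interpolated and extrapolated frequencies across the correction range; a minimal sketch, assuming the conventional YaRN formulation:

```python
import torch

def yarn_linear_ramp_mask(min_val, max_val, dim, device):
    # Linear ramp over dimension indices: 0 at min_val, 1 at max_val, clamped outside.
    if min_val == max_val:
        max_val += 0.001  # avoid division by zero for a degenerate range
    ramp = (torch.arange(dim, dtype=torch.float32, device=device) - min_val) / (max_val - min_val)
    return torch.clamp(ramp, 0.0, 1.0)
```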
- core.models.common.embeddings.yarn_rotary_pos_embedding._yarn_get_mscale(scale: float = 1, mscale: float = 1) -> float#
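In the YaRN paper the attention multiplier grows logarithmically with the context-extension scale; a sketch of the usual formula behind _yarn_get_mscale (assumed, not copied from the Megatron source):

```python
import math

def yarn_get_mscale(scale=1.0, mscale=1.0):
    # No correction is needed when the context is not extended.
    if scale <= 1.0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0
```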
- core.models.common.embeddings.yarn_rotary_pos_embedding._yarn_get_concentration_factor(
- scaling_factor: float,
- mscale: Optional[float],
- mscale_all_dim: Optional[float],
Get the concentration factor (the factor multiplied into the sine and cosine components of the embedding). This factor is also known as the attention factor and is sometimes referred to as “mscale” (the same name as the mscale parameter).
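A hedged sketch of how YaRN-style implementations typically derive this concentration factor from the two mscale parameters, reusing yarn_get_mscale from the sketch above; the None handling below is an assumption based on the class defaults:

```python
def yarn_get_concentration_factor(scaling_factor, mscale=None, mscale_all_dim=None):
    # Fall back to the class defaults documented above (mscale=1.0, mscale_all_dim=0.0).
    mscale = 1.0 if mscale is None else mscale
    mscale_all_dim = 0.0 if mscale_all_dim is None else mscale_all_dim
    # With mscale_all_dim == 0.0 the denominator is 1.0, so this reduces to the plain mscale correction.
    return yarn_get_mscale(scaling_factor, mscale) / yarn_get_mscale(scaling_factor, mscale_all_dim)
```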
- core.models.common.embeddings.yarn_rotary_pos_embedding._yarn_get_concentration_factor_from_config(
- config: megatron.core.transformer.TransformerConfig,