nemo_automodel.components.models.gpt_oss.rope_utils
Module Contents
Classes
Functions
API
Bases: Module
See YaRN paper: https://arxiv.org/abs/2309.00071
Uses rotary_dim instead of head_dim to support partial rotary embeddings.
Apply rotary embeddings to input tensor.
If cos/sin cover fewer features than x (rotary_dim < head_dim, which happens when partial_rotary_factor < 1.0), only the first rotary_dim features along the last axis of x are rotated; the remaining features are passed through unchanged.
Parameters:
Input tensor (…, head_dim)
Cosine tensor (…, rotary_dim // 2)
Sine tensor (…, rotary_dim // 2)
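The partial-rotation behavior described above can be sketched in a few lines of PyTorch. This is an illustration under stated assumptions, not the module's actual code: the function name `partial_rotary_sketch` is hypothetical, the sketch assumes the "split in halves" pairing (feature i is rotated together with feature i + rotary_dim // 2), and it expects cos and sin to broadcast against the leading dimensions of x.

```python
import torch


def partial_rotary_sketch(x, cos, sin):
    """Rotate the first rotary_dim features of x and pass the rest through.

    x:   (..., head_dim)
    cos: (..., rotary_dim // 2), broadcastable against x's leading dims
    sin: (..., rotary_dim // 2), broadcastable against x's leading dims
    """
    rotary_dim = cos.shape[-1] * 2
    x_rot, x_pass = x[..., :rotary_dim], x[..., rotary_dim:]
    # Assumed pairing: the rotated slice is split into two halves.
    x1, x2 = torch.chunk(x_rot, 2, dim=-1)
    cos = cos.to(x.dtype)
    sin = sin.to(x.dtype)
    o1 = x1 * cos - x2 * sin
    o2 = x2 * cos + x1 * sin
    # x_pass is empty when rotary_dim == head_dim, so full rotary also works.
    return torch.cat((o1, o2, x_pass), dim=-1)


# Toy shapes: 5 positions, head_dim = 8, rotary_dim = 4 (partial_rotary_factor = 0.5).
seq_len, head_dim, rotary_dim = 5, 8, 4
exponents = torch.arange(0, rotary_dim, 2, dtype=torch.float32) / rotary_dim
inv_freq = 1.0 / (10000.0 ** exponents)
positions = torch.arange(seq_len, dtype=torch.float32)
angles = torch.outer(positions, inv_freq)  # (seq_len, rotary_dim // 2)

x = torch.randn(seq_len, head_dim)
out = partial_rotary_sketch(x, angles.cos(), angles.sin())
```

In this toy setup the last four features of each row come back untouched, and the rotated slice keeps its norm because each feature pair undergoes a plain 2D rotation.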
Apply rotary embeddings to query and key tensors.
Parameters:
Query tensor.
Key tensor.
Frequency tensor. Its format depends on rope_fusion:
- If rope_fusion=True: [angles, angles], as expected by the Transformer Engine (TE) fused RoPE kernel
- If rope_fusion=False: [cos, sin], with the YaRN concentration factor already applied
QKV format (“bshd” or “thd”).
If True, use TE fused rope. If False, use non-fused rope.
Cumulative sequence lengths for variable-length sequences.
Context parallelism size.
Context parallelism rank.
Returns: tuple[torch.Tensor, torch.Tensor]
Tuple of (q, k) with rotary embeddings applied.
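For the non-fused path, the following sketch shows roughly what applying concentration-scaled cos/sin to queries and keys in "bshd" layout looks like. Several things here are assumptions rather than facts from this page: the helper names `rotate_sketch` and `apply_qk_sketch` are hypothetical, cos and sin are passed separately because the exact packing of the frequency tensor is not described above, the scaling factor of 32 is an arbitrary example value, and the concentration is computed with the YaRN heuristic 0.1 * ln(s) + 1. Full rotary (rotary_dim == head_dim) is used to keep the example short.

```python
import math

import torch


def rotate_sketch(x, cos, sin):
    # Half-split rotation over the last axis (assumed pairing convention).
    x1, x2 = torch.chunk(x, 2, dim=-1)
    return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)


def apply_qk_sketch(q, k, cos, sin):
    """q, k in "bshd" layout: (batch, seq, heads, head_dim).

    cos, sin: (seq, head_dim // 2), already scaled by the concentration factor.
    """
    # Insert batch and head axes so cos/sin broadcast over (b, s, h, d // 2).
    cos = cos[None, :, None, :].to(q.dtype)
    sin = sin[None, :, None, :].to(q.dtype)
    return rotate_sketch(q, cos, sin), rotate_sketch(k, cos, sin)


batch, seq_len, q_heads, kv_heads, head_dim = 2, 6, 4, 2, 8
scaling_factor = 32.0  # example value, not taken from any real config
concentration = 0.1 * math.log(scaling_factor) + 1.0  # YaRN attention scaling

exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
inv_freq = 1.0 / (10000.0 ** exponents)
angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
cos = angles.cos() * concentration
sin = angles.sin() * concentration

q = torch.randn(batch, seq_len, q_heads, head_dim)
k = torch.randn(batch, seq_len, kv_heads, head_dim)
q_out, k_out = apply_qk_sketch(q, k, cos, sin)
```

Because the concentration is folded into cos and sin, each rotated vector's norm is scaled by that factor, which in turn scales the attention logits without a separate multiplication. Queries and keys may have different head counts, since cos/sin broadcast over the head axis.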