
# nemo_automodel.components.models.llama.rope_utils

Rotary Position Embedding utilities for Llama and Qwen2 models.

This module provides RoPE implementation following HuggingFace's architecture.

Supports both:

* `LlamaConfig`: uses `config.rope_theta` and `config.rope_scaling`
* `Qwen2Config`: uses `config.rope_parameters["rope_theta"]` for theta and `config.rope_parameters` for the scaling settings

Note: `gpt_oss` and `deepseek_v3` each have their own specialized `rope_utils.py`
with model-specific optimizations (YaRN, MLA, etc.).

## Module Contents

### Classes

| Name                                                                                              | Description                                                  |
| ------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ |
| [`LlamaRotaryEmbedding`](#nemo_automodel-components-models-llama-rope_utils-LlamaRotaryEmbedding) | Rotary Position Embedding module for Llama and Qwen2 models. |

### Functions

| Name                                                                                                          | Description                                                                   |
| ------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| [`_compute_default_inv_freq`](#nemo_automodel-components-models-llama-rope_utils-_compute_default_inv_freq)   | Computes inverse frequencies for standard RoPE.                               |
| [`_compute_llama3_inv_freq`](#nemo_automodel-components-models-llama-rope_utils-_compute_llama3_inv_freq)     | Computes inverse frequencies for Llama3-style RoPE with smooth interpolation. |
| [`_get_rope_config`](#nemo_automodel-components-models-llama-rope_utils-_get_rope_config)                     | Extract rope parameters from config (handles both Llama and Qwen2 formats).   |
| [`apply_rotary_pos_emb`](#nemo_automodel-components-models-llama-rope_utils-apply_rotary_pos_emb)             | Applies Rotary Position Embedding to the query and key tensors.               |
| [`apply_rotary_pos_emb_fused`](#nemo_automodel-components-models-llama-rope_utils-apply_rotary_pos_emb_fused) | Applies RoPE using TE's fused kernel.                                         |
| [`rotate_half`](#nemo_automodel-components-models-llama-rope_utils-rotate_half)                               | Rotates half the hidden dims of the input.                                    |

### Data

[`Qwen2RotaryEmbedding`](#nemo_automodel-components-models-llama-rope_utils-Qwen2RotaryEmbedding)

[`RotaryEmbedding`](#nemo_automodel-components-models-llama-rope_utils-RotaryEmbedding)

[`__all__`](#nemo_automodel-components-models-llama-rope_utils-__all__)

### API

```python
class nemo_automodel.components.models.llama.rope_utils.LlamaRotaryEmbedding(
    config,
    device: typing.Optional[torch.device] = None,
    rope_fusion: bool = False
)
```

**Bases:** `Module`

Rotary Position Embedding module for Llama and Qwen2 models.

Returns a `(cos, sin)` tuple for use with `apply_rotary_pos_emb`.

```python
nemo_automodel.components.models.llama.rope_utils.LlamaRotaryEmbedding._build_cache(
    seq_len: int,
    device: torch.device
) -> None
```

Build the cos/sin cache in the config dtype for positions `[0, seq_len)`.

```python
nemo_automodel.components.models.llama.rope_utils.LlamaRotaryEmbedding._ensure_cache(
    seq_len: int,
    device: torch.device
) -> None
```

Build or grow the cos/sin cache so it covers positions `[0, seq_len)`.
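
The build-or-grow behavior can be illustrated with a small, dependency-free sketch. It uses plain Python lists instead of torch tensors. The class `RopeCache`, its `builds` counter, and the choice to rebuild the whole table when it is too short are assumptions made for illustration, not the module's actual implementation.

```python
import math


class RopeCache:
    """Hypothetical lazily grown cos/sin table (illustration only)."""

    def __init__(self, head_dim, theta=10000.0):
        half = head_dim // 2
        self.inv_freq = [theta ** (-2.0 * i / head_dim) for i in range(half)]
        self.cos, self.sin = [], []
        self.builds = 0  # counts rebuilds so the caching effect is visible

    def _build(self, seq_len):
        # Recompute rows for positions [0, seq_len).
        self.cos, self.sin = [], []
        for pos in range(seq_len):
            angles = [pos * f for f in self.inv_freq] * 2  # duplicate halves
            self.cos.append([math.cos(a) for a in angles])
            self.sin.append([math.sin(a) for a in angles])
        self.builds += 1

    def ensure(self, seq_len):
        # Only rebuild when the cache does not yet cover the request.
        if len(self.cos) < seq_len:
            self._build(seq_len)


cache = RopeCache(head_dim=4)
cache.ensure(4)   # first build
cache.ensure(2)   # already covered, no rebuild
cache.ensure(8)   # grows to cover positions [0, 8)
```

Requests for shorter sequences reuse the existing table, so the cost of recomputing angles is paid only when a longer sequence first appears.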

```python
nemo_automodel.components.models.llama.rope_utils.LlamaRotaryEmbedding.forward(
    x: torch.Tensor,
    position_ids: torch.Tensor
) -> tuple[torch.Tensor, torch.Tensor]
```

Return (cos, sin) for the given positions.

In the non-fused path `cos` / `sin` are gathered by the *values* in
`position_ids`, so non-contiguous positions receive the correct rotary
phase: EAGLE TTT depth offsets (`arange(seq_len) + step_idx`), packed
sequences, and context parallelism all pass
`position_ids != arange(seq_len)`. The earlier implementation returned
`cos_cache[:seq_len]`, which keyed only on the sequence *length* and
silently ignored the position values. For the common
`position_ids == arange(seq_len)` case the gather is numerically
identical to that slice.

The fused TE path (`rope_fusion=True`) consumes raw angles indexed by
sequence position and assumes contiguous `[0, seq_len)` positions, so
it keeps the legacy contiguous slice and does NOT honor non-contiguous
`position_ids`.

**Parameters:**

* `x`: Input tensor (used for device and dtype)
* `position_ids`: Position IDs tensor `[batch, seq_len]`

**Returns:** `tuple[torch.Tensor, torch.Tensor]`

(cos, sin) tensors \[batch, seq\_len, head\_dim]
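
The difference between gathering by position value and slicing by sequence length can be seen in a minimal sketch. It uses plain Python lists rather than torch tensors. The helpers `build_cache`, `gather`, and `legacy_slice` are hypothetical names, and the half-duplicated angle layout is assumed from the HuggingFace convention.

```python
import math


def build_cache(seq_len, head_dim, theta=10000.0):
    """Return cos/sin tables with one row of head_dim values per position."""
    half = head_dim // 2
    inv_freq = [theta ** (-2.0 * i / head_dim) for i in range(half)]
    cos, sin = [], []
    for pos in range(seq_len):
        angles = [pos * f for f in inv_freq] * 2  # duplicate to full head_dim
        cos.append([math.cos(a) for a in angles])
        sin.append([math.sin(a) for a in angles])
    return cos, sin


def gather(cache, position_ids):
    # Index by the *values* in position_ids: one cache row per position.
    return [[cache[p] for p in row] for row in position_ids]


def legacy_slice(cache, position_ids):
    # Keyed only on the sequence length; the position values are ignored.
    seq_len = len(position_ids[0])
    return [cache[:seq_len] for _ in position_ids]


cos_cache, sin_cache = build_cache(seq_len=16, head_dim=4)

contiguous = [[0, 1, 2, 3]]
shifted = [[2, 3, 4, 5]]  # e.g. arange(4) + step_idx with step_idx = 2

same_for_contiguous = gather(cos_cache, contiguous) == legacy_slice(cos_cache, contiguous)
same_for_shifted = gather(cos_cache, shifted) == legacy_slice(cos_cache, shifted)
```

Here `same_for_contiguous` is true and `same_for_shifted` is false: the slice returns the rows for positions 0 to 3 even when the caller asked for positions 2 to 5.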

```python
nemo_automodel.components.models.llama.rope_utils._compute_default_inv_freq(
    config,
    device: typing.Optional[torch.device] = None
) -> tuple[torch.Tensor, float]
```

Computes inverse frequencies for standard RoPE.
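
As a reference point, the standard RoPE inverse frequencies are `1 / theta^(2i / d)` for `i = 0 .. d/2 - 1`. The sketch below computes them in plain Python. The function name `default_inv_freq` and its arguments are hypothetical; it returns only the frequency list and ignores options such as partial rotary factors, so it is not a drop-in replacement for the real function.

```python
def default_inv_freq(head_dim, rope_theta=10000.0):
    """Standard RoPE inverse frequencies, one per pair of hidden dims."""
    return [1.0 / (rope_theta ** (2 * i / head_dim)) for i in range(head_dim // 2)]


# With head_dim=8 and theta=10000 the exponents are 0, 0.25, 0.5, 0.75.
inv_freq = default_inv_freq(head_dim=8, rope_theta=10000.0)
```

The first entry is always 1.0 (the fastest rotation), and each later entry rotates more slowly, which is what lets different dimension pairs encode position at different scales.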

```python
nemo_automodel.components.models.llama.rope_utils._compute_llama3_inv_freq(
    config,
    device: typing.Optional[torch.device] = None
) -> tuple[torch.Tensor, float]
```

Computes inverse frequencies for Llama3-style RoPE with smooth interpolation.

Branch logic (matches HF \_compute\_llama3\_parameters):

* Long wavelength  (low freq,  wavelen > low\_freq\_wavelen)  → inverse frequency divided by `factor`
* Short wavelength (high freq, wavelen \< high\_freq\_wavelen) → unchanged
* Medium band (between the two thresholds) → smooth interpolation between the scaled and unscaled values
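
The three branches can be sketched in plain Python as follows. The function name `llama3_inv_freq` is hypothetical. The parameter names mirror the keys HuggingFace uses in `rope_scaling` (`factor`, `low_freq_factor`, `high_freq_factor`, `original_max_position_embeddings`), and the example values are assumptions chosen for illustration.

```python
import math


def llama3_inv_freq(inv_freq, factor, low_freq_factor, high_freq_factor, original_max_pos):
    """Apply Llama3-style scaling to a list of base inverse frequencies."""
    low_freq_wavelen = original_max_pos / low_freq_factor
    high_freq_wavelen = original_max_pos / high_freq_factor
    scaled = []
    for f in inv_freq:
        wavelen = 2 * math.pi / f
        if wavelen > low_freq_wavelen:
            # Long wavelength: slow the rotation down by the full factor.
            scaled.append(f / factor)
        elif wavelen < high_freq_wavelen:
            # Short wavelength: leave untouched.
            scaled.append(f)
        else:
            # Medium band: blend between the scaled and unscaled values.
            smooth = (original_max_pos / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * f / factor + smooth * f)
    return scaled


head_dim, theta = 128, 500000.0
base = [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
scaled = llama3_inv_freq(
    base, factor=8.0, low_freq_factor=1.0, high_freq_factor=4.0, original_max_pos=8192
)
```

At the band edges the blend weight is 0 or 1, so the interpolated value meets the scaled and unscaled branches without a jump.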

```python
nemo_automodel.components.models.llama.rope_utils._get_rope_config(
    config
) -> tuple[float, dict]
```

Extract rope parameters from config (handles both Llama and Qwen2 formats).

**Returns:** `tuple[float, dict]`

Tuple of (rope\_theta, rope\_scaling\_dict)
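
One way to handle both config layouts is sketched below. The function `get_rope_config` is a hypothetical stand-in, not the module's code. The order of the checks, the fallback to an empty dict when `rope_scaling` is `None`, and the example config values are all assumptions.

```python
from types import SimpleNamespace


def get_rope_config(config):
    """Return (rope_theta, rope_scaling_dict) for either config layout."""
    rope_parameters = getattr(config, "rope_parameters", None)
    if rope_parameters is not None:
        # Qwen2-style: theta and the scaling settings share one dict.
        return float(rope_parameters["rope_theta"]), dict(rope_parameters)
    # Llama-style: separate attributes; rope_scaling may be None.
    rope_scaling = getattr(config, "rope_scaling", None) or {}
    return float(config.rope_theta), dict(rope_scaling)


# Stand-in config objects; real code would pass LlamaConfig / Qwen2Config.
llama_cfg = SimpleNamespace(
    rope_theta=500000.0, rope_scaling={"rope_type": "llama3", "factor": 8.0}
)
qwen2_cfg = SimpleNamespace(
    rope_parameters={"rope_theta": 1000000.0, "rope_type": "default"}
)

llama_theta, llama_scaling = get_rope_config(llama_cfg)
qwen2_theta, qwen2_scaling = get_rope_config(qwen2_cfg)
```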

```python
nemo_automodel.components.models.llama.rope_utils.apply_rotary_pos_emb(
    q: torch.Tensor,
    k: torch.Tensor,
    cos: torch.Tensor,
    sin: torch.Tensor
) -> tuple[torch.Tensor, torch.Tensor]
```

Applies Rotary Position Embedding to the query and key tensors.

**Parameters:**

* `q`: Query tensor `[batch, num_heads, seq_len, head_dim]`
* `k`: Key tensor `[batch, num_kv_heads, seq_len, head_dim]`
* `cos`: Cosine embeddings `[batch, seq_len, head_dim]`
* `sin`: Sine embeddings `[batch, seq_len, head_dim]`

**Returns:** `tuple[torch.Tensor, torch.Tensor]`

Rotated (q, k) tensors
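
The rotation follows the usual `x * cos + rotate_half(x) * sin` form. The sketch below applies it to a single head vector in plain Python and checks the property that makes RoPE useful: the query-key dot product depends only on the relative offset between positions. The helper names (`cos_sin`, `apply_rope`, `dot`) and the example vectors are hypothetical, and batching and head broadcasting are left out.

```python
import math


def rotate_half(x):
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def cos_sin(pos, head_dim, theta=10000.0):
    """cos/sin rows for one position, with the half angles duplicated."""
    half = head_dim // 2
    angles = [pos * theta ** (-2.0 * i / head_dim) for i in range(half)] * 2
    return [math.cos(a) for a in angles], [math.sin(a) for a in angles]


def apply_rope(x, cos, sin):
    rotated = rotate_half(x)
    return [xi * c + ri * s for xi, ri, c, s in zip(x, rotated, cos, sin)]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

# Both pairs of positions are 4 apart, so the attention scores should match.
score_a = dot(apply_rope(q, *cos_sin(7, 4)), apply_rope(k, *cos_sin(3, 4)))
score_b = dot(apply_rope(q, *cos_sin(12, 4)), apply_rope(k, *cos_sin(8, 4)))
```

Because each dimension pair is rotated by a plain 2D rotation, the vector norm is also preserved, and position 0 leaves the input unchanged.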

```python
nemo_automodel.components.models.llama.rope_utils.apply_rotary_pos_emb_fused(
    q: torch.Tensor,
    k: torch.Tensor,
    freqs_cis: torch.Tensor
) -> tuple[torch.Tensor, torch.Tensor]
```

Applies RoPE using TE's fused kernel.

**Parameters:**

* `q`: Query tensor `[batch, num_heads, seq_len, head_dim]`
* `k`: Key tensor `[batch, num_kv_heads, seq_len, head_dim]`
* `freqs_cis`: Raw angles `[seq_len, 1, 1, head_dim]` in TE format

**Returns:** `tuple[torch.Tensor, torch.Tensor]`

Rotated (q, k) tensors

```python
nemo_automodel.components.models.llama.rope_utils.rotate_half(
    x: torch.Tensor
) -> torch.Tensor
```

Rotates half the hidden dims of the input.
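
In the HuggingFace convention this splits the last dimension into halves `x1` and `x2` and returns `(-x2, x1)`. A plain-Python sketch on a single vector, assuming that same convention applies here:

```python
def rotate_half(x):
    """Map [x1, x2] to [-x2, x1], where x1 and x2 are the two halves of x."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    return [-v for v in x2] + x1


rotated = rotate_half([1.0, 2.0, 3.0, 4.0])
twice = rotate_half(rotated)
```

Applying the function twice negates the input, which is consistent with it acting as a 90-degree rotation on each `(x[i], x[i + half])` pair.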

```python
nemo_automodel.components.models.llama.rope_utils.Qwen2RotaryEmbedding = LlamaRotaryEmbedding
```

```python
nemo_automodel.components.models.llama.rope_utils.RotaryEmbedding = LlamaRotaryEmbedding
```

```python
nemo_automodel.components.models.llama.rope_utils.__all__ = ['RotaryEmbedding', 'LlamaRotaryEmbedding', 'Qwen2RotaryEmbedding', 'rotate_half...
```