Rotary Position Embedding

TensorRT has built-in support for Rotary Position Embedding (RoPE) in transformers through the IRotaryEmbeddingLayer (C++, Python) API. The layer makes it easier to express RoPE in a network and to convert ONNX models that use RoPE to TensorRT.

The IRotaryEmbeddingLayer has three required inputs, one optional input, and two attributes:

Inputs

  • (index 0) input: The input activation tensor with shape [B, N, S, H]

  • (index 1) cosCache: The cosine values for calculating the rotary embedding

  • (index 2) sinCache: The sine values for calculating the rotary embedding

When positionIds is not provided, cosCache and sinCache should have the shape [B, S, H / 2].

Optional input

  • (index 3) positionIds: Position IDs for indexing into cosCache and sinCache

positionIds should have shape [B, S]. When positionIds is provided, cosCache and sinCache should correspondingly have shape [maxPositionId + 1, H / 2].
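The two cache layouts can be illustrated with a small NumPy sketch. This is not TensorRT code; it only mirrors the shape rules described above. The helper name gather_rope_caches, the frequency base of 10000, and the reading of B, S, and H as batch size, sequence length, and per-head hidden size are assumptions made for illustration.

```python
import numpy as np

def gather_rope_caches(cos_cache, sin_cache, position_ids=None):
    """Hypothetical helper: return per-token cos/sin values of shape [B, S, H/2]."""
    if position_ids is None:
        # Without positionIds the caches are already laid out as [B, S, H/2].
        assert cos_cache.ndim == 3 and sin_cache.shape == cos_cache.shape
        return cos_cache, sin_cache
    # With positionIds the caches are lookup tables of shape [maxPositionId + 1, H/2].
    assert cos_cache.ndim == 2 and sin_cache.shape == cos_cache.shape
    assert position_ids.max() < cos_cache.shape[0]
    # Integer-array indexing: [B, S] indices into [P, H/2] gives [B, S, H/2].
    return cos_cache[position_ids], sin_cache[position_ids]

B, S, H = 2, 4, 8
max_position_id = 15

# Assumed frequency schedule (base 10000 is a common convention, not mandated here).
inv_freq = 1.0 / (10000.0 ** (np.arange(0, H, 2) / H))             # [H/2]
angles = np.arange(max_position_id + 1)[:, None] * inv_freq[None]   # [P, H/2]
cos_table, sin_table = np.cos(angles), np.sin(angles)

# Each batch entry may start at a different position.
position_ids = np.array([[0, 1, 2, 3],
                         [5, 6, 7, 8]])                             # [B, S]

cos, sin = gather_rope_caches(cos_table, sin_table, position_ids)
print(cos.shape)  # (2, 4, 4)
```

With positionIds, a single table can be shared by every batch entry and sequence offset, which is why the caches lose their B and S dimensions in that mode.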

Attributes

  • interleaved: A boolean that specifies whether the input tensor is in interleaved format, that is, whether each rotated 2D vector is formed from two adjacent elements in the hidden dimension.

  • rotaryEmbeddingDim: An integer specifying the hidden dimension that participates in RoPE. A special value of 0 means the full hidden dimension participates in RoPE. If it is not 0, then the last dimension of cosCache and sinCache should correspondingly be rotaryEmbeddingDim / 2 instead of H / 2.
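The effect of the two attributes can be sketched as a NumPy reference computation. This is an illustrative sketch, not the TensorRT implementation. It assumes that in the non-interleaved case element i is paired with element i + R / 2 (the convention used by the ONNX RotaryEmbedding operator), that a nonzero rotaryEmbeddingDim R rotates the first R elements of the hidden dimension and passes the rest through unchanged, and that the cos and sin values have already been gathered to shape [B, S, R / 2]. The function name rope_reference is hypothetical.

```python
import numpy as np

def rope_reference(x, cos, sin, interleaved=False, rotary_embedding_dim=0):
    """Reference RoPE. x: [B, N, S, H]; cos, sin: [B, S, R/2]."""
    H = x.shape[-1]
    # 0 is the special value meaning "rotate the full hidden dimension".
    R = H if rotary_embedding_dim == 0 else rotary_embedding_dim
    assert R % 2 == 0 and R <= H
    assert cos.shape[-1] == R // 2 and sin.shape == cos.shape

    # Assumption: the first R elements are rotated, the rest pass through.
    x_rot, x_pass = x[..., :R], x[..., R:]
    # Broadcast over the N axis: [B, S, R/2] -> [B, 1, S, R/2].
    c, s = cos[:, None, :, :], sin[:, None, :, :]

    if interleaved:
        # Pairs are adjacent elements: (0, 1), (2, 3), ...
        x1, x2 = x_rot[..., 0::2], x_rot[..., 1::2]
    else:
        # Assumed pairing: element i with element i + R/2.
        x1, x2 = x_rot[..., : R // 2], x_rot[..., R // 2:]

    # Standard 2D rotation of each (x1, x2) pair.
    y1 = x1 * c - x2 * s
    y2 = x1 * s + x2 * c

    if interleaved:
        # Re-interleave so each output pair lands back in adjacent slots.
        y_rot = np.stack([y1, y2], axis=-1).reshape(x_rot.shape)
    else:
        y_rot = np.concatenate([y1, y2], axis=-1)
    return np.concatenate([y_rot, x_pass], axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4, 8))            # [B, N, S, H]
theta = rng.uniform(0.0, np.pi, size=(2, 4, 4))  # [B, S, H/2]

y = rope_reference(x, np.cos(theta), np.sin(theta), interleaved=True)
print(y.shape)  # (2, 3, 4, 8)
```

Because each pair undergoes a pure rotation, the output keeps the shape of the input and the norm of every hidden vector, which makes a reference like this convenient for checking an engine's output numerically.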

The IRotaryEmbeddingLayer has one output, the output activation tensor. It has the same shape and format as the input activation tensor (index 0).

The IRotaryEmbeddingLayer is supported by all SM versions.