Rotary Position Embedding
TensorRT includes built-in support for Rotary Position Embedding (RoPE) in transformers through the IRotaryEmbeddingLayer API (C++, Python). The layer makes it easier to express RoPE and to convert ONNX models that use RoPE to TensorRT.
The IRotaryEmbeddingLayer has three inputs, one optional input, and two attributes:
Inputs
(index 0) input: The input activation tensor with shape [B, N, S, H]
(index 1) cosCache: The cosine values for calculating the rotary embedding
(index 2) sinCache: The sine values for calculating the rotary embedding
cosCache and sinCache should have the shape [B, S, H / 2].
Optional input
(index 3) positionIds: Position IDs for indexing into cosCache and sinCache
positionIds should have shape [B, S]. When positionIds is provided, cosCache and sinCache should correspondingly have shape [maxPositionId + 1, H / 2].
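The shape relationship between positionIds and the two caches can be illustrated with a short NumPy sketch. This is not TensorRT code: the helper names build_rope_caches and gather_by_position_ids are hypothetical, and the frequency base of 10000 is an assumed, commonly used value that the text above does not specify. It only shows how a [maxPositionId + 1, H / 2] cache indexed by [B, S] position IDs yields per-token values of shape [B, S, H / 2].

```python
import numpy as np

def build_rope_caches(max_position_id, rotary_dim, base=10000.0):
    """Return (cos_cache, sin_cache), each of shape [max_position_id + 1, rotary_dim // 2]."""
    # One frequency per rotated 2D pair; base=10000.0 is an assumption, not from the layer spec.
    exponents = np.arange(0, rotary_dim, 2, dtype=np.float32) / rotary_dim
    inv_freq = 1.0 / (base ** exponents)
    positions = np.arange(max_position_id + 1, dtype=np.float32)
    angles = np.outer(positions, inv_freq)  # [max_position_id + 1, rotary_dim // 2]
    return np.cos(angles), np.sin(angles)

def gather_by_position_ids(cache, position_ids):
    """Index a [max_position_id + 1, D] cache with [B, S] IDs, giving a [B, S, D] array."""
    return cache[position_ids]

cos_cache, sin_cache = build_rope_caches(max_position_id=15, rotary_dim=8)
position_ids = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])  # shape [B=2, S=4]
cos_bs = gather_by_position_ids(cos_cache, position_ids)
print(cos_cache.shape, cos_bs.shape)  # (16, 4) (2, 4, 4)
```

Every value in positionIds must be a valid row index into the caches, which is why the first cache dimension is maxPositionId + 1.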
Attributes
interleaved: A boolean that specifies whether the input tensor is in interleaved format, that is, whether each rotated 2D vector is taken from two adjacent elements in the hidden dimension.
rotaryEmbeddingDim: An integer specifying the size of the hidden dimension that participates in RoPE. A special value of 0 means the full hidden dimension participates in RoPE. If it is not 0, then the last dimension of cosCache and sinCache should correspondingly be rotaryEmbeddingDim / 2 instead of H / 2.
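To make the two attributes concrete, the following NumPy sketch gives a reference computation of RoPE. It is an illustration under stated assumptions, not the TensorRT implementation: rope_reference is a hypothetical name, the non-interleaved format is assumed to pair element i with element i + D / 2 (as in the ONNX RotaryEmbedding operator), and when rotaryEmbeddingDim is nonzero the first rotaryEmbeddingDim elements are assumed to be rotated while the remaining elements pass through unchanged. The caches are taken in their [B, S, D / 2] form, where D is the participating dimension.

```python
import numpy as np

def rope_reference(x, cos, sin, interleaved=False, rotary_embedding_dim=0):
    """Apply RoPE to x of shape [B, N, S, H]; cos and sin have shape [B, S, D / 2]."""
    hidden = x.shape[-1]
    # A value of 0 means the full hidden dimension participates.
    d = hidden if rotary_embedding_dim == 0 else rotary_embedding_dim
    # Assumption: the leading d elements rotate, the rest pass through.
    x_rot, x_pass = x[..., :d], x[..., d:]
    # Broadcast the caches over the N axis: [B, 1, S, D / 2].
    cos = cos[:, None, :, :]
    sin = sin[:, None, :, :]
    if interleaved:
        # Pairs are adjacent elements: (0, 1), (2, 3), ...
        x1, x2 = x_rot[..., 0::2], x_rot[..., 1::2]
    else:
        # Assumption: pairs are (i, i + d / 2).
        x1, x2 = x_rot[..., : d // 2], x_rot[..., d // 2 :]
    # Standard 2D rotation of each (x1, x2) pair.
    r1 = x1 * cos - x2 * sin
    r2 = x1 * sin + x2 * cos
    if interleaved:
        out_rot = np.stack([r1, r2], axis=-1).reshape(x_rot.shape)
    else:
        out_rot = np.concatenate([r1, r2], axis=-1)
    return np.concatenate([out_rot, x_pass], axis=-1)

B, N, S, H = 1, 2, 3, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((B, N, S, H)).astype(np.float32)
angles = rng.standard_normal((B, S, 2)).astype(np.float32)  # D / 2 = 2 for D = 4
y = rope_reference(x, np.cos(angles), np.sin(angles),
                   interleaved=True, rotary_embedding_dim=4)
print(y.shape == x.shape)  # True
```

Because each pair is only rotated, the norm of the participating elements is preserved, which is a convenient sanity check when comparing against an engine's output.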
The IRotaryEmbeddingLayer has one output, which is the output activation tensor. It has the same shape and format as input (index 0), the input activation tensor with shape [B, N, S, H].
The IRotaryEmbeddingLayer is supported by all SM versions.