Working with Transformers

TensorRT provides built-in layers and fusions for common transformer workloads, including rotary position embedding (RoPE), key-value (KV) cache updates, mixture of experts (MoE), fused attention, and multi-device attention. The pages below cover each topic in depth.