Working with Transformers
TensorRT provides built-in layers and fusions for common transformer workloads, including rotary position embeddings (RoPE), key-value (KV) cache updates, mixture of experts (MoE), fused attention, and multi-device attention. The pages below cover each topic in depth.