Transformer Engine
0.11.0-3f01b4f

Getting Started

  • Installation
    • Prerequisites
    • Transformer Engine in NGC Containers
    • pip - from GitHub
      • Additional Prerequisites
      • Installation (stable release)
      • Installation (development build)
      • Installation (from source)
  • Getting Started
    • Overview
    • Let’s build a Transformer layer!
    • Meet Transformer Engine
    • Fused TE Modules
    • Enabling FP8

Python API documentation

  • Common API
    • Format
    • DelayedScaling
  • Framework-specific API
    • PyTorch
      • Linear
        • forward
      • LayerNorm
      • RMSNorm
      • LayerNormLinear
        • forward
      • LayerNormMLP
        • forward
      • DotProductAttention
        • forward
      • TransformerLayer
        • forward
      • fp8_autocast
      • checkpoint
      • onnx_export
    • JAX
      • MajorShardingType
      • ShardingType
      • TransformerLayerType
      • ShardingResource
      • fp8_autocast
      • update_collections
      • update_fp8_metas
      • LayerNorm
        • __call__
      • DenseGeneral
        • __call__
      • LayerNormDenseGeneral
        • __call__
      • LayerNormMLP
        • __call__
      • RelativePositionBiases
        • __call__
      • MultiHeadAttention
        • __call__
      • TransformerLayer
        • __call__
      • extend_logical_axis_rules

Examples and Tutorials

  • Using FP8 with Transformer Engine
    • Introduction to FP8
      • Structure
      • Mixed precision training - a quick introduction
      • Mixed precision training with FP8
    • Using FP8 with Transformer Engine
      • FP8 recipe
      • FP8 autocasting
      • Handling backward pass
      • Precision
  • Performance Optimizations
    • Multi-GPU training
    • Gradient accumulation fusion
    • FP8 weight caching

Advanced

  • C/C++ API
    • activation.h
      • void nvte_gelu
      • void nvte_dgelu
      • void nvte_geglu
      • void nvte_dgeglu
      • void nvte_relu
      • void nvte_drelu
      • void nvte_swiglu
      • void nvte_dswiglu
      • void nvte_reglu
      • void nvte_dreglu
    • cast.h
      • void nvte_fp8_quantize
      • void nvte_fp8_dequantize
    • gemm.h
      • void nvte_cublas_gemm
    • fused_attn.h
      • enum NVTE_QKV_Layout
        • enumerator NVTE_NOT_INTERLEAVED
        • enumerator NVTE_QKV_INTERLEAVED
        • enumerator NVTE_KV_INTERLEAVED
      • enum NVTE_Bias_Type
        • enumerator NVTE_NO_BIAS
        • enumerator NVTE_PRE_SCALE_BIAS
        • enumerator NVTE_POST_SCALE_BIAS
      • enum NVTE_Mask_Type
        • enumerator NVTE_NO_MASK
        • enumerator NVTE_PADDING_MASK
        • enumerator NVTE_CAUSAL_MASK
      • enum NVTE_Fused_Attn_Backend
        • enumerator NVTE_No_Backend
        • enumerator NVTE_F16_max512_seqlen
        • enumerator NVTE_F16_arbitrary_seqlen
        • enumerator NVTE_FP8
      • NVTE_Fused_Attn_Backend nvte_get_fused_attn_backend
      • void nvte_fused_attn_fwd_qkvpacked
      • void nvte_fused_attn_bwd_qkvpacked
      • void nvte_fused_attn_fwd_kvpacked
      • void nvte_fused_attn_bwd_kvpacked
    • layer_norm.h
      • void nvte_layernorm_fwd
      • void nvte_layernorm1p_fwd
      • void nvte_layernorm_bwd
      • void nvte_layernorm1p_bwd
    • rmsnorm.h
      • void nvte_rmsnorm_fwd
      • void nvte_rmsnorm_bwd
    • softmax.h
      • void nvte_scaled_softmax_forward
      • void nvte_scaled_softmax_backward
      • void nvte_scaled_masked_softmax_forward
      • void nvte_scaled_masked_softmax_backward
      • void nvte_scaled_upper_triang_masked_softmax_forward
      • void nvte_scaled_upper_triang_masked_softmax_backward
    • transformer_engine.h
      • typedef void *NVTETensor
      • enum NVTEDType
        • enumerator kNVTEByte
        • enumerator kNVTEInt32
        • enumerator kNVTEInt64
        • enumerator kNVTEFloat32
        • enumerator kNVTEFloat16
        • enumerator kNVTEBFloat16
        • enumerator kNVTEFloat8E4M3
        • enumerator kNVTEFloat8E5M2
        • enumerator kNVTENumTypes
      • NVTETensor nvte_create_tensor
      • void nvte_destroy_tensor
      • NVTEDType nvte_tensor_type
      • NVTEShape nvte_tensor_shape
      • void *nvte_tensor_data
      • float *nvte_tensor_amax
      • float *nvte_tensor_scale
      • float *nvte_tensor_scale_inv
      • void nvte_tensor_pack_create
      • void nvte_tensor_pack_destroy
      • struct NVTEShape
        • const size_t *data
        • size_t ndim
      • struct NVTETensorPack
        • NVTETensor tensors[MAX_SIZE]
        • size_t size = 0
        • static const int MAX_SIZE = 10
      • namespace transformer_engine
        • enum class DType
        • struct TensorWrapper
    • transpose.h
      • void nvte_cast_transpose
      • void nvte_transpose
      • void nvte_cast_transpose_dbias
      • void nvte_fp8_transpose_dbias
      • void nvte_cast_transpose_dbias_dgelu
      • void nvte_multi_cast_transpose
      • void nvte_dgeglu_cast_transpose

© Copyright 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.