nemo_automodel.shared.te_patches#
Transformer Engine compatibility patches.
Runtime monkey-patches applied directly to TE classes in memory so they take effect immediately in the current process.
Call apply_te_patches() early in the process, before TE optimizers are
instantiated.
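The intended call order can be sketched as follows. This is a hypothetical usage sketch: the try/except fallback stub is illustration only, not part of the real module.

```python
# Hypothetical usage sketch: apply the TE patches at process start,
# before any TE FusedAdam optimizer is constructed.
try:
    from nemo_automodel.shared.te_patches import apply_te_patches
except ImportError:
    # nemo_automodel may not be installed; no-op stand-in for illustration.
    def apply_te_patches() -> None:
        pass

# Call once, early. The function is idempotent, so repeated calls are safe.
apply_te_patches()
```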
Module Contents#
Functions#
- `_apply_fused_adam_quantized_tensor_patch()`: Patch FusedAdam._initialize_state to handle QuantizedTensor params.
- `apply_te_patches()`: Apply all Transformer Engine runtime patches.
Data#
API#
- nemo_automodel.shared.te_patches._logger#
'getLogger(…)'
- nemo_automodel.shared.te_patches._TE_PATCHES_APPLIED#
False
- nemo_automodel.shared.te_patches._apply_fused_adam_quantized_tensor_patch() → None#
Patch FusedAdam._initialize_state to handle QuantizedTensor params.
TE's FusedAdam uses `torch.zeros(param.shape, ...)` / `torch.empty(param.shape, ...)` in `_initialize_state`, which fails for QuantizedTensor parameters because their `.shape` does not carry the correct metadata for allocation. The fix dequantizes the param first and uses `torch.zeros_like` / `torch.empty_like` instead. The fix was merged upstream in TE 2.12 via https://github.com/NVIDIA/TransformerEngine/pull/2535.
- nemo_automodel.shared.te_patches.apply_te_patches() → None#
Apply all Transformer Engine runtime patches.
This function is idempotent and safe to call multiple times.