nemo_automodel.shared.te_patches#

Transformer Engine compatibility patches.

Runtime monkey-patches applied directly to TE classes in memory so they take effect immediately in the current process.

Call apply_te_patches() early in the process, before TE optimizers are instantiated.

Module Contents#

Functions#

_apply_fused_adam_quantized_tensor_patch

Patch FusedAdam._initialize_state to handle QuantizedTensor params.

apply_te_patches

Apply all Transformer Engine runtime patches.

Data#

API#

nemo_automodel.shared.te_patches._logger#

'getLogger(…)'

nemo_automodel.shared.te_patches._TE_PATCHES_APPLIED#

False

nemo_automodel.shared.te_patches._apply_fused_adam_quantized_tensor_patch() → None#

Patch FusedAdam._initialize_state to handle QuantizedTensor params.

TE’s FusedAdam allocates optimizer state with torch.zeros(param.shape, ...) / torch.empty(param.shape, ...) in _initialize_state, which fails for QuantizedTensor parameters because their .shape does not carry the metadata needed for allocation. The patch dequantizes the parameter first and allocates with torch.zeros_like / torch.empty_like instead.

The fix was merged upstream in TE 2.12 via https://github.com/NVIDIA/TransformerEngine/pull/2535.
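The dequantize-then-allocate idea behind the patch can be sketched as below. `FakeQuantizedTensor` and `initialize_state_patched` are illustrative stand-ins, not TE's actual classes or the patched method itself:

```python
import torch

# Hypothetical stand-in for a quantized parameter wrapper: the real
# QuantizedTensor lives in Transformer Engine, but the relevant behavior
# here is just that it exposes dequantize() returning a dense tensor.
class FakeQuantizedTensor:
    def __init__(self, data: torch.Tensor):
        self._data = data

    def dequantize(self) -> torch.Tensor:
        return self._data.float()


def initialize_state_patched(param) -> dict:
    # Dequantize first so zeros_like sees a plain dense tensor with
    # valid allocation metadata, instead of calling
    # torch.zeros(param.shape, ...) on the quantized wrapper.
    if hasattr(param, "dequantize"):
        param = param.dequantize()
    return {
        "exp_avg": torch.zeros_like(param),
        "exp_avg_sq": torch.zeros_like(param),
    }
```

The key difference from the unpatched code is using `zeros_like`/`empty_like` on the dequantized tensor, so dtype, device, and layout are inherited rather than reconstructed from `.shape`.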

nemo_automodel.shared.te_patches.apply_te_patches() → None#

Apply all Transformer Engine runtime patches.

This function is idempotent and safe to call multiple times.