nemo_automodel._transformers.kernel_patches#

Kernel and attention patching utilities.

Functions for SDPA, Liger-kernel, and attention-implementation overrides. These are stateless helpers used during model construction.

Module Contents#

Functions#

_assert_same_signature

Raise AssertionError if the two call signatures differ.

_patch_attention

Wrap the forward method of obj in an sdpa_kernel context manager.

_patch_liger_kernel

Patch a model with liger-kernel and sdpa_kernel.

_patch_legacy_flash_attn_flag

Bridge the legacy _supports_flash_attn_2 class flag to v5.5’s _supports_flash_attn.

_get_next_fallback_attn

Get the next attention implementation in the priority list, in reverse order.

_apply_preload_overrides

Compute final attention implementation and liger-kernel flag based on TP/CP and packed sequence constraints.

_verify_sdpa_support

Validate SDPA support when CP is enabled for HF models.

Data#

API#

nemo_automodel._transformers.kernel_patches.HAS_LIGER_KERNEL#

None

nemo_automodel._transformers.kernel_patches.liger_kernel_trf#

None

nemo_automodel._transformers.kernel_patches.DEFAULT_ATTN_IMPLEMENTATION#

None

nemo_automodel._transformers.kernel_patches.logger#

'getLogger(…)'

nemo_automodel._transformers.kernel_patches._assert_same_signature(original, patched)#

Raise AssertionError if the two call signatures differ.

nemo_automodel._transformers.kernel_patches._patch_attention(obj, sdpa_method=None)#

Wrap the forward method of obj in an sdpa_kernel context manager.

Parameters:
  • obj – Any object with a .forward(*args, **kwargs) method.

  • sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].

Returns:

The same obj with its .forward method patched.
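The wrapping pattern is roughly the following. To keep the sketch self-contained, a stand-in context manager replaces torch.nn.attention.sdpa_kernel; the real helper enters sdpa_kernel(sdpa_method) around each forward call:

```python
import contextlib
import functools


@contextlib.contextmanager
def sdpa_kernel(backends):
    # Stand-in for torch.nn.attention.sdpa_kernel; the real context manager
    # restricts scaled_dot_product_attention to the given backends.
    yield


def _patch_attention(obj, sdpa_method=None):
    """Wrap obj.forward so every call runs inside sdpa_kernel (sketch)."""
    if sdpa_method is None:
        # Mirrors the documented default order.
        sdpa_method = [
            "CUDNN_ATTENTION", "FLASH_ATTENTION", "EFFICIENT_ATTENTION", "MATH",
        ]
    original_forward = obj.forward

    @functools.wraps(original_forward)
    def forward(*args, **kwargs):
        with sdpa_kernel(sdpa_method):
            return original_forward(*args, **kwargs)

    obj.forward = forward
    return obj


class Toy:
    def forward(self, x):
        return x * 2


toy = _patch_attention(Toy())
```

functools.wraps preserves the original forward's name and signature metadata, which is what lets a check like _assert_same_signature pass after patching.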

nemo_automodel._transformers.kernel_patches._patch_liger_kernel(model)#

Patch a model with liger-kernel and sdpa_kernel.

Parameters:
  • model (nn.Module) – the model to patch

  • use_liger_kernel (bool) – Applies liger-kernel to the model. Default: True.

  • use_sdpa_patching (bool) – Enables model patching with SDPA kernel optimizations. Default: True.

  • sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].

Returns:

the patched model

Return type:

nn.Module

nemo_automodel._transformers.kernel_patches._patch_legacy_flash_attn_flag()#

Bridge the legacy _supports_flash_attn_2 class flag to v5.5’s _supports_flash_attn.

transformers v5.5 renamed the FA2-support attribute from _supports_flash_attn_2 to _supports_flash_attn and switched the dispatch check at _flash_attn_can_dispatch to the new name only. Remote-code models pinned against <=v5.3 (e.g. microsoft/Phi-4-multimodal-instruct sets _supports_flash_attn_2 = True in its modeling file) are not aware of the rename, so their FA2 support is invisible to v5.5 and attn_implementation="flash_attention_2" raises ValueError.

Install a property on PreTrainedModel._supports_flash_attn that falls back to the legacy flag when a subclass has not set the new one. Subclasses that set _supports_flash_attn = True directly still shadow the property via normal MRO lookup, so native models are unaffected.
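The bridging trick can be demonstrated with plain classes. Note that a regular property would return the property object itself on class-level access, so a small custom descriptor is used here; the class names are illustrative stubs, not the shipped code:

```python
class _SupportsFlashAttnProperty:
    """Descriptor that falls back to the legacy _supports_flash_attn_2 flag."""

    def __get__(self, instance, owner):
        # Works for both instance and class access; returns the legacy
        # class flag when the new attribute was never set.
        return getattr(owner, "_supports_flash_attn_2", False)


class PreTrainedModelStub:
    # Sketch of PreTrainedModel after the patch is installed.
    _supports_flash_attn = _SupportsFlashAttnProperty()


class LegacyRemoteModel(PreTrainedModelStub):
    # Remote-code model pinned against <=v5.3: only the old flag exists.
    _supports_flash_attn_2 = True


class NativeModel(PreTrainedModelStub):
    # Native v5.5 models set the new attribute directly; normal MRO lookup
    # finds this class attribute before the base-class descriptor.
    _supports_flash_attn = True
```

Because the descriptor lives on the base class, a dispatch check reading cls._supports_flash_attn now sees True for legacy remote-code models without touching their modeling files.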

nemo_automodel._transformers.kernel_patches._get_next_fallback_attn(attn_implementation: str) → str#

Get the next attention implementation in the priority list, in reverse order.

If a model does not support a given attention implementation, the next implementation in the priority list is returned.

If the current attention implementation is not in the priority list, it uses eager.

Parameters:

attn_implementation (str) – The current attention implementation.

Returns:

The next attention implementation in the priority list.

Return type:

str

nemo_automodel._transformers.kernel_patches._apply_preload_overrides(
tp_size,
cp_size,
has_packed_sequence,
attn_implementation,
use_liger_kernel,
)#

Compute final attention implementation and liger-kernel flag based on TP/CP and packed sequence constraints.

nemo_automodel._transformers.kernel_patches._verify_sdpa_support(model, cp_size)#

Validate SDPA support when CP is enabled for HF models.