nemo_automodel._transformers.kernel_patches#

Kernel and attention patching utilities.

Functions for SDPA, Liger-kernel, and attention-implementation overrides. These are stateless helpers used during model construction.

Module Contents#

Functions#

_assert_same_signature

Raise AssertionError if the two call signatures differ.

_patch_attention

Wrap the forward method of obj in an sdpa_kernel context manager.

_patch_liger_kernel

Patch a model with liger-kernel and sdpa_kernel.

_patch_legacy_flash_attn_flag

Bridge the legacy _supports_flash_attn_2 class flag to v5.5’s _supports_flash_attn.

_get_next_fallback_attn

Get the next attention implementation in the priority list, in reverse order.

_apply_preload_overrides

Compute final attention implementation and liger-kernel flag based on TP/CP and packed sequence constraints.

_verify_sdpa_support

Validate SDPA support when CP is enabled for HF models.

Data#

API#

nemo_automodel._transformers.kernel_patches.HAS_LIGER_KERNEL#

None

nemo_automodel._transformers.kernel_patches.liger_kernel_trf#

None

nemo_automodel._transformers.kernel_patches.DEFAULT_ATTN_IMPLEMENTATION#

None

nemo_automodel._transformers.kernel_patches.logger#

'getLogger(…)'

nemo_automodel._transformers.kernel_patches._assert_same_signature(original, patched)#

Raise AssertionError if the two call signatures differ.

nemo_automodel._transformers.kernel_patches._patch_attention(obj, sdpa_method=None)#

Wrap the forward method of obj in an sdpa_kernel context manager.

Parameters:
  • obj – Any object with a .forward(*args, **kwargs) method.

  • sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].

Returns:

The same obj with its .forward method patched.
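The wrapping pattern is roughly the following. To keep the sketch self-contained, a stand-in context manager replaces torch.nn.attention.sdpa_kernel; the real helper enters sdpa_kernel(sdpa_method) around each forward call:

```python
import contextlib
import functools


@contextlib.contextmanager
def sdpa_kernel(backends):
    # Stand-in for torch.nn.attention.sdpa_kernel; the real context manager
    # restricts scaled_dot_product_attention to the given backends.
    yield


def _patch_attention(obj, sdpa_method=None):
    """Wrap obj.forward so every call runs inside sdpa_kernel (sketch)."""
    if sdpa_method is None:
        # Mirrors the documented default order.
        sdpa_method = [
            "CUDNN_ATTENTION", "FLASH_ATTENTION", "EFFICIENT_ATTENTION", "MATH",
        ]
    original_forward = obj.forward

    @functools.wraps(original_forward)
    def forward(*args, **kwargs):
        with sdpa_kernel(sdpa_method):
            return original_forward(*args, **kwargs)

    obj.forward = forward
    return obj


class Toy:
    def forward(self, x):
        return x * 2


toy = _patch_attention(Toy())
```

functools.wraps preserves the original forward's name and signature metadata, which is what lets a check like _assert_same_signature pass after patching.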

nemo_automodel._transformers.kernel_patches._patch_liger_kernel(model)#

Patch a model with liger-kernel and sdpa_kernel.

Parameters:
  • model (nn.Module) – the model to patch

  • use_liger_kernel (bool) – Applies liger-kernel to the model. Default: True.

  • use_sdpa_patching (bool) – Enables model patching with SDPA kernel optimizations. Default: True.

  • sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].

Returns:

the patched model

Return type:

nn.Module

nemo_automodel._transformers.kernel_patches._patch_legacy_flash_attn_flag()#

Bridge the legacy _supports_flash_attn_2 class flag to v5.5’s _supports_flash_attn.

transformers v5.5 renamed the FA2-support attribute from _supports_flash_attn_2 to _supports_flash_attn and switched the dispatch check at _flash_attn_can_dispatch to the new name only. Remote-code models pinned against <=v5.3 (e.g. microsoft/Phi-4-multimodal-instruct sets _supports_flash_attn_2 = True in its modeling file) are not aware of the rename, so their FA2 support is invisible to v5.5 and attn_implementation="flash_attention_2" raises ValueError.

Install a property on PreTrainedModel._supports_flash_attn that falls back to the legacy flag when a subclass has not set the new one. Subclasses that set _supports_flash_attn = True directly still shadow the property via normal MRO lookup, so native models are unaffected.
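The bridging trick can be demonstrated with plain classes. Note that a regular property would return the property object itself on class-level access, so a small custom descriptor is used here; the class names are illustrative stubs, not the shipped code:

```python
class _SupportsFlashAttnProperty:
    """Descriptor that falls back to the legacy _supports_flash_attn_2 flag."""

    def __get__(self, instance, owner):
        # Works for both instance and class access; returns the legacy
        # class flag when the new attribute was never set.
        return getattr(owner, "_supports_flash_attn_2", False)


class PreTrainedModelStub:
    # Sketch of PreTrainedModel after the patch is installed.
    _supports_flash_attn = _SupportsFlashAttnProperty()


class LegacyRemoteModel(PreTrainedModelStub):
    # Remote-code model pinned against <=v5.3: only the old flag exists.
    _supports_flash_attn_2 = True


class NativeModel(PreTrainedModelStub):
    # Native v5.5 models set the new attribute directly; normal MRO lookup
    # finds this class attribute before the base-class descriptor.
    _supports_flash_attn = True
```

Because the descriptor lives on the base class, a dispatch check reading cls._supports_flash_attn now sees True for legacy remote-code models without touching their modeling files.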

nemo_automodel._transformers.kernel_patches._get_next_fallback_attn(attn_implementation: str) → str#

Get the next attention implementation in the priority list, in reverse order.

If a model does not support a given attention implementation, the next implementation in the priority list is returned.

If the current attention implementation is not in the priority list, it uses eager.

Parameters:

attn_implementation (str) – The current attention implementation.

Returns:

The next attention implementation in the priority list.

Return type:

str

nemo_automodel._transformers.kernel_patches._apply_preload_overrides(
tp_size,
cp_size,
has_packed_sequence,
attn_implementation,
use_liger_kernel,
)#

Compute final attention implementation and liger-kernel flag based on TP/CP and packed sequence constraints.

nemo_automodel._transformers.kernel_patches._verify_sdpa_support(model, cp_size)#

Validate SDPA support when CP is enabled for HF models.