nemo_automodel._transformers.kernel_patches#

Kernel and attention patching utilities.

Functions for SDPA, Liger-kernel, and attention-implementation overrides. These are stateless helpers used during model construction.

Module Contents#

Functions#

_assert_same_signature

Raise AssertionError if the two call signatures differ.

_patch_attention

Wrap the forward method of obj in an sdpa_kernel context manager.

_patch_liger_kernel

Patch a model with liger-kernel and the sdpa_kernel context manager.

_get_next_fallback_attn

Get the next attention implementation to fall back to, walking the priority list from highest to lowest priority.

_apply_preload_overrides

Compute final attention implementation and liger-kernel flag based on TP/CP and packed sequence constraints.

_verify_sdpa_support

Validate SDPA support when CP is enabled for HF models.

Data#

API#

nemo_automodel._transformers.kernel_patches.DEFAULT_ATTN_IMPLEMENTATION#

None

nemo_automodel._transformers.kernel_patches.logger#

'getLogger(...)'

nemo_automodel._transformers.kernel_patches._assert_same_signature(original, patched)#

Raise AssertionError if the two call signatures differ.
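A minimal sketch of how such a check can be implemented with inspect.signature; the exact comparison performed by the real helper may differ.

```python
import inspect

def _assert_same_signature(original, patched):
    # Comparing full Signature objects catches drift in parameter names,
    # order, kinds, and defaults between the original and patched callables.
    orig_sig = inspect.signature(original)
    patched_sig = inspect.signature(patched)
    assert orig_sig == patched_sig, f"Signature mismatch: {orig_sig} != {patched_sig}"
```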

nemo_automodel._transformers.kernel_patches._patch_attention(obj, sdpa_method=None)#

Wrap the forward method of obj in an sdpa_kernel context manager.

Parameters:
  • obj – Any object with a .forward(*args, **kwargs) method.

  • sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].

Returns:

The same obj with its .forward method patched.
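A sketch of the wrapping pattern, assuming the standard torch.nn.attention.sdpa_kernel context manager; the real helper may differ in details such as signature preservation.

```python
import functools
from torch.nn.attention import SDPBackend, sdpa_kernel

def _patch_attention(obj, sdpa_method=None):
    if sdpa_method is None:
        sdpa_method = [
            SDPBackend.CUDNN_ATTENTION,
            SDPBackend.FLASH_ATTENTION,
            SDPBackend.EFFICIENT_ATTENTION,
            SDPBackend.MATH,
        ]
    original_forward = obj.forward

    @functools.wraps(original_forward)
    def wrapped_forward(*args, **kwargs):
        # Every forward call now runs under the requested SDPA backend priority list.
        with sdpa_kernel(sdpa_method):
            return original_forward(*args, **kwargs)

    obj.forward = wrapped_forward
    return obj
```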

nemo_automodel._transformers.kernel_patches._patch_liger_kernel(model)#

Patch a model with liger-kernel and the sdpa_kernel context manager.

Parameters:
  • model (nn.Module) – The model to patch.

  • use_liger_kernel (bool) – Apply liger-kernel to the model. Default: True.

  • use_sdpa_patching (bool) – Enable model patching with SDPA kernel optimizations. Default: True.

  • sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].

Returns:

the patched model

Return type:

nn.Module
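A hedged sketch of the combined patching flow; _apply_liger_kernel_to_instance is assumed to be the liger-kernel entry point, and the real helper's error handling and flag handling may differ.

```python
import logging
import torch.nn as nn

logger = logging.getLogger(__name__)

def _patch_liger_kernel(model: nn.Module) -> nn.Module:
    try:
        from liger_kernel.transformers import _apply_liger_kernel_to_instance

        # Swap supported layers (RMSNorm, SwiGLU, fused cross-entropy, ...) in place.
        _apply_liger_kernel_to_instance(model=model)
    except Exception:
        logger.warning("liger-kernel patching failed; returning the unpatched model.")
        return model

    # Re-use the module's _patch_attention helper so attention also runs
    # under the preferred SDPA backend priority list.
    return _patch_attention(model)
```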

nemo_automodel._transformers.kernel_patches._get_next_fallback_attn(attn_implementation: str) → str#

Get the next attention implementation to fall back to, walking the priority list from highest to lowest priority.

If a model does not support a given attention implementation, the next implementation in the priority list is returned.

If the current attention implementation is not in the priority list, eager is used.

Parameters:

attn_implementation (str) – The current attention implementation.

Returns:

The next attention implementation in the priority list.

Return type:

str
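Illustrative fallback logic; the concrete priority list shown here (flash_attention_2, then sdpa, then eager) is an assumption, and only the stepping-down behaviour is taken from the description above.

```python
_ATTN_PRIORITY = ["flash_attention_2", "sdpa", "eager"]  # assumed ordering

def _get_next_fallback_attn(attn_implementation: str) -> str:
    # Implementations outside the priority list fall straight back to eager.
    if attn_implementation not in _ATTN_PRIORITY:
        return "eager"
    idx = _ATTN_PRIORITY.index(attn_implementation)
    # Step one entry down the list; eager is the terminal fallback.
    return _ATTN_PRIORITY[min(idx + 1, len(_ATTN_PRIORITY) - 1)]
```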

nemo_automodel._transformers.kernel_patches._apply_preload_overrides(tp_size, cp_size, has_packed_sequence, attn_implementation, use_liger_kernel)#

Compute final attention implementation and liger-kernel flag based on TP/CP and packed sequence constraints.
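A minimal sketch of what the override computation could look like; the specific rules (context parallelism and packed sequences forcing SDPA, TP/CP disabling liger-kernel) are assumptions about the constraints rather than the verbatim implementation.

```python
def _apply_preload_overrides(tp_size, cp_size, has_packed_sequence,
                             attn_implementation, use_liger_kernel):
    # Assumption: CP and packed sequences require SDPA-based attention.
    if cp_size > 1 or has_packed_sequence:
        attn_implementation = "sdpa"
    # Assumption: liger-kernel's fused layers are skipped under TP/CP sharding.
    if tp_size > 1 or cp_size > 1:
        use_liger_kernel = False
    return attn_implementation, use_liger_kernel
```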

nemo_automodel._transformers.kernel_patches._verify_sdpa_support(model, cp_size)#

Validate SDPA support when CP is enabled for HF models.
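A hedged sketch of the validation, assuming the check reads the standard transformers config._attn_implementation field on the loaded model.

```python
def _verify_sdpa_support(model, cp_size):
    # Only relevant when context parallelism is actually enabled.
    if cp_size <= 1:
        return
    attn_impl = getattr(model.config, "_attn_implementation", None)
    if attn_impl != "sdpa":
        raise ValueError(
            f"Context parallelism requires SDPA attention, but the model was "
            f"loaded with attn_implementation={attn_impl!r}."
        )
```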