nemo_automodel._transformers.kernel_patches#

Kernel and attention patching utilities.

Functions for SDPA, Liger-kernel, and attention-implementation overrides. These are stateless helpers used during model construction.

Module Contents#

Functions#

_assert_same_signature

Raise AssertionError if the two call signatures differ.

_patch_attention

Wrap the forward method of obj in an sdpa_kernel context manager.

_patch_liger_kernel

Patch a model with liger-kernel and the sdpa_kernel context manager.

_get_next_fallback_attn

Get the next attention implementation to fall back to, walking the priority list from highest to lowest priority.

_apply_preload_overrides

Compute final attention implementation and liger-kernel flag based on TP/CP and packed sequence constraints.

_verify_sdpa_support

Validate SDPA support when CP is enabled for HF models.

Data#

API#

nemo_automodel._transformers.kernel_patches.DEFAULT_ATTN_IMPLEMENTATION#

None

nemo_automodel._transformers.kernel_patches.logger#

'getLogger(...)'

nemo_automodel._transformers.kernel_patches._assert_same_signature(original, patched)#

Raise AssertionError if the two call signatures differ.
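A minimal sketch of how such a check can be implemented with inspect.signature; the exact comparison performed by the real helper may differ.

```python
import inspect

def _assert_same_signature(original, patched):
    # Comparing full Signature objects catches drift in parameter names,
    # order, kinds, and defaults between the original and patched callables.
    orig_sig = inspect.signature(original)
    patched_sig = inspect.signature(patched)
    assert orig_sig == patched_sig, f"Signature mismatch: {orig_sig} != {patched_sig}"
```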

nemo_automodel._transformers.kernel_patches._patch_attention(obj, sdpa_method=None)#

Wrap the forward method of obj in an sdpa_kernel context manager.

Parameters:
  • obj – Any object with a .forward(*args, **kwargs) method.

  • sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].

Returns:

The same obj with its .forward method patched.
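A sketch of the wrapping pattern, assuming the standard torch.nn.attention.sdpa_kernel context manager; the real helper may differ in details such as signature preservation.

```python
import functools
from torch.nn.attention import SDPBackend, sdpa_kernel

def _patch_attention(obj, sdpa_method=None):
    if sdpa_method is None:
        sdpa_method = [
            SDPBackend.CUDNN_ATTENTION,
            SDPBackend.FLASH_ATTENTION,
            SDPBackend.EFFICIENT_ATTENTION,
            SDPBackend.MATH,
        ]
    original_forward = obj.forward

    @functools.wraps(original_forward)
    def wrapped_forward(*args, **kwargs):
        # Every forward call now runs under the requested SDPA backend priority list.
        with sdpa_kernel(sdpa_method):
            return original_forward(*args, **kwargs)

    obj.forward = wrapped_forward
    return obj
```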

nemo_automodel._transformers.kernel_patches._patch_liger_kernel(model)#

Patch a model with liger-kernel and the sdpa_kernel context manager.

Parameters:
  • model (nn.Module) – The model to patch.

  • use_liger_kernel (bool) – Apply liger-kernel to the model. Default: True.

  • use_sdpa_patching (bool) – Enable model patching with SDPA kernel optimizations. Default: True.

  • sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].

Returns:

the patched model

Return type:

nn.Module
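A hedged sketch of the combined patching flow; _apply_liger_kernel_to_instance is assumed to be the liger-kernel entry point, and the real helper's error handling and flag handling may differ.

```python
import logging
import torch.nn as nn

logger = logging.getLogger(__name__)

def _patch_liger_kernel(model: nn.Module) -> nn.Module:
    try:
        from liger_kernel.transformers import _apply_liger_kernel_to_instance

        # Swap supported layers (RMSNorm, SwiGLU, fused cross-entropy, ...) in place.
        _apply_liger_kernel_to_instance(model=model)
    except Exception:
        logger.warning("liger-kernel patching failed; returning the unpatched model.")
        return model

    # Re-use the module's _patch_attention helper so attention also runs
    # under the preferred SDPA backend priority list.
    return _patch_attention(model)
```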

nemo_automodel._transformers.kernel_patches._get_next_fallback_attn(attn_implementation: str) → str#

Get the next attention implementation to fall back to, walking the priority list from highest to lowest priority.

If a model does not support a given attention implementation, the next implementation in the priority list is returned.

If the current attention implementation is not in the priority list, eager is used.

Parameters:

attn_implementation (str) – The current attention implementation.

Returns:

The next attention implementation in the priority list.

Return type:

str
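Illustrative fallback logic; the concrete priority list shown here (flash_attention_2, then sdpa, then eager) is an assumption, and only the stepping-down behaviour is taken from the description above.

```python
_ATTN_PRIORITY = ["flash_attention_2", "sdpa", "eager"]  # assumed ordering

def _get_next_fallback_attn(attn_implementation: str) -> str:
    # Implementations outside the priority list fall straight back to eager.
    if attn_implementation not in _ATTN_PRIORITY:
        return "eager"
    idx = _ATTN_PRIORITY.index(attn_implementation)
    # Step one entry down the list; eager is the terminal fallback.
    return _ATTN_PRIORITY[min(idx + 1, len(_ATTN_PRIORITY) - 1)]
```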

nemo_automodel._transformers.kernel_patches._apply_preload_overrides(tp_size, cp_size, has_packed_sequence, attn_implementation, use_liger_kernel)#

Compute final attention implementation and liger-kernel flag based on TP/CP and packed sequence constraints.
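A minimal sketch of what the override computation could look like; the specific rules (context parallelism and packed sequences forcing SDPA, TP/CP disabling liger-kernel) are assumptions about the constraints rather than the verbatim implementation.

```python
def _apply_preload_overrides(tp_size, cp_size, has_packed_sequence,
                             attn_implementation, use_liger_kernel):
    # Assumption: CP and packed sequences require SDPA-based attention.
    if cp_size > 1 or has_packed_sequence:
        attn_implementation = "sdpa"
    # Assumption: liger-kernel's fused layers are skipped under TP/CP sharding.
    if tp_size > 1 or cp_size > 1:
        use_liger_kernel = False
    return attn_implementation, use_liger_kernel
```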

nemo_automodel._transformers.kernel_patches._verify_sdpa_support(model, cp_size)#

Validate SDPA support when CP is enabled for HF models.
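A hedged sketch of the validation, assuming the check reads the standard transformers config._attn_implementation field on the loaded model.

```python
def _verify_sdpa_support(model, cp_size):
    # Only relevant when context parallelism is actually enabled.
    if cp_size <= 1:
        return
    attn_impl = getattr(model.config, "_attn_implementation", None)
    if attn_impl != "sdpa":
        raise ValueError(
            f"Context parallelism requires SDPA attention, but the model was "
            f"loaded with attn_implementation={attn_impl!r}."
        )
```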