nemo_automodel._transformers.kernel_patches#
Kernel and attention patching utilities.
Functions for SDPA, Liger-kernel, and attention-implementation overrides. These are stateless helpers used during model construction.
Module Contents#
Functions#
_assert_same_signature | Raise AssertionError if the two call signatures differ. |
_patch_attention | Wrap the forward method of obj in an sdpa_kernel context manager. |
_patch_liger_kernel | Patches a model with liger-kernel and sdpa_kernel. |
_get_next_fallback_attn | Get the next attention implementation in the priority list, in reverse order. |
_apply_preload_overrides | Compute the final attention implementation and liger-kernel flag based on TP/CP and packed-sequence constraints. |
_verify_sdpa_support | Validate SDPA support when CP is enabled for HF models. |
Data#
API#
- nemo_automodel._transformers.kernel_patches.DEFAULT_ATTN_IMPLEMENTATION#
None
- nemo_automodel._transformers.kernel_patches.logger#
'getLogger(...)'
- nemo_automodel._transformers.kernel_patches._assert_same_signature(original, patched)#
Raise AssertionError if the two call signatures differ.
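A minimal sketch of how such a check can be written with inspect.signature; the helper name below is hypothetical and the module's actual implementation may differ in details.

```python
import inspect


def _assert_same_signature_sketch(original, patched):
    # Illustrative re-implementation only: compare the two call signatures
    # and fail loudly if the patched callable drifted from the original.
    original_sig = inspect.signature(original)
    patched_sig = inspect.signature(patched)
    assert original_sig == patched_sig, (
        f"Signature mismatch: {original_sig} != {patched_sig}"
    )
```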
- nemo_automodel._transformers.kernel_patches._patch_attention(obj, sdpa_method=None)#
Wrap the forward method of obj in an sdpa_kernel context manager.
- Parameters:
obj – Any object with a .forward(*args, **kwargs) method.
sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].
- Returns:
The same obj with its .forward method patched.
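A hedged sketch of the wrapping pattern, assuming PyTorch's torch.nn.attention.sdpa_kernel context manager (available in recent PyTorch releases); the function name is hypothetical and the real helper may handle additional cases such as signature validation.

```python
import functools

from torch.nn.attention import SDPBackend, sdpa_kernel


def _patch_attention_sketch(obj, sdpa_method=None):
    # Illustrative sketch: run every forward call inside an sdpa_kernel
    # context restricted to the requested backends.
    if sdpa_method is None:
        sdpa_method = [
            SDPBackend.CUDNN_ATTENTION,  # requires a PyTorch build that exposes this backend
            SDPBackend.FLASH_ATTENTION,
            SDPBackend.EFFICIENT_ATTENTION,
            SDPBackend.MATH,
        ]
    original_forward = obj.forward

    @functools.wraps(original_forward)
    def wrapped_forward(*args, **kwargs):
        with sdpa_kernel(sdpa_method):
            return original_forward(*args, **kwargs)

    obj.forward = wrapped_forward
    return obj
```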
- nemo_automodel._transformers.kernel_patches._patch_liger_kernel(model)#
Patches a model with liger-kernel and sdpa_kernel.
- Parameters:
model (nn.Module) – The model to patch.
use_liger_kernel (bool) – Applies liger-kernel to the model. Default: True.
use_sdpa_patching (bool) – Enables model patching with SDPA kernel optimizations. Default: True.
sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].
- Returns:
The patched model.
- Return type:
nn.Module
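A possible usage pattern; the checkpoint name is purely illustrative and not taken from the module.

```python
from transformers import AutoModelForCausalLM

# Illustrative only: patch a freshly constructed HF model before training.
model = AutoModelForCausalLM.from_pretrained("org/model-name")  # placeholder checkpoint
model = _patch_liger_kernel(model)  # returns the patched nn.Module
```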
- nemo_automodel._transformers.kernel_patches._get_next_fallback_attn(attn_implementation: str) → str#
Get the next attention implementation in the priority list, in reverse order.
If a model does not support a given attention implementation, the next implementation in the priority list is returned.
If the current attention implementation is not in the priority list, eager is used as the fallback.
- Parameters:
attn_implementation (str) – The current attention implementation.
- Returns:
The next attention implementation in the priority list.
- Return type:
str
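A sketch of the fallback idea, assuming a priority list of flash_attention_2 > sdpa > eager; the actual list, its ordering, and the function name below are assumptions, the real logic lives in the module.

```python
# Assumed priority list (highest preference first); the real list may differ.
_ATTN_PRIORITY = ["flash_attention_2", "sdpa", "eager"]


def _get_next_fallback_attn_sketch(attn_implementation: str) -> str:
    # Unknown implementations fall back straight to eager.
    if attn_implementation not in _ATTN_PRIORITY:
        return "eager"
    # Otherwise step one position toward the least specialized implementation.
    index = _ATTN_PRIORITY.index(attn_implementation)
    return _ATTN_PRIORITY[min(index + 1, len(_ATTN_PRIORITY) - 1)]
```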
- nemo_automodel._transformers.kernel_patches._apply_preload_overrides(tp_size, cp_size, has_packed_sequence, attn_implementation, use_liger_kernel)#
Compute the final attention implementation and liger-kernel flag based on TP/CP and packed-sequence constraints.
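A hedged sketch of the kind of decision table this helper encodes; every branch below is an assumption about typical TP/CP and packed-sequence constraints, not the module's exact rules.

```python
def _apply_preload_overrides_sketch(
    tp_size, cp_size, has_packed_sequence, attn_implementation, use_liger_kernel
):
    # All branches below are illustrative assumptions.
    if cp_size > 1:
        # Context parallelism is assumed to require the SDPA path.
        attn_implementation = "sdpa"
    if has_packed_sequence:
        # Packed sequences are assumed to need flash attention's varlen kernels.
        attn_implementation = "flash_attention_2"
    if tp_size > 1 or cp_size > 1:
        # Assume liger-kernel patching is skipped once the model is sharded.
        use_liger_kernel = False
    return attn_implementation, use_liger_kernel
```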
- nemo_automodel._transformers.kernel_patches._verify_sdpa_support(model, cp_size)#
Validate SDPA support when CP is enabled for HF models.
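A possible call site, e.g. right after model construction when context parallelism is requested; model and cp_size here stand in for values produced earlier in the build path.

```python
# Illustrative only: fail fast before training starts if CP is requested
# but the model cannot route attention through SDPA.
if cp_size > 1:
    _verify_sdpa_support(model, cp_size)
```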