nemo_automodel.components._transformers.auto_model#

Module Contents#

Classes#

_BaseNeMoAutoModelClass

Drop-in replacement for _BaseAutoModelClass that includes custom-kernels.

NeMoAutoModelForCausalLM

Drop-in replacement for transformers.AutoModelForCausalLM that includes custom-kernels.

NeMoAutoModelForImageTextToText

Drop-in replacement for transformers.AutoModelForImageTextToText with custom-kernels.

NeMoAutoModelForSequenceClassification

Drop-in replacement for transformers.AutoModelForSequenceClassification with custom-kernels.

Functions#

_assert_same_signature

Raise AssertionError if the two call signatures differ.

_patch_attention

Wrap the forward method of obj in an sdpa_kernel context manager.

_patch_liger_kernel

Patches a model with liger-kernel and sdpa_kernel.

_get_next_fallback_attn

Get the next attention implementation in the priority list, in reverse order.

Data#

API#

nemo_automodel.components._transformers.auto_model.logger#

‘getLogger(…)’

nemo_automodel.components._transformers.auto_model._assert_same_signature(original, patched)[source]#

Raise AssertionError if the two call signatures differ.
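A minimal sketch of how such a check can be written with the standard inspect module; the body below is illustrative, not this module's actual implementation.

import inspect

def _assert_same_signature(original, patched):
    # A patched forward must keep the original call signature intact.
    sig_orig, sig_patched = inspect.signature(original), inspect.signature(patched)
    if sig_orig != sig_patched:
        raise AssertionError(f"Signature changed: {sig_orig} != {sig_patched}")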

nemo_automodel.components._transformers.auto_model._patch_attention(obj, sdpa_method=None)[source]#

Wrap the forward method of obj in an sdpa_kernel context manager.

Parameters:
  • obj – Any object with a .forward(*args, **kwargs) method.

  • sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].

Returns:

The same obj with its .forward method patched.
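A hedged sketch of the wrapping described above, using torch.nn.attention.sdpa_kernel and the documented default backend order; the real implementation may differ in details.

import functools
from torch.nn.attention import SDPBackend, sdpa_kernel

def _patch_attention(obj, sdpa_method=None):
    # Default backend priority mirrors the documented order.
    if sdpa_method is None:
        sdpa_method = [
            SDPBackend.CUDNN_ATTENTION,
            SDPBackend.FLASH_ATTENTION,
            SDPBackend.EFFICIENT_ATTENTION,
            SDPBackend.MATH,
        ]
    original_forward = obj.forward

    @functools.wraps(original_forward)
    def wrapped_forward(*args, **kwargs):
        # Run the original forward inside the sdpa_kernel context manager.
        with sdpa_kernel(sdpa_method):
            return original_forward(*args, **kwargs)

    obj.forward = wrapped_forward
    return obj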

nemo_automodel.components._transformers.auto_model._patch_liger_kernel(model)[source]#

Patches a model with liger-kernel and sdpa_kernel.

Parameters:
  • model (nn.Module) – the model to patch

  • use_liger_kernel (bool) – If True, applies liger-kernel to the model. Defaults to True.

  • use_sdpa_patching (bool) – If True, enables SDPA kernel patching optimizations on the model. Defaults to True.

  • sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].

Returns:

the patched model

Return type:

nn.Module
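Given the parameters above, the patching flow is roughly as sketched below. The liger-kernel entry point shown (the instance-level helper also used by the Hugging Face Trainer) and the extra keyword arguments are assumptions for illustration, not this module's confirmed code.

import torch.nn as nn

def _patch_liger_kernel(model: nn.Module, use_liger_kernel: bool = True,
                        use_sdpa_patching: bool = True, sdpa_method=None) -> nn.Module:
    if use_liger_kernel:
        # Instance-level Liger patch; assumed entry point, not confirmed against this module.
        from liger_kernel.transformers import _apply_liger_kernel_to_instance
        _apply_liger_kernel_to_instance(model=model)
    if use_sdpa_patching:
        # _patch_attention is the helper documented earlier in this module.
        model = _patch_attention(model, sdpa_method=sdpa_method)
    return model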

nemo_automodel.components._transformers.auto_model._get_next_fallback_attn(attn_implementation: str) → str[source]#

Get the next attention implementation in the priority list, in reverse order.

If a model does not support a given attention implementation, the next implementation in the priority list is returned.

If the current attention implementation is not in the priority list, it uses eager.

Parameters:

attn_implementation (str) – The current attention implementation.

Returns:

The next attention implementation in the priority list.

Return type:

str
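An illustrative sketch of this fallback logic; the priority list below is an assumption, not the module's actual ordering.

# Hypothetical priority list, highest preference first.
_ATTN_PRIORITY = ["flash_attention_2", "sdpa", "eager"]

def _get_next_fallback_attn(attn_implementation: str) -> str:
    # Implementations outside the list fall straight back to eager.
    if attn_implementation not in _ATTN_PRIORITY:
        return "eager"
    # Otherwise step one entry down the list, saturating at the last entry (eager).
    idx = _ATTN_PRIORITY.index(attn_implementation)
    return _ATTN_PRIORITY[min(idx + 1, len(_ATTN_PRIORITY) - 1)]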

class nemo_automodel.components._transformers.auto_model._BaseNeMoAutoModelClass(*args, **kwargs)[source]#

Bases: transformers.models.auto.auto_factory._BaseAutoModelClass

Drop-in replacement for _BaseAutoModelClass that includes custom-kernels.

The class only overrides from_pretrained and from_config to add the optional use_liger_kernel flag. If the flag is True (default) and the Liger kernel is available, the model’s attention layers are monkey-patched in place. If patching fails for any reason, the call is retried once with use_liger_kernel=False so that users still obtain a functional model.

TODO(@akoumpa): extend this beyond liger_kernel.

Notes:#

  • No changes are made to the model’s public API; forward signatures, generation utilities, and weight shapes remain identical.

  • Only decoder-style (causal) architectures are currently supported by the Liger patch. Unsupported models will silently fall back.

Initialization

classmethod from_pretrained(
pretrained_model_name_or_path,
*model_args,
use_liger_kernel: bool = True,
use_sdpa_patching: bool = True,
sdpa_method: Optional[List[torch.nn.attention.SDPBackend]] = None,
torch_dtype='auto',
attn_implementation: str = 'flash_attention_2',
**kwargs,
) → transformers.PreTrainedModel[source]#

Instantiate and (optionally) patch a causal-language model.

This is a light wrapper around transformers.AutoModelForCausalLM.from_pretrained that can automatically apply Liger and/or SDPA (scaled-dot-product attention) kernel optimizations.

Parameters:
  • pretrained_model_name_or_path (str | os.PathLike) – Hugging Face hub repo ID or local path accepted by AutoModelForCausalLM.from_pretrained.

  • *model_args – Positional arguments forwarded verbatim to AutoModelForCausalLM.from_pretrained.

  • use_liger_kernel (bool, default=True) – If True, try to patch the model with Liger kernels for faster inference/training.

  • use_sdpa_patching (bool, default=True) – If True, patch the model with SDPA-based attention optimizations.

  • sdpa_method (list[SDPBackend] | None, optional) – Explicit list of SDPA back-ends to consider when use_sdpa_patching=True.

  • torch_dtype (str | torch.dtype | Literal["auto"], default="auto") – Data type passed to the underlying from_pretrained call.

  • attn_implementation (str, default="flash_attention_2") – Desired attention implementation; forwarded to the HF config.

  • fp8_config (FP8Config, optional) – FP8 configuration object that specifies all FP8 quantization settings. If provided, FP8 quantization will be applied to the model for improved performance on supported hardware.

  • **kwargs – Additional keyword arguments forwarded verbatim to AutoModelForCausalLM.from_pretrained.

Returns:

The loaded (and possibly patched) model instance.

Return type:

transformers.PreTrainedModel

Warns:
  • UserWarning – Emitted when use_liger_kernel=True but the Liger package is unavailable.

Notes:#

If kernel patching fails, the partially constructed model is deleted and the method recurses once with use_liger_kernel=False or use_sdpa_patching=False.
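A usage sketch exercising these flags through NeMoAutoModelForCausalLM (defined below); the checkpoint and backend choices are illustrative only.

from torch.nn.attention import SDPBackend

# Load with Liger disabled and an explicit SDPA backend preference.
model = NeMoAutoModelForCausalLM.from_pretrained(
    "gpt2",
    use_liger_kernel=False,
    sdpa_method=[SDPBackend.FLASH_ATTENTION, SDPBackend.MATH],
)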

classmethod from_config(
config,
*model_args,
use_liger_kernel: bool = True,
use_sdpa_patching: bool = True,
sdpa_method: Optional[List[torch.nn.attention.SDPBackend]] = None,
torch_dtype: Union[str, torch.dtype] = 'auto',
attn_implementation: str = 'flash_attention_2',
**kwargs,
) → transformers.PreTrainedModel[source]#

Instantiate a model from a transformers.PretrainedConfig and optionally patch it with Liger or SDPA-optimized kernels.

Parameters:
  • config (transformers.PretrainedConfig) – The configuration object used to build the model.

  • *model_args – Positional arguments forwarded to the underlying transformers.AutoModelForCausalLM.from_config call.

  • use_liger_kernel (bool, optional) – If True, tries to patch the instantiated model with Liger optimized attention kernels. Defaults to True.

  • use_sdpa_patching (bool, optional) – If True, applies in-place SDPA (Scaled-Dot-Product-Attention) kernel optimizations wherever possible. Defaults to True.

  • sdpa_method (Optional[List[SDPBackend]], optional) – One or multiple SDPA back-ends to prefer when applying SDPA patching. When None, the default backend resolution logic is used. Defaults to None.

  • attn_implementation (str, optional) – Specifies which attention implementation to use (e.g., "flash_attention_2", "eager"). Only applied when the base model supports this kwarg. Defaults to "flash_attention_2".

  • **kwargs – Additional keyword arguments forwarded to the superclass constructor and underlying from_config logic.

Returns:

The instantiated (and possibly kernel-patched) model.

Return type:

transformers.PreTrainedModel

Notes:#

If kernel patching fails, the partially constructed model is deleted and the method recurses once with use_liger_kernel=False or use_sdpa_patching=False.
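A usage sketch, again through NeMoAutoModelForCausalLM; the AutoConfig call and checkpoint name are illustrative only.

from transformers import AutoConfig

# Build a randomly initialized model from a config, with all kernel patching disabled.
config = AutoConfig.from_pretrained("gpt2")
model = NeMoAutoModelForCausalLM.from_config(
    config,
    use_liger_kernel=False,
    use_sdpa_patching=False,
)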

class nemo_automodel.components._transformers.auto_model.NeMoAutoModelForCausalLM(*args, **kwargs)[source]#

Bases: nemo_automodel.components._transformers.auto_model._BaseNeMoAutoModelClass, transformers.AutoModelForCausalLM

Drop-in replacement for transformers.AutoModelForCausalLM that includes custom-kernels.

The class only overrides from_pretrained and from_config to add the optional use_liger_kernel flag. If the flag is True (default) and the Liger kernel is available, the model’s attention layers are monkey-patched in place. If patching fails for any reason, the call is retried once with use_liger_kernel=False so that users still obtain a functional model.

TODO(@akoumpa): extend this beyond liger_kernel.

Notes:#

  • No changes are made to the model’s public API; forward signatures, generation utilities, and weight shapes remain identical.

  • Only decoder-style (causal) architectures are currently supported by the Liger patch. Unsupported models will silently fall back.

Examples:#

model = NeMoAutoModelForCausalLM.from_pretrained("gpt2")  # try Liger
model = NeMoAutoModelForCausalLM.from_pretrained(
    "gpt2", use_liger_kernel=False)  # skip Liger

Initialization

class nemo_automodel.components._transformers.auto_model.NeMoAutoModelForImageTextToText(*args, **kwargs)[source]#

Bases: nemo_automodel.components._transformers.auto_model._BaseNeMoAutoModelClass, transformers.AutoModelForImageTextToText

Drop-in replacement for transformers.AutoModelForImageTextToText with custom-kernels.

The class only overrides from_pretrained and from_config to add the optional use_liger_kernel flag. If the flag is True (default) and the Liger kernel is available, the model’s attention layers are monkey-patched in place. If patching fails for any reason, the call is retried once with use_liger_kernel=False so that users still obtain a functional model.

@akoumpa: currently only supporting liger_kernel for demonstration purposes.

Notes:#

  • No changes are made to the model’s public API; forward signatures, generation utilities, and weight shapes remain identical.

  • Only decoder-style (causal) architectures are currently supported by the Liger patch. Unsupported models will silently fall back.

Examples:#

model = NeMoAutoModelForImageTextToText.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")  # try Liger
model = NeMoAutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct", use_liger_kernel=False)  # skip Liger

Initialization

class nemo_automodel.components._transformers.auto_model.NeMoAutoModelForSequenceClassification(*args, **kwargs)[source]#

Bases: nemo_automodel.components._transformers.auto_model._BaseNeMoAutoModelClass, transformers.AutoModelForSequenceClassification

Drop-in replacement for transformers.AutoModelForSequenceClassification with custom-kernels.

The class only overrides from_pretrained and from_config to add the optional use_liger_kernel flag. If the flag is True (default) and the Liger kernel is available, the model’s attention layers are monkey-patched in place. If patching fails for any reason, the call is retried once with use_liger_kernel=False so that users still obtain a functional model.

@akoumpa: currently only supporting liger_kernel for demonstration purposes.

Notes:#

  • No changes are made to the model’s public API; forward signatures, generation utilities, and weight shapes remain identical.

  • Only decoder-style (causal) architectures are currently supported by the Liger patch. Unsupported models will silently fall back.

Examples:#

model = NeMoAutoModelForSequenceClassification.from_pretrained("bert-base-uncased")  # try Liger
model = NeMoAutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", use_liger_kernel=False)  # skip Liger

Initialization