nemo_automodel._transformers.auto_model#
Module Contents#
Classes#
| Class | Summary |
|---|---|
| `_BaseNeMoAutoModelClass` | Drop-in replacement for `_BaseAutoModelClass` that includes custom kernels. |
| `NeMoAutoModelForCausalLM` | Drop-in replacement for `transformers.AutoModelForCausalLM` that includes custom kernels. |
| `NeMoAutoModelForImageTextToText` | Drop-in replacement for `transformers.AutoModelForImageTextToText` with custom kernels. |
| `NeMoAutoModelForSequenceClassification` | Drop-in replacement for `transformers.AutoModelForSequenceClassification` with custom kernels. |
| `NeMoAutoModelForTextToWaveform` | Drop-in replacement for `transformers.AutoModelForTextToWaveform` with custom kernels. |
Functions#
| Function | Summary |
|---|---|
| `_assert_same_signature` | Raise AssertionError if the two call signatures differ. |
| `_patch_attention` | Wrap the `forward` method of an object in an `sdpa_kernel` context manager. |
| `_patch_liger_kernel` | Patch a model with liger-kernel and `sdpa_kernel`. |
| `_get_next_fallback_attn` | Get the next attention implementation in the priority list, in reverse order. |
| `_prepare_hf_config_and_flag` | Resolve the trust_remote_code default, fetch the HF config, and determine whether the model is HF-based. |
| `_pop_tp_cp_has_packed` | Extract and remove TP/CP/packed flags from kwargs. |
| `_apply_preload_overrides` | Compute the final attention implementation and liger-kernel flag based on TP/CP and packed-sequence constraints. |
| `_verify_sdpa_support` | Validate SDPA support when CP is enabled for HF models. |
Data#
API#
- nemo_automodel._transformers.auto_model.logger#
‘getLogger(…)’
- nemo_automodel._transformers.auto_model._assert_same_signature(original, patched)#
Raise AssertionError if the two call signatures differ.
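A minimal, behavior-equivalent sketch using `inspect.signature` (the error wording is illustrative, not the module's actual message):

```python
import inspect


def _assert_same_signature(original, patched):
    """Illustrative sketch: raise AssertionError if the two call signatures differ."""
    original_sig = inspect.signature(original)
    patched_sig = inspect.signature(patched)
    assert original_sig == patched_sig, (
        f"Signature mismatch: {original_sig} != {patched_sig}"
    )
```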
- nemo_automodel._transformers.auto_model._patch_attention(obj, sdpa_method=None)#
Wrap the `forward` method of `obj` in an `sdpa_kernel` context manager.
- Parameters:
  - obj – Any object with a `.forward(*args, **kwargs)` method.
  - sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].
- Returns:
The same `obj` with its `.forward` method patched.
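A minimal sketch of the documented behavior, assuming only the default backend list above (the real implementation may differ in details):

```python
from functools import wraps

from torch.nn.attention import SDPBackend, sdpa_kernel


def _patch_attention(obj, sdpa_method=None):
    """Illustrative sketch: wrap obj.forward in an sdpa_kernel context manager."""
    if sdpa_method is None:
        sdpa_method = [
            SDPBackend.CUDNN_ATTENTION,
            SDPBackend.FLASH_ATTENTION,
            SDPBackend.EFFICIENT_ATTENTION,
            SDPBackend.MATH,
        ]
    original_forward = obj.forward

    @wraps(original_forward)
    def forward_with_sdpa(*args, **kwargs):
        # Every forward call now runs under the requested SDPA backends.
        with sdpa_kernel(sdpa_method):
            return original_forward(*args, **kwargs)

    obj.forward = forward_with_sdpa
    return obj
```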
- nemo_automodel._transformers.auto_model._patch_liger_kernel(model)#
Patch a model with liger-kernel and `sdpa_kernel`.
- Parameters:
  - model (nn.Module) – The model to patch.
  - use_liger_kernel (bool) – If `True`, applies liger-kernel to the model. Defaults to `True`.
  - use_sdpa_patching (bool) – If `True`, enables model patching with SDPA kernel optimizations. Defaults to `True`.
  - sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].
- Returns:
The patched model.
- Return type:
nn.Module
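A hedged sketch of what the Liger part of the patch amounts to, assuming the liger-kernel package's `_apply_liger_kernel_to_instance` helper; the real function may also apply the SDPA patching described above:

```python
import torch.nn as nn


def _patch_liger_kernel(model: nn.Module) -> nn.Module:
    """Illustrative sketch: apply Liger kernels in place to an instantiated model."""
    try:
        # liger-kernel swaps supported submodules (RMSNorm, SwiGLU, fused losses, ...)
        # on the live model instance.
        from liger_kernel.transformers import _apply_liger_kernel_to_instance

        _apply_liger_kernel_to_instance(model=model)
    except ImportError:
        # Without the liger-kernel package, return the model unmodified.
        pass
    return model
```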
- nemo_automodel._transformers.auto_model._get_next_fallback_attn(attn_implementation: str) -> str#
Get the next attention implementation in the priority list, in reverse order.
If a model does not support a given attention implementation, the next implementation in the priority list is returned. If the current attention implementation is not in the priority list, `eager` is returned.
- Parameters:
attn_implementation (str) – The current attention implementation.
- Returns:
The next attention implementation in the priority list.
- Return type:
str
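A sketch of this fallback logic; the priority list below is an assumption for illustration, not necessarily the one used by the module:

```python
# Hypothetical priority list, highest priority first.
_ATTN_PRIORITY = ["flash_attention_2", "sdpa", "eager"]


def _get_next_fallback_attn(attn_implementation: str) -> str:
    """Illustrative sketch: step down one entry in the priority list."""
    if attn_implementation not in _ATTN_PRIORITY:
        # Unknown implementations fall back straight to eager.
        return "eager"
    index = _ATTN_PRIORITY.index(attn_implementation)
    # Clamp at the last (lowest-priority) entry.
    return _ATTN_PRIORITY[min(index + 1, len(_ATTN_PRIORITY) - 1)]
```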
- nemo_automodel._transformers.auto_model._prepare_hf_config_and_flag(pretrained_model_name_or_path, force_hf, kwargs)#
Resolve the trust_remote_code default, fetch the HF config, and determine whether the model is HF-based.
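A rough sketch of the described steps; the `trust_remote_code` default and the custom-implementation check are assumptions, not the module's actual logic:

```python
from transformers import AutoConfig


def _prepare_hf_config_and_flag(pretrained_model_name_or_path, force_hf, kwargs):
    """Illustrative sketch: load the HF config and decide which code path to use."""
    # Assumed default: do not trust remote code unless the caller opts in.
    trust_remote_code = kwargs.setdefault("trust_remote_code", False)
    config = AutoConfig.from_pretrained(
        pretrained_model_name_or_path, trust_remote_code=trust_remote_code
    )
    # Hypothetical check: fall back to the HF implementation when no custom
    # NeMo implementation is registered for this architecture, or when forced.
    has_custom_impl = False  # placeholder for the real registry lookup
    is_hf_model = force_hf or not has_custom_impl
    return config, is_hf_model
```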
- nemo_automodel._transformers.auto_model._pop_tp_cp_has_packed(kwargs)#
Extract and remove TP/CP/packed flags from kwargs.
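A small sketch of the described behavior; the exact kwarg names and defaults are assumptions for illustration:

```python
def _pop_tp_cp_has_packed(kwargs):
    """Illustrative sketch: remove parallelism/packing flags before calling HF loaders."""
    tp_size = kwargs.pop("tp_size", 1)  # tensor-parallel degree
    cp_size = kwargs.pop("cp_size", 1)  # context-parallel degree
    has_packed_sequence = kwargs.pop("has_packed_sequence", False)
    return tp_size, cp_size, has_packed_sequence
```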
- nemo_automodel._transformers.auto_model._apply_preload_overrides(is_hf_model, tp_size, cp_size, has_packed_sequence, attn_implementation, use_liger_kernel)#
Compute the final attention implementation and liger-kernel flag based on TP/CP and packed-sequence constraints.
- nemo_automodel._transformers.auto_model._verify_sdpa_support(model, is_hf_model, cp_size)#
Validate SDPA support when CP is enabled for HF models.
- class nemo_automodel._transformers.auto_model._BaseNeMoAutoModelClass#
Bases: `transformers.models.auto.auto_factory._BaseAutoModelClass`
Drop-in replacement for `_BaseAutoModelClass` that includes custom kernels.
The class only overrides `from_pretrained` and `from_config` to add the optional `use_liger_kernel` flag. If the flag is `True` (default) and the Liger kernel is available, the model's attention layers are monkey-patched in place. If patching fails for any reason, the call is retried once with `use_liger_kernel=False` so that users still obtain a functional model.
TODO(@akoumpa): extend this beyond liger_kernel.
Notes:#
No changes are made to the model’s public API; forward signatures, generation utilities, and weight shapes remain identical.
Only decoder-style (causal) architectures are currently supported by the Liger patch. Unsupported models will silently fall back.
- classmethod from_pretrained(
      pretrained_model_name_or_path,
      *model_args,
      use_liger_kernel: bool = True,
      use_sdpa_patching: bool = True,
      sdpa_method: Optional[List[torch.nn.attention.SDPBackend]] = None,
      torch_dtype='auto',
      attn_implementation: str = 'flash_attention_2',
      quantization_config=None,
      force_hf: bool = False,
      **kwargs,
  )#
Instantiate and (optionally) patch a causal-language model.
This is a light wrapper around `transformers.AutoModelForCausalLM.from_pretrained` that can automatically apply Liger and/or SDPA (scaled dot-product attention) kernel optimizations.
- Parameters:
  - pretrained_model_name_or_path (str | os.PathLike) – Hugging Face hub repo ID or local path accepted by `AutoModelForCausalLM.from_pretrained`.
  - *model_args – Positional arguments forwarded verbatim to `AutoModelForCausalLM.from_pretrained`.
  - use_liger_kernel (bool, default=True) – If `True`, try to patch the model with Liger kernels for faster inference/training.
  - use_sdpa_patching (bool, default=True) – If `True`, patch the model with SDPA-based attention optimizations.
  - sdpa_method (list[SDPBackend] | None, optional) – Explicit list of SDPA back-ends to consider when `use_sdpa_patching=True`.
  - torch_dtype (str | torch.dtype | Literal["auto"], default="auto") – Data type passed to the underlying `from_pretrained` call.
  - attn_implementation (str, default="flash_attention_2") – Desired attention implementation; forwarded to the HF config.
  - quantization_config (optional) – BitsAndBytesConfig object that specifies all quantization settings. If provided, quantization is applied to the model.
  - force_hf (bool, default=False) – If `True`, force the use of the HF model implementation. If `False`, the model is loaded using the custom model implementation if available.
  - **kwargs – Additional keyword arguments forwarded verbatim to `AutoModelForCausalLM.from_pretrained`.
- Returns:
The loaded (and possibly patched) model instance.
- Return type:
transformers.PreTrainedModel
- Warns:
UserWarning – Emitted when `use_liger_kernel=True` but the Liger package is unavailable.
Notes:#
If kernel patching fails, the partially constructed model is deleted and the method recurses once with `use_liger_kernel=False` or `use_sdpa_patching=False`.
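For example, the following sketch loads a model through the public `NeMoAutoModelForCausalLM` subclass; the checkpoint name and backend choices are illustrative, not defaults:

```python
from torch.nn.attention import SDPBackend

from nemo_automodel._transformers.auto_model import NeMoAutoModelForCausalLM

# "gpt2" is only a placeholder checkpoint for illustration.
# Restrict SDPA patching to the flash and math backends and keep Liger enabled.
model = NeMoAutoModelForCausalLM.from_pretrained(
    "gpt2",
    use_liger_kernel=True,
    use_sdpa_patching=True,
    sdpa_method=[SDPBackend.FLASH_ATTENTION, SDPBackend.MATH],
    attn_implementation="sdpa",
    torch_dtype="auto",
)
```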
- classmethod from_config(
      config,
      *model_args,
      use_liger_kernel: bool = True,
      use_sdpa_patching: bool = True,
      sdpa_method: Optional[List[torch.nn.attention.SDPBackend]] = None,
      torch_dtype: Union[str, torch.dtype] = 'auto',
      attn_implementation: str = 'flash_attention_2',
      quantization_config=None,
      force_hf: bool = False,
      **kwargs,
  )#
Instantiate a model from a `transformers.PretrainedConfig` and optionally patch it with Liger or SDPA-optimized kernels.
- Parameters:
  - config (transformers.PretrainedConfig) – The configuration object used to build the model.
  - *model_args – Positional arguments forwarded to the underlying `transformers.AutoModelForCausalLM.from_config` call.
  - use_liger_kernel (bool, optional) – If `True`, tries to patch the instantiated model with Liger optimized attention kernels. Defaults to `True`.
  - use_sdpa_patching (bool, optional) – If `True`, applies in-place SDPA (scaled dot-product attention) kernel optimizations wherever possible. Defaults to `True`.
  - sdpa_method (Optional[List[SDPBackend]], optional) – One or multiple SDPA back-ends to prefer when applying SDPA patching. When `None`, the default backend resolution logic is used. Defaults to `None`.
  - attn_implementation (str, optional) – Specifies which attention implementation to use (e.g., `"flash_attention_2"`, `"eager"`). Only applied when the base model supports this kwarg. Defaults to `"flash_attention_2"`.
  - force_hf (bool, default=False) – If `True`, force the use of the HF model implementation. If `False`, the model is loaded using the custom model implementation if available.
  - **kwargs – Additional keyword arguments forwarded to the superclass constructor and underlying `from_config` logic.
- Returns:
The instantiated (and possibly kernel-patched) model.
- Return type:
transformers.PreTrainedModel
Notes:#
If kernel patching fails, the partially constructed model is deleted and the method recurses once with `use_liger_kernel=False` or `use_sdpa_patching=False`.
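As a usage sketch (the `"gpt2"` config is a placeholder; any `PretrainedConfig` works), building an untrained model from a config while skipping the Liger patch:

```python
from transformers import AutoConfig

from nemo_automodel._transformers.auto_model import NeMoAutoModelForCausalLM

# Fetch only the configuration (no weights) and instantiate a fresh model from it.
config = AutoConfig.from_pretrained("gpt2")
model = NeMoAutoModelForCausalLM.from_config(
    config,
    use_liger_kernel=False,       # skip the Liger patch
    use_sdpa_patching=True,       # still wrap forward in an sdpa_kernel context
    attn_implementation="eager",
)
```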
- class nemo_automodel._transformers.auto_model.NeMoAutoModelForCausalLM#
Bases: `nemo_automodel._transformers.auto_model._BaseNeMoAutoModelClass`, `transformers.AutoModelForCausalLM`
Drop-in replacement for `transformers.AutoModelForCausalLM` that includes custom kernels.
The class only overrides `from_pretrained` and `from_config` to add the optional `use_liger_kernel` flag. If the flag is `True` (default) and the Liger kernel is available, the model's attention layers are monkey-patched in place. If patching fails for any reason, the call is retried once with `use_liger_kernel=False` so that users still obtain a functional model.
TODO(@akoumpa): extend this beyond liger_kernel.
Notes:#
No changes are made to the model’s public API; forward signatures, generation utilities, and weight shapes remain identical.
Only decoder-style (causal) architectures are currently supported by the Liger patch. Unsupported models will silently fall back.
Examples:#
model = NeMoAutoModelForCausalLM.from_pretrained("gpt2")  # try Liger
model = NeMoAutoModelForCausalLM.from_pretrained(
    "gpt2", use_liger_kernel=False)  # skip Liger
- class nemo_automodel._transformers.auto_model.NeMoAutoModelForImageTextToText#
Bases: `nemo_automodel._transformers.auto_model._BaseNeMoAutoModelClass`, `transformers.AutoModelForImageTextToText`
Drop-in replacement for `transformers.AutoModelForImageTextToText` with custom kernels.
The class only overrides `from_pretrained` and `from_config` to add the optional `use_liger_kernel` flag. If the flag is `True` (default) and the Liger kernel is available, the model's attention layers are monkey-patched in place. If patching fails for any reason, the call is retried once with `use_liger_kernel=False` so that users still obtain a functional model.
@akoumpa: currently only supporting liger_kernel for demonstration purposes.
Notes:#
No changes are made to the model’s public API; forward signatures, generation utilities, and weight shapes remain identical.
Only decoder-style (causal) architectures are currently supported by the Liger patch. Unsupported models will silently fall back.
Examples:#
model = NeMoAutoModelForImageTextToText.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")  # try Liger
model = NeMoAutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct", use_liger_kernel=False)  # skip Liger
- class nemo_automodel._transformers.auto_model.NeMoAutoModelForSequenceClassification#
Bases: `nemo_automodel._transformers.auto_model._BaseNeMoAutoModelClass`, `transformers.AutoModelForSequenceClassification`
Drop-in replacement for `transformers.AutoModelForSequenceClassification` with custom kernels.
The class only overrides `from_pretrained` and `from_config` to add the optional `use_liger_kernel` flag. If the flag is `True` (default) and the Liger kernel is available, the model's attention layers are monkey-patched in place. If patching fails for any reason, the call is retried once with `use_liger_kernel=False` so that users still obtain a functional model.
@akoumpa: currently only supporting liger_kernel for demonstration purposes.
Notes:#
No changes are made to the model’s public API; forward signatures, generation utilities, and weight shapes remain identical.
Only decoder-style (causal) architectures are currently supported by the Liger patch. Unsupported models will silently fall back.
Examples:#
model = NeMoAutoModelForSequenceClassification.from_pretrained("bert-base-uncased")  # try Liger
model = NeMoAutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", use_liger_kernel=False)  # skip Liger
- class nemo_automodel._transformers.auto_model.NeMoAutoModelForTextToWaveform#
Bases: `nemo_automodel._transformers.auto_model._BaseNeMoAutoModelClass`, `transformers.AutoModelForTextToWaveform`
Drop-in replacement for `transformers.AutoModelForTextToWaveform` with custom kernels.
The class only overrides `from_pretrained` and `from_config` to add the optional `use_liger_kernel` flag. If the flag is `True` (default) and the Liger kernel is available, the model's attention layers are monkey-patched in place. If patching fails for any reason, the call is retried once with `use_liger_kernel=False` so that users still obtain a functional model.
@akoumpa: currently only supporting liger_kernel for demonstration purposes.
Notes:#
No changes are made to the model’s public API; forward signatures, generation utilities, and weight shapes remain identical.
Only decoder-style (causal) architectures are currently supported by the Liger patch. Unsupported models will silently fall back.
Examples:#
model = NeMoAutoModelForTextToWaveform.from_pretrained("facebook/musicgen-small")  # try Liger
model = NeMoAutoModelForTextToWaveform.from_pretrained(
    "facebook/musicgen-small", use_liger_kernel=False)  # skip Liger