nemo_automodel.components._transformers.auto_model#
Module Contents#
Classes#
| Class | Description |
|---|---|
| _BaseNeMoAutoModelClass | Drop-in replacement for _BaseAutoModelClass that includes custom kernels. |
| NeMoAutoModelForCausalLM | Drop-in replacement for transformers.AutoModelForCausalLM that includes custom kernels. |
| NeMoAutoModelForImageTextToText | Drop-in replacement for transformers.AutoModelForImageTextToText with custom kernels. |
| NeMoAutoModelForSequenceClassification | Drop-in replacement for transformers.AutoModelForSequenceClassification with custom kernels. |
Functions#
| Function | Description |
|---|---|
| _assert_same_signature | Raise AssertionError if the two call signatures differ. |
| _patch_attention | Wrap the forward method of obj in an sdpa_kernel context manager. |
| _patch_liger_kernel | Patches a model with liger-kernel and sdpa_kernel. |
| _get_next_fallback_attn | Get the next attention implementation in the priority list, in reverse order. |
Data#
API#
- nemo_automodel.components._transformers.auto_model.logger#
‘getLogger(…)’
- nemo_automodel.components._transformers.auto_model._assert_same_signature(original, patched)[source]#
Raise AssertionError if the two call signatures differ.
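As a rough sketch, such a check can be built on `inspect.signature`; the helper below mirrors the documented behavior but is illustrative, not the module's actual implementation.

```python
import inspect


def _assert_same_signature_sketch(original, patched):
    """Illustrative only: fail loudly if patching changed the call signature."""
    sig_original = inspect.signature(original)
    sig_patched = inspect.signature(patched)
    assert sig_original == sig_patched, (
        f"Signature changed by patching: {sig_original} != {sig_patched}"
    )
```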
- nemo_automodel.components._transformers.auto_model._patch_attention(obj, sdpa_method=None)[source]#
Wrap the `forward` method of `obj` in an `sdpa_kernel` context manager.
- Parameters:
  obj – Any object with a `.forward(*args, **kwargs)` method.
  sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].
- Returns:
  The same `obj` with its `.forward` method patched.
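A minimal sketch of this kind of wrapper, assuming PyTorch's `torch.nn.attention.sdpa_kernel` context manager (PyTorch ≥ 2.3); it is illustrative and not the module's exact implementation.

```python
from functools import wraps

from torch.nn.attention import SDPBackend, sdpa_kernel


def _patch_attention_sketch(obj, sdpa_method=None):
    """Illustrative: run obj.forward inside an sdpa_kernel context manager."""
    if sdpa_method is None:
        # Default backend priority taken from the parameter description above.
        sdpa_method = [
            SDPBackend.CUDNN_ATTENTION,
            SDPBackend.FLASH_ATTENTION,
            SDPBackend.EFFICIENT_ATTENTION,
            SDPBackend.MATH,
        ]

    original_forward = obj.forward

    @wraps(original_forward)
    def forward(*args, **kwargs):
        # Every call now executes under the requested SDPA backends.
        with sdpa_kernel(sdpa_method):
            return original_forward(*args, **kwargs)

    obj.forward = forward
    return obj
```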
- nemo_automodel.components._transformers.auto_model._patch_liger_kernel(model)[source]#
Patches a model with liger-kernel and sdpa_kernel.
- Parameters:
model (nn.Module) – The model to patch.
use_liger_kernel (bool) – Applies liger-kernel to the model. Default True.
use_sdpa_patching (bool) – Enables model patching with SDPA kernel optimizations. Default True.
sdpa_method (list[SDPBackend], optional) – Ordered list of SDPBackend implementations to attempt. If None, defaults to [CUDNN_ATTENTION, FLASH_ATTENTION, EFFICIENT_ATTENTION, MATH].
- Returns:
the patched model
- Return type:
nn.Module
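A hedged sketch of the Liger half of this patching, assuming the liger-kernel package's `_apply_liger_kernel_to_instance` helper is available; the exact entry point, fallback, and SDPA handling in this module may differ.

```python
import torch.nn as nn


def _patch_liger_kernel_sketch(model: nn.Module) -> nn.Module:
    """Illustrative: apply Liger kernels in place, returning the model
    unchanged if the package is missing or patching fails."""
    try:
        # Assumed helper from the liger-kernel package; patches supported
        # decoder architectures in place.
        from liger_kernel.transformers import _apply_liger_kernel_to_instance

        _apply_liger_kernel_to_instance(model=model)
    except Exception:
        # Unsupported architecture or missing package: fall back silently,
        # matching the documented behavior of the auto-model classes.
        pass
    return model
```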
- nemo_automodel.components._transformers.auto_model._get_next_fallback_attn(attn_implementation: str) → str[source]#
Get the next attention implementation in the priority list, in reverse order.
If a model does not support a given attention implementation, the next implementation in the priority list is returned.
If the current attention implementation is not in the priority list, it uses eager.
- Parameters:
attn_implementation (str) – The current attention implementation.
- Returns:
The next attention implementation in the priority list.
- Return type:
str
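The fallback logic might look roughly like the sketch below; the priority list shown is hypothetical, and only the terminal "eager" case is taken from the description above.

```python
# Hypothetical priority order (highest first); the real list lives in the module source.
_ATTN_PRIORITY = ["flash_attention_2", "sdpa", "eager"]


def _get_next_fallback_attn_sketch(attn_implementation: str) -> str:
    """Illustrative: return the next-lower implementation, defaulting to 'eager'."""
    if attn_implementation not in _ATTN_PRIORITY:
        # Unknown implementation: fall back to eager, per the documented behavior.
        return "eager"
    idx = _ATTN_PRIORITY.index(attn_implementation)
    # Stay on the last entry ('eager') if we are already at the bottom of the list.
    return _ATTN_PRIORITY[min(idx + 1, len(_ATTN_PRIORITY) - 1)]
```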
- class nemo_automodel.components._transformers.auto_model._BaseNeMoAutoModelClass(*args, **kwargs)[source]#
Bases: transformers.models.auto.auto_factory._BaseAutoModelClass
Drop-in replacement for `_BaseAutoModelClass` that includes custom kernels.
The class only overrides `from_pretrained` and `from_config` to add the optional `use_liger_kernel` flag. If the flag is `True` (default) and the Liger kernel is available, the model's attention layers are monkey-patched in place. If patching fails for any reason, the call is retried once with `use_liger_kernel=False` so that users still obtain a functional model.
TODO(@akoumpa): extend this beyond liger_kernel.
Notes:#
No changes are made to the model’s public API; forward signatures, generation utilities, and weight shapes remain identical.
Only decoder-style (causal) architectures are currently supported by the Liger patch. Unsupported models will silently fall back.
Initialization
- classmethod from_pretrained(
- pretrained_model_name_or_path,
- *model_args,
- use_liger_kernel: bool = True,
- use_sdpa_patching: bool = True,
- sdpa_method: Optional[List[torch.nn.attention.SDPBackend]] = None,
- torch_dtype='auto',
- attn_implementation: str = 'flash_attention_2',
- **kwargs,
)
Instantiate and (optionally) patch a causal-language model.
This is a light wrapper around `transformers.AutoModelForCausalLM.from_pretrained` that can automatically apply Liger and/or SDPA (scaled-dot-product attention) kernel optimizations.
- Parameters:
  pretrained_model_name_or_path (str | os.PathLike) – Hugging Face hub repo ID or local path accepted by `AutoModelForCausalLM.from_pretrained`.
  *model_args – Positional arguments forwarded verbatim to `AutoModelForCausalLM.from_pretrained`.
  use_liger_kernel (bool, default=True) – If `True`, try to patch the model with Liger kernels for faster inference/training.
  use_sdpa_patching (bool, default=True) – If `True`, patch the model with SDPA-based attention optimizations.
  sdpa_method (list[SDPBackend] | None, optional) – Explicit list of SDPA back-ends to consider when `use_sdpa_patching=True`.
  torch_dtype (str | torch.dtype | Literal["auto"], default="auto") – Data type passed to the underlying `from_pretrained` call.
  attn_implementation (str, default="flash_attention_2") – Desired attention implementation; forwarded to the HF config.
  fp8_config (FP8Config, optional) – FP8 configuration object that specifies all FP8 quantization settings. If provided, FP8 quantization will be applied to the model for improved performance on supported hardware.
  **kwargs – Additional keyword arguments forwarded verbatim to `AutoModelForCausalLM.from_pretrained`.
- Returns:
The loaded (and possibly patched) model instance.
- Return type:
transformers.PreTrainedModel
- Warns:
  UserWarning – Emitted when `use_liger_kernel=True` but the Liger package is unavailable.
Notes:#
If kernel patching fails, the partially constructed model is deleted and the method recurses once with `use_liger_kernel=False` or `use_sdpa_patching=False`.
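For illustration, a typical call through one of the concrete subclasses documented below might look like the following; the checkpoint name and backend choices are placeholders, and unsupported options fall back as described above.

```python
from torch.nn.attention import SDPBackend

from nemo_automodel.components._transformers.auto_model import NeMoAutoModelForCausalLM

# "gpt2" is a placeholder checkpoint; any causal-LM repo ID or local path works here.
model = NeMoAutoModelForCausalLM.from_pretrained(
    "gpt2",
    use_liger_kernel=True,    # attempt Liger kernel patching
    use_sdpa_patching=True,   # wrap forward() in an sdpa_kernel context
    sdpa_method=[SDPBackend.FLASH_ATTENTION, SDPBackend.MATH],
    torch_dtype="auto",
)
```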
- classmethod from_config(
- config,
- *model_args,
- use_liger_kernel: bool = True,
- use_sdpa_patching: bool = True,
- sdpa_method: Optional[List[torch.nn.attention.SDPBackend]] = None,
- torch_dtype: Union[str, torch.dtype] = 'auto',
- attn_implementation: str = 'flash_attention_2',
- **kwargs,
)
Instantiate a model from a `transformers.PretrainedConfig` and optionally patch it with Liger or SDPA-optimized kernels.
- Parameters:
  config (transformers.PretrainedConfig) – The configuration object used to build the model.
  *model_args – Positional arguments forwarded to the underlying `transformers.AutoModelForCausalLM.from_config` call.
  use_liger_kernel (bool, optional) – If `True`, tries to patch the instantiated model with Liger optimized attention kernels. Defaults to `True`.
  use_sdpa_patching (bool, optional) – If `True`, applies in-place SDPA (Scaled-Dot-Product-Attention) kernel optimizations wherever possible. Defaults to `True`.
  sdpa_method (Optional[List[SDPBackend]], optional) – One or multiple SDPA back-ends to prefer when applying SDPA patching. When `None`, the default backend resolution logic is used. Defaults to `None`.
  attn_implementation (str, optional) – Specifies which attention implementation to use (e.g., "flash_attention_2", "eager"). Only applied when the base model supports this kwarg. Defaults to "flash_attention_2".
  **kwargs – Additional keyword arguments forwarded to the superclass constructor and underlying `from_config` logic.
- Returns:
The instantiated (and possibly kernel-patched) model.
- Return type:
transformers.PreTrainedModel
Notes:#
If kernel patching fails, the partially constructed model is deleted and the method recurses once with `use_liger_kernel=False` or `use_sdpa_patching=False`.
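A short usage sketch: building a randomly initialized model from a config via a concrete subclass. The config name is a placeholder and the keyword choices are only for illustration.

```python
from transformers import AutoConfig

from nemo_automodel.components._transformers.auto_model import NeMoAutoModelForCausalLM

# Placeholder config; weights are randomly initialized, not loaded from the hub.
config = AutoConfig.from_pretrained("gpt2")
model = NeMoAutoModelForCausalLM.from_config(
    config,
    use_liger_kernel=True,     # try Liger patching
    use_sdpa_patching=False,   # skip SDPA patching in this example
)
```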
- class nemo_automodel.components._transformers.auto_model.NeMoAutoModelForCausalLM(*args, **kwargs)[source]#
Bases: nemo_automodel.components._transformers.auto_model._BaseNeMoAutoModelClass, transformers.AutoModelForCausalLM
Drop-in replacement for `transformers.AutoModelForCausalLM` that includes custom kernels.
The class only overrides `from_pretrained` and `from_config` to add the optional `use_liger_kernel` flag. If the flag is `True` (default) and the Liger kernel is available, the model's attention layers are monkey-patched in place. If patching fails for any reason, the call is retried once with `use_liger_kernel=False` so that users still obtain a functional model.
TODO(@akoumpa): extend this beyond liger_kernel.
Notes:#
No changes are made to the model’s public API; forward signatures, generation utilities, and weight shapes remain identical.
Only decoder-style (causal) architectures are currently supported by the Liger patch. Unsupported models will silently fall back.
Examples:#
model = NeMoAutoModelForCausalLM.from_pretrained("gpt2")  # try Liger
model = NeMoAutoModelForCausalLM.from_pretrained("gpt2", use_liger_kernel=False)  # skip Liger
Initialization
- class nemo_automodel.components._transformers.auto_model.NeMoAutoModelForImageTextToText(*args, **kwargs)[source]#
Bases: nemo_automodel.components._transformers.auto_model._BaseNeMoAutoModelClass, transformers.AutoModelForImageTextToText
Drop-in replacement for `transformers.AutoModelForImageTextToText` with custom kernels.
The class only overrides `from_pretrained` and `from_config` to add the optional `use_liger_kernel` flag. If the flag is `True` (default) and the Liger kernel is available, the model's attention layers are monkey-patched in place. If patching fails for any reason, the call is retried once with `use_liger_kernel=False` so that users still obtain a functional model.
@akoumpa: currently only supporting liger_kernel for demonstration purposes.
Notes:#
No changes are made to the model’s public API; forward signatures, generation utilities, and weight shapes remain identical.
Only decoder-style (causal) architectures are currently supported by the Liger patch. Unsupported models will silently fall back.
Examples:#
model = NeMoAutoModelForImageTextToText.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")  # try Liger
model = NeMoAutoModelForImageTextToText.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct", use_liger_kernel=False)  # skip Liger
Initialization
- class nemo_automodel.components._transformers.auto_model.NeMoAutoModelForSequenceClassification(*args, **kwargs)[source]#
Bases: nemo_automodel.components._transformers.auto_model._BaseNeMoAutoModelClass, transformers.AutoModelForSequenceClassification
Drop-in replacement for `transformers.AutoModelForSequenceClassification` with custom kernels.
The class only overrides `from_pretrained` and `from_config` to add the optional `use_liger_kernel` flag. If the flag is `True` (default) and the Liger kernel is available, the model's attention layers are monkey-patched in place. If patching fails for any reason, the call is retried once with `use_liger_kernel=False` so that users still obtain a functional model.
@akoumpa: currently only supporting liger_kernel for demonstration purposes.
Notes:#
No changes are made to the model’s public API; forward signatures, generation utilities, and weight shapes remain identical.
Only decoder-style (causal) architectures are currently supported by the Liger patch. Unsupported models will silently fall back.
Examples:#
model = NeMoAutoModelForSequenceClassification.from_pretrained("bert-base-uncased")  # try Liger
model = NeMoAutoModelForSequenceClassification.from_pretrained("bert-base-uncased", use_liger_kernel=False)  # skip Liger
Initialization