nemo_automodel._transformers.capabilities#

Model capabilities introspection and input validation.

Provides :class:`ModelSupports` (a read-only descriptor of what a model can do) and :func:`attach_capabilities_and_validate`, which attaches model.supports, model.supports_*, and model.validate_for_mesh to any nn.Module.

Capabilities are derived from code introspection – class attributes, mixin inheritance, forward-signature inspection – so they stay in sync as models evolve without manual feature tables.
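The three introspection signals can be sketched with plain Python. All class names below are illustrative stand-ins, not the library's real classes; only the pattern (class attribute, mixin inheritance, forward-signature inspection) mirrors the text above:

```python
import inspect

# Illustrative stand-ins: neither class comes from the library; they only
# mimic the three introspection signals described above.
class MoELikeMixin:
    pass

class ToyModel(MoELikeMixin):
    _pp_plan = {"layers": "split"}  # class-attribute signal

    def forward(self, input_ids, seq_lens=None):
        return input_ids

model = ToyModel()

has_pp_plan = getattr(type(model), "_pp_plan", None) is not None            # class attribute
is_moe_like = isinstance(model, MoELikeMixin)                               # mixin inheritance
takes_seq_lens = "seq_lens" in inspect.signature(model.forward).parameters  # forward signature

print(has_pp_plan, is_moe_like, takes_seq_lens)  # True True True
```

Because every signal is read off the live class, renaming a layer or adding a kwarg updates the answer automatically, with no feature table to maintain.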

Module Contents#

Classes#

ModelSupports

Queryable feature-support descriptor attached to a model instance.

Functions#

_has_optimized_tp_plan

Check if model_cls has an entry in PARALLELIZE_FUNCTIONS.

_is_moe

_supports_seq_lens

True when model.forward() accepts a seq_lens kwarg.

_has_backend

True for custom models that carry a BackendConfig.

_uses_te_attention

True when the model was constructed with the TE attention backend.

_is_hybrid

True when the model mixes attention with non-attention layers (e.g. Mamba/SSM).

validate_for_mesh

Validate mesh parallelism sizes against this model’s capabilities.

_supports_forwarding_property

Property that forwards model.<name> to model.supports.<name>.

_lazy_supports_property

_build_class_dict

attach_capabilities_and_validate

Attach model.supports and model.supports_* and call validate_for_mesh.

Data#

API#

nemo_automodel._transformers.capabilities.logger#

'getLogger(…)'

nemo_automodel._transformers.capabilities._has_optimized_tp_plan(model_cls: type) → bool[source]#

Check if model_cls has an entry in PARALLELIZE_FUNCTIONS.

nemo_automodel._transformers.capabilities._is_moe(model_cls: type) → bool[source]#
nemo_automodel._transformers.capabilities._supports_seq_lens(model: torch.nn.Module) → bool[source]#

True when model.forward() accepts a seq_lens kwarg.

nemo_automodel._transformers.capabilities._has_backend(model: torch.nn.Module) → bool[source]#

True for custom models that carry a BackendConfig.

nemo_automodel._transformers.capabilities._uses_te_attention(model: torch.nn.Module) → bool[source]#

True when the model was constructed with the TE attention backend.

nemo_automodel._transformers.capabilities._is_hybrid(model: torch.nn.Module) → bool[source]#

True when the model mixes attention with non-attention layers (e.g. Mamba/SSM).

Detected via config attributes used by NemotronH (layers_block_type) and HF hybrid models (hybrid_override_pattern, is_hybrid_model).
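A minimal sketch of that config-attribute detection, assuming the attribute names cited above; the `looks_hybrid` helper itself is illustrative, not the library's implementation:

```python
from types import SimpleNamespace

# Minimal sketch of hybrid detection via config attributes; the attribute
# names mirror the ones cited above, but looks_hybrid itself is illustrative.
def looks_hybrid(config) -> bool:
    if getattr(config, "is_hybrid_model", False):
        return True
    if getattr(config, "hybrid_override_pattern", None):
        return True
    # NemotronH-style configs list one block type per layer.
    layer_types = getattr(config, "layers_block_type", None) or []
    return len(set(layer_types)) > 1

print(looks_hybrid(SimpleNamespace(layers_block_type=["mamba", "attention"])))  # True
print(looks_hybrid(SimpleNamespace(num_hidden_layers=12)))                      # False
```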

class nemo_automodel._transformers.capabilities.ModelSupports(model: torch.nn.Module, mesh: MeshContext | None = None)[source]#

Queryable feature-support descriptor attached to a model instance.

Every property is derived from introspection of the live model so it reflects the actual class hierarchy and forward signature, not a hand-maintained table.

Usage::

model = NeMoAutoModelForCausalLM.from_pretrained(...)
model.supports.tp   # True / False
model.supports.pp   # ...

Initialization

__slots__#

('_model', '_model_cls', '_mesh')

__repr__() → str[source]#
property is_custom_model: bool#

True when the model class has a custom (non-HF) implementation in the registry.

property supports_tp: bool#

Model has an optimized or HF-native tensor-parallel plan.

property supports_pp: bool#

Model supports pipeline parallelism.

True when the model either declares a _pp_plan or inherits from MoEFSDPSyncMixin (MoE models handle PP via patched_backward_maybe_with_nosync).

property supports_tp_plan: bool#
property supports_pp_plan: bool#
property supports_cp: bool#

Model supports context parallelism.

+-------------------+---------------+-----+
| Model kind        | Attention     | CP? |
+-------------------+---------------+-----+
| Custom            | TE            | Yes |
| Custom            | FlexAttention | No  |
| HF (pure attn)    | SDPA          | Yes |
| HF (pure attn)    | no SDPA       | No  |
| HF hybrid (Mamba) | any           | No  |
+-------------------+---------------+-----+
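The table above can be sketched as a small decision function. The inputs `is_custom`, `attn_backend`, and `is_hybrid` are illustrative names, not the library's real internals:

```python
# Hedged sketch of the CP decision table; the three inputs are illustrative
# stand-ins for the model-kind, attention-backend, and hybrid checks.
def cp_supported(is_custom: bool, attn_backend: str, is_hybrid: bool) -> bool:
    if is_hybrid:                   # HF hybrid (Mamba): never
        return False
    if is_custom:                   # Custom: TE yes, FlexAttention no
        return attn_backend == "te"
    return attn_backend == "sdpa"   # HF pure attention: SDPA only

print(cp_supported(True, "te", False))    # True
print(cp_supported(True, "flex", False))  # False
print(cp_supported(False, "sdpa", True))  # False
```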

property supports_ep: bool#

Model is a Mixture-of-Experts that supports expert parallelism.

property supports_sequence_packing: bool#

forward() accepts seq_lens for packed-sequence training.

property supports_generate: bool#

Model has a generate() method for autoregressive inference.

property supports_gradient_checkpointing: bool#

Gradient checkpointing is supported.

property cp_size: int#
property tp_size: int#
property pp_size: int#
property ep_size: int#
property supports_cp_with_sequence_packing: bool#

Combining CP with packed sequences requires the TE attention backend.

nemo_automodel._transformers.capabilities.validate_for_mesh(
model: torch.nn.Module,
mesh: nemo_automodel.components.distributed.mesh.MeshContext,
) → None[source]#

Validate mesh parallelism sizes against this model’s capabilities.

Works both as a bound method (model.validate_for_mesh()) and as a standalone call (validate_for_mesh(model)).

Raises :class:`ValueError` with one bullet per violation.
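An illustrative sketch of the bullet-per-violation error format; the real validate_for_mesh reads the model's ModelSupports and the mesh, whereas this toy version takes flags directly:

```python
# Toy version of bullet-per-violation validation: collect every mismatch,
# then raise a single ValueError listing them all. Names are illustrative.
def toy_validate(supports_tp: bool, tp_size: int, supports_cp: bool, cp_size: int) -> None:
    violations = []
    if tp_size > 1 and not supports_tp:
        violations.append(f"- tp_size={tp_size} but the model has no tensor-parallel plan")
    if cp_size > 1 and not supports_cp:
        violations.append(f"- cp_size={cp_size} but the model does not support context parallelism")
    if violations:
        raise ValueError("Model/mesh mismatch:\n" + "\n".join(violations))

try:
    toy_validate(supports_tp=False, tp_size=2, supports_cp=True, cp_size=1)
except ValueError as err:
    print(err)
```

Collecting all violations before raising lets a user fix every mesh-size problem in one pass instead of replaying the failure per flag.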

nemo_automodel._transformers.capabilities._supports_forwarding_property(name: str) → property[source]#

Property that forwards model.<name> to model.supports.<name>.
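The forwarding pattern itself is small enough to sketch; all class names below are illustrative stand-ins, not library code:

```python
# Sketch of a forwarding property: ToyModel().supports_tp resolves to
# ToyModel().supports.supports_tp. Names are illustrative stand-ins.
def forwarding_property(name: str) -> property:
    return property(lambda self: getattr(self.supports, name))

class ToySupports:
    supports_tp = True
    supports_pp = False

class ToyModel:
    supports = ToySupports()
    supports_tp = forwarding_property("supports_tp")
    supports_pp = forwarding_property("supports_pp")

print(ToyModel().supports_tp, ToyModel().supports_pp)  # True False
```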

nemo_automodel._transformers.capabilities._lazy_supports_property(
self: torch.nn.Module,
) → nemo_automodel._transformers.capabilities.ModelSupports[source]#
nemo_automodel._transformers.capabilities._build_class_dict() → dict[str, property | type][source]#
nemo_automodel._transformers.capabilities.attach_capabilities_and_validate(
model: torch.nn.Module,
mesh: nemo_automodel.components.distributed.mesh.MeshContext,
) → torch.nn.Module[source]#

Attach model.supports and model.supports_* and call validate_for_mesh.

Injects a thin dynamic subclass so that property descriptors (supports_*) resolve via __getattribute__ with no __getattr__ overhead, which avoids triggering ModelCapabilitiesMixin.__getattr__ for models that lack the attribute. Safe to call more than once; subsequent calls are no-ops.
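The thin-dynamic-subclass trick can be sketched in a few lines. This is a hedged illustration under assumed names (ToyModel, attach_properties), not the library's implementation:

```python
# Sketch: swap the instance's class for a one-off subclass that carries the
# property descriptors, so attribute lookup goes through __getattribute__
# and never falls back to a __getattr__ hook. Idempotent by construction.
class ToyModel:
    pass

def attach_properties(obj, props: dict) -> None:
    cls = type(obj)
    if cls.__name__.endswith("WithCapabilities"):  # already attached: no-op
        return
    obj.__class__ = type(cls.__name__ + "WithCapabilities", (cls,), dict(props))

m = ToyModel()
attach_properties(m, {"supports_tp": property(lambda self: True)})
attach_properties(m, {"supports_tp": property(lambda self: False)})  # no-op
print(type(m).__name__, m.supports_tp)  # ToyModelWithCapabilities True
```

Because the properties live on a class, they are found during ordinary attribute lookup; setting them as instance attributes would not work, since property descriptors only fire from the class.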