nemo_automodel._transformers.capabilities#

Model capabilities introspection and input validation.

Provides :class:`ModelSupports` (a read-only descriptor of what a model can do) and :func:`attach_capabilities_and_validate`, which attaches model.supports, model.supports_*, and model.validate_for_mesh to any nn.Module.

Capabilities are derived from code introspection – class attributes, mixin inheritance, forward-signature inspection – so they stay in sync as models evolve without manual feature tables.
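The three introspection signals can be sketched with plain Python. All class names below are illustrative stand-ins, not the library's real classes; only the pattern (class attribute, mixin inheritance, forward-signature inspection) mirrors the text above:

```python
import inspect

# Illustrative stand-ins: neither class comes from the library; they only
# mimic the three introspection signals described above.
class MoELikeMixin:
    pass

class ToyModel(MoELikeMixin):
    _pp_plan = {"layers": "split"}  # class-attribute signal

    def forward(self, input_ids, seq_lens=None):
        return input_ids

model = ToyModel()

has_pp_plan = getattr(type(model), "_pp_plan", None) is not None            # class attribute
is_moe_like = isinstance(model, MoELikeMixin)                               # mixin inheritance
takes_seq_lens = "seq_lens" in inspect.signature(model.forward).parameters  # forward signature

print(has_pp_plan, is_moe_like, takes_seq_lens)  # True True True
```

Because every signal is read off the live class, renaming a layer or adding a kwarg updates the answer automatically, with no feature table to maintain.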

Module Contents#

Classes#

ModelSupports

Queryable feature-support descriptor attached to a model instance.

Functions#

_has_optimized_tp_plan

Check if model_cls has an entry in PARALLELIZE_FUNCTIONS.

_is_moe

_supports_seq_lens

True when model.forward() accepts a seq_lens kwarg.

_has_backend

True for custom models that carry a BackendConfig.

_uses_te_attention

True when the model was constructed with the TE attention backend.

_is_hybrid

True when the model mixes attention with non-attention layers (e.g. Mamba/SSM).

validate_for_mesh

Validate mesh parallelism sizes against this model’s capabilities.

_supports_forwarding_property

Property that forwards model.<name> to model.supports.<name>.

_lazy_supports_property

_build_class_dict

attach_capabilities_and_validate

Attach model.supports and model.supports_* and call validate_for_mesh.

Data#

API#

nemo_automodel._transformers.capabilities.logger#

'getLogger(…)'

nemo_automodel._transformers.capabilities._has_optimized_tp_plan(model_cls: type) → bool[source]#

Check if model_cls has an entry in PARALLELIZE_FUNCTIONS.

nemo_automodel._transformers.capabilities._is_moe(model_cls: type) → bool[source]#
nemo_automodel._transformers.capabilities._supports_seq_lens(model: torch.nn.Module) → bool[source]#

True when model.forward() accepts a seq_lens kwarg.

nemo_automodel._transformers.capabilities._has_backend(model: torch.nn.Module) → bool[source]#

True for custom models that carry a BackendConfig.

nemo_automodel._transformers.capabilities._uses_te_attention(model: torch.nn.Module) → bool[source]#

True when the model was constructed with the TE attention backend.

nemo_automodel._transformers.capabilities._is_hybrid(model: torch.nn.Module) → bool[source]#

True when the model mixes attention with non-attention layers (e.g. Mamba/SSM).

Detected via config attributes used by NemotronH (layers_block_type) and HF hybrid models (hybrid_override_pattern, is_hybrid_model).
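A minimal sketch of that config-attribute detection, assuming the attribute names cited above; the `looks_hybrid` helper itself is illustrative, not the library's implementation:

```python
from types import SimpleNamespace

# Minimal sketch of hybrid detection via config attributes; the attribute
# names mirror the ones cited above, but looks_hybrid itself is illustrative.
def looks_hybrid(config) -> bool:
    if getattr(config, "is_hybrid_model", False):
        return True
    if getattr(config, "hybrid_override_pattern", None):
        return True
    # NemotronH-style configs list one block type per layer.
    layer_types = getattr(config, "layers_block_type", None) or []
    return len(set(layer_types)) > 1

print(looks_hybrid(SimpleNamespace(layers_block_type=["mamba", "attention"])))  # True
print(looks_hybrid(SimpleNamespace(num_hidden_layers=12)))                      # False
```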

class nemo_automodel._transformers.capabilities.ModelSupports(model: torch.nn.Module, mesh: MeshContext | None = None)[source]#

Queryable feature-support descriptor attached to a model instance.

Every property is derived from introspection of the live model so it reflects the actual class hierarchy and forward signature, not a hand-maintained table.

Usage::

model = NeMoAutoModelForCausalLM.from_pretrained(...)
model.supports.tp   # True / False
model.supports.pp   # ...

Initialization

__slots__#

('_model', '_model_cls', '_mesh')

__repr__() → str[source]#
property is_custom_model: bool#

True when the model class has a custom (non-HF) implementation in the registry.

property supports_tp: bool#

Model has an optimized or HF-native tensor-parallel plan.

property supports_pp: bool#

Model supports pipeline parallelism.

True when the model either declares a _pp_plan or inherits from MoEFSDPSyncMixin (MoE models handle PP via patched_backward_maybe_with_nosync).

property supports_tp_plan: bool#
property supports_pp_plan: bool#
property supports_cp: bool#

Model supports context parallelism.

+-------------------+---------------+-----+
| Model kind        | Attention     | CP? |
+-------------------+---------------+-----+
| Custom            | TE            | Yes |
| Custom            | FlexAttention | No  |
| HF (pure attn)    | SDPA          | Yes |
| HF (pure attn)    | no SDPA       | No  |
| HF hybrid (Mamba) | any           | No  |
+-------------------+---------------+-----+
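The table above can be sketched as a small decision function. The inputs `is_custom`, `attn_backend`, and `is_hybrid` are illustrative names, not the library's real internals:

```python
# Hedged sketch of the CP decision table; the three inputs are illustrative
# stand-ins for the model-kind, attention-backend, and hybrid checks.
def cp_supported(is_custom: bool, attn_backend: str, is_hybrid: bool) -> bool:
    if is_hybrid:                   # HF hybrid (Mamba): never
        return False
    if is_custom:                   # Custom: TE yes, FlexAttention no
        return attn_backend == "te"
    return attn_backend == "sdpa"   # HF pure attention: SDPA only

print(cp_supported(True, "te", False))    # True
print(cp_supported(True, "flex", False))  # False
print(cp_supported(False, "sdpa", True))  # False
```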

property supports_ep: bool#

Model is a Mixture-of-Experts that supports expert parallelism.

property supports_sequence_packing: bool#

forward() accepts seq_lens for packed-sequence training.

property supports_generate: bool#

Model has a generate() method for autoregressive inference.

property supports_gradient_checkpointing: bool#

Gradient checkpointing is supported.

property cp_size: int#
property tp_size: int#
property pp_size: int#
property ep_size: int#
property supports_cp_with_sequence_packing: bool#

Combining CP with packed sequences requires the TE attention backend.

nemo_automodel._transformers.capabilities.validate_for_mesh(
model: torch.nn.Module,
mesh: nemo_automodel.components.distributed.mesh.MeshContext,
) → None[source]#

Validate mesh parallelism sizes against this model’s capabilities.

Works both as a bound method (model.validate_for_mesh()) and as a standalone call (validate_for_mesh(model)).

Raises :class:`ValueError` with one bullet per violation.
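An illustrative sketch of the bullet-per-violation error format; the real validate_for_mesh reads the model's ModelSupports and the mesh, whereas this toy version takes flags directly:

```python
# Toy version of bullet-per-violation validation: collect every mismatch,
# then raise a single ValueError listing them all. Names are illustrative.
def toy_validate(supports_tp: bool, tp_size: int, supports_cp: bool, cp_size: int) -> None:
    violations = []
    if tp_size > 1 and not supports_tp:
        violations.append(f"- tp_size={tp_size} but the model has no tensor-parallel plan")
    if cp_size > 1 and not supports_cp:
        violations.append(f"- cp_size={cp_size} but the model does not support context parallelism")
    if violations:
        raise ValueError("Model/mesh mismatch:\n" + "\n".join(violations))

try:
    toy_validate(supports_tp=False, tp_size=2, supports_cp=True, cp_size=1)
except ValueError as err:
    print(err)
```

Collecting all violations before raising lets a user fix every mesh-size problem in one pass instead of replaying the failure per flag.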

nemo_automodel._transformers.capabilities._supports_forwarding_property(name: str) → property[source]#

Property that forwards model.<name> to model.supports.<name>.
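The forwarding pattern itself is small enough to sketch; all class names below are illustrative stand-ins, not library code:

```python
# Sketch of a forwarding property: ToyModel().supports_tp resolves to
# ToyModel().supports.supports_tp. Names are illustrative stand-ins.
def forwarding_property(name: str) -> property:
    return property(lambda self: getattr(self.supports, name))

class ToySupports:
    supports_tp = True
    supports_pp = False

class ToyModel:
    supports = ToySupports()
    supports_tp = forwarding_property("supports_tp")
    supports_pp = forwarding_property("supports_pp")

print(ToyModel().supports_tp, ToyModel().supports_pp)  # True False
```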

nemo_automodel._transformers.capabilities._lazy_supports_property(
self: torch.nn.Module,
) → nemo_automodel._transformers.capabilities.ModelSupports[source]#
nemo_automodel._transformers.capabilities._build_class_dict() → dict[str, property | type][source]#
nemo_automodel._transformers.capabilities.attach_capabilities_and_validate(
model: torch.nn.Module,
mesh: nemo_automodel.components.distributed.mesh.MeshContext,
) → torch.nn.Module[source]#

Attach model.supports and model.supports_* and call validate_for_mesh.

Injects a thin dynamic subclass so that property descriptors (supports_*) resolve via __getattribute__ with no __getattr__ overhead, which avoids triggering ModelCapabilitiesMixin.__getattr__ for models that lack the attribute. Safe to call more than once; subsequent calls are no-ops.
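The thin-dynamic-subclass trick can be sketched in a few lines. This is a hedged illustration under assumed names (ToyModel, attach_properties), not the library's implementation:

```python
# Sketch: swap the instance's class for a one-off subclass that carries the
# property descriptors, so attribute lookup goes through __getattribute__
# and never falls back to a __getattr__ hook. Idempotent by construction.
class ToyModel:
    pass

def attach_properties(obj, props: dict) -> None:
    cls = type(obj)
    if cls.__name__.endswith("WithCapabilities"):  # already attached: no-op
        return
    obj.__class__ = type(cls.__name__ + "WithCapabilities", (cls,), dict(props))

m = ToyModel()
attach_properties(m, {"supports_tp": property(lambda self: True)})
attach_properties(m, {"supports_tp": property(lambda self: False)})  # no-op
print(type(m).__name__, m.supports_tp)  # ToyModelWithCapabilities True
```

Because the properties live on a class, they are found during ordinary attribute lookup; setting them as instance attributes would not work, since property descriptors only fire from the class.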