nemo_automodel._transformers.capabilities#
Model capabilities introspection and input validation.
Provides :class:`ModelSupports` (a read-only descriptor of what a model can
do) and :func:`attach_capabilities_and_validate`, which attaches
`model.supports`, `model.supports_*`, and `model.validate_for_mesh`
to any `nn.Module`.
Capabilities are derived from code introspection (class attributes, mixin inheritance, forward-signature inspection), so they stay in sync as models evolve without manual feature tables.
Module Contents#
Classes#
| Class | Description |
|---|---|
| `ModelSupports` | Queryable feature-support descriptor attached to a model instance. |
Functions#
| Function | Description |
|---|---|
| `_has_optimized_tp_plan` | Check if `model_cls` has an entry in `PARALLELIZE_FUNCTIONS`. |
| `_supports_seq_lens` | True when `model.forward()` accepts a `seq_lens` kwarg. |
| `_has_backend` | True for custom models that carry a `BackendConfig`. |
| `_uses_te_attention` | True when the model was constructed with the TE attention backend. |
| `_is_hybrid` | True when the model mixes attention with non-attention layers (e.g. Mamba/SSM). |
| `validate_for_mesh` | Validate mesh parallelism sizes against this model's capabilities. |
| `_supports_forwarding_property` | Property that forwards `model.<name>` to `model.supports.<name>`. |
| `attach_capabilities_and_validate` | Attach `model.supports` and `model.supports_*` and call `validate_for_mesh`. |
Data#
API#
- nemo_automodel._transformers.capabilities.logger#
'getLogger(…)'
- nemo_automodel._transformers.capabilities._has_optimized_tp_plan(model_cls: type) → bool[source]#
Check if `model_cls` has an entry in `PARALLELIZE_FUNCTIONS`.
- nemo_automodel._transformers.capabilities._supports_seq_lens(model: torch.nn.Module) → bool[source]#
True when `model.forward()` accepts a `seq_lens` kwarg.
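This kind of forward-signature check can be mimicked with the standard library. The sketch below is illustrative only (the helper and the example classes are not part of nemo_automodel): it uses `inspect.signature` to test whether a callable declares a `seq_lens` parameter.

```python
import inspect


def accepts_kwarg(fn, name: str) -> bool:
    """Return True when `fn` declares `name` as a parameter,
    or when it takes a **kwargs catch-all."""
    params = inspect.signature(fn).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


class PackedModel:
    def forward(self, input_ids, seq_lens=None):
        ...


class PlainModel:
    def forward(self, input_ids):
        ...


print(accepts_kwarg(PackedModel.forward, "seq_lens"))  # True
print(accepts_kwarg(PlainModel.forward, "seq_lens"))   # False
```

Because the check inspects the live `forward` signature, it stays correct when a model adds or removes packed-sequence support.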
- nemo_automodel._transformers.capabilities._has_backend(model: torch.nn.Module) → bool[source]#
True for custom models that carry a `BackendConfig`.
- nemo_automodel._transformers.capabilities._uses_te_attention(model: torch.nn.Module) → bool[source]#
True when the model was constructed with the TE attention backend.
- nemo_automodel._transformers.capabilities._is_hybrid(model: torch.nn.Module) → bool[source]#
True when the model mixes attention with non-attention layers (e.g. Mamba/SSM).
Detected via config attributes used by NemotronH (`layers_block_type`) and HF hybrid models (`hybrid_override_pattern`, `is_hybrid_model`).
- class nemo_automodel._transformers.capabilities.ModelSupports(model: torch.nn.Module, mesh: MeshContext | None = None)[source]#
Queryable feature-support descriptor attached to a model instance.
Every property is derived from introspection of the live model so it reflects the actual class hierarchy and forward signature, not a hand-maintained table.
Usage:

```python
model = NeMoAutoModelForCausalLM.from_pretrained(...)
model.supports.tp  # True / False
model.supports.pp  # ...
```
Initialization
- __slots__#
('_model', '_model_cls', '_mesh')
- property is_custom_model: bool#
True when the model class has a custom (non-HF) implementation in the registry.
- property supports_tp: bool#
Model has an optimized or HF-native tensor-parallel plan.
- property supports_pp: bool#
Model supports pipeline parallelism.
True when the model either declares a `_pp_plan` or inherits from `MoEFSDPSyncMixin` (MoE models handle PP via `patched_backward_maybe_with_nosync`).
- property supports_tp_plan: bool#
- property supports_pp_plan: bool#
- property supports_cp: bool#
Model supports context parallelism.
| Model kind        | Attention     | CP? |
|-------------------|---------------|-----|
| Custom            | TE            | Yes |
| Custom            | FlexAttention | No  |
| HF (pure attn)    | SDPA          | Yes |
| HF (pure attn)    | no SDPA       | No  |
| HF hybrid (Mamba) | any           | No  |
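The table rows can be read as a small decision function. This pure-Python sketch mirrors only the table above; the flag names and backend strings are illustrative, not the library's internal API.

```python
def supports_cp(is_custom: bool, is_hybrid: bool, attention: str) -> bool:
    """Mirror the CP-support table: hybrid models never support CP;
    custom models require the TE backend; HF pure-attention models
    require SDPA."""
    if is_hybrid:
        return False
    if is_custom:
        return attention == "te"
    return attention == "sdpa"


print(supports_cp(is_custom=True, is_hybrid=False, attention="te"))    # True
print(supports_cp(is_custom=True, is_hybrid=False, attention="flex"))  # False
print(supports_cp(is_custom=False, is_hybrid=True, attention="sdpa"))  # False
```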
- property supports_ep: bool#
Model is a Mixture-of-Experts that supports expert parallelism.
- property supports_sequence_packing: bool#
`forward()` accepts `seq_lens` for packed-sequence training.
- property supports_generate: bool#
Model has a `generate()` method for autoregressive inference.
- property supports_gradient_checkpointing: bool#
Gradient checkpointing is supported.
- property cp_size: int#
- property tp_size: int#
- property pp_size: int#
- property ep_size: int#
- property supports_cp_with_sequence_packing: bool#
CP + packed sequences requires TE attention backend.
- nemo_automodel._transformers.capabilities.validate_for_mesh(
- model: torch.nn.Module,
- mesh: nemo_automodel.components.distributed.mesh.MeshContext,
- )[source]#
Validate mesh parallelism sizes against this model's capabilities.
Works both as a bound method (`model.validate_for_mesh()`) and as a standalone call (`validate_for_mesh(model)`). Raises :class:`ValueError` with one bullet per violation.
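The documented error shape (collect every violation, then raise a single `ValueError` with one bullet per violation) can be sketched in plain Python. The helper name and the dict-based inputs below are assumptions for illustration, not the library's signature.

```python
def validate_sizes(supports: dict, sizes: dict) -> None:
    """Collect all mesh-size violations, then raise one ValueError
    listing a bullet per violation (mirroring the documented behaviour)."""
    violations = [
        f"  - {name} size is {size} but the model does not support {name}"
        for name, size in sizes.items()
        if size > 1 and not supports.get(name, False)
    ]
    if violations:
        raise ValueError("model/mesh mismatch:\n" + "\n".join(violations))


validate_sizes({"tp": True, "cp": False}, {"tp": 4, "cp": 1})  # passes silently
try:
    validate_sizes({"tp": True, "cp": False}, {"tp": 4, "cp": 2})
except ValueError as err:
    print(err)  # one bullet, for cp
```

Reporting all violations at once, instead of failing on the first, lets a user fix a misconfigured mesh in a single pass.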
- nemo_automodel._transformers.capabilities._supports_forwarding_property(name: str) → property[source]#
Property that forwards `model.<name>` to `model.supports.<name>`.
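A forwarding property of this kind can be built with the stdlib `property` factory. The sketch below is illustrative (the factory and the stand-in classes are not nemo_automodel's code); it shows the pattern of closing over a name and delegating to a `supports` sub-object.

```python
def supports_forwarding_property(name: str) -> property:
    """Build a read-only property that forwards obj.<name>
    to obj.supports.<name>."""
    def getter(self):
        return getattr(self.supports, name)
    return property(getter, doc=f"Forwards to self.supports.{name}")


class Supports:
    supports_tp = True  # stand-in for an introspected capability


class Model:
    supports = Supports()
    supports_tp = supports_forwarding_property("supports_tp")


print(Model().supports_tp)  # True
```

Generating the descriptor from a name keeps the flat `model.supports_*` aliases in lockstep with the properties defined on the `supports` object.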
- nemo_automodel._transformers.capabilities._lazy_supports_property(
- self: torch.nn.Module,
- )[source]#
- nemo_automodel._transformers.capabilities.attach_capabilities_and_validate(
- model: torch.nn.Module,
- mesh: nemo_automodel.components.distributed.mesh.MeshContext,
- )[source]#
Attach `model.supports` and `model.supports_*` and call `validate_for_mesh`.
Injects a thin dynamic subclass so that property descriptors (`supports_*`) resolve via `__getattribute__` with no `__getattr__` overhead, which avoids triggering `ModelCapabilitiesMixin.__getattr__` for models that lack the attribute. Safe to call more than once; subsequent calls are no-ops.
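The dynamic-subclass trick can be sketched in plain Python. The helper and marker attribute below are illustrative, not the library's implementation: it swaps the instance's `__class__` for a one-off subclass carrying the property descriptors (so lookups resolve through normal `__getattribute__`), and a marker attribute makes repeat calls no-ops.

```python
def attach_dynamic_subclass(obj, **props):
    """Inject a one-off subclass carrying property descriptors.
    A marker class attribute makes a second call a no-op."""
    cls = type(obj)
    if getattr(cls, "_caps_attached", False):
        return obj  # already patched: do nothing
    namespace = {"_caps_attached": True, **props}
    obj.__class__ = type(cls.__name__, (cls,), namespace)
    return obj


class Model:
    pass


m = Model()
attach_dynamic_subclass(m, supports_tp=property(lambda self: True))
print(m.supports_tp)        # True (resolved as a class-level descriptor)
attach_dynamic_subclass(m)  # second call is a no-op
```

Putting the descriptors on a subclass rather than defining `__getattr__` means attribute misses fail fast through the normal lookup path instead of funnelling every miss through a Python-level fallback.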