nemo_automodel._transformers.infrastructure#
Infrastructure instantiation and application.
Distributed manager instantiation, sharding, PEFT/quantization application,
and checkpoint loading utilities. These free functions operate on an
already-instantiated nn.Module and have no coupling to the
_BaseNeMoAutoModelClass hierarchy.
MeshContext (from mesh) is the single source of truth
for device meshes, parallelism sizes, and axis names.
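A minimal usage sketch of the two public entry points, assuming a single-process run. The import path for FSDP2Config and its no-argument construction are illustrative assumptions, as is the availability of CUDA; in a real multi-GPU run, torch.distributed would already be initialized.

```python
# Hedged sketch of the intended flow: config objects -> runtime objects ->
# infrastructure applied to an already-instantiated nn.Module.
# Assumption: FSDP2Config lives at this path and accepts no arguments here.
import torch
import torch.nn as nn

from nemo_automodel.components.distributed.config import FSDP2Config
from nemo_automodel._transformers.infrastructure import (
    apply_model_infrastructure,
    instantiate_infrastructure,
)

model = nn.Linear(16, 16)  # stand-in for an already-instantiated model

model_wrapper, autopipeline, parallelize_fn, qat_quantizer = instantiate_infrastructure(
    distributed_config=FSDP2Config(),
    device=torch.device("cuda"),
)
model = apply_model_infrastructure(
    model,
    is_meta_device=False,
    device=torch.device("cuda"),
    model_wrapper=model_wrapper,
    autopipeline=autopipeline,
    parallelize_fn=parallelize_fn,
    qat_quantizer=qat_quantizer,
)
```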
Module Contents#
Functions#
| _shard_ep_fsdp | Apply EP + FSDP sharding (non-PP path). |
| _instantiate_distributed | Instantiate the appropriate distributed manager from config. |
| _instantiate_pipeline | Instantiate AutoPipeline from config. |
| parallelize_for_pp | Parallelize model for pipeline parallelism (non-MoE case). |
| instantiate_infrastructure | Instantiate infrastructure objects from config classes. |
| apply_model_infrastructure | Apply sharding, PEFT, quantization, and checkpoint loading to a model. |
Data#
API#
- nemo_automodel._transformers.infrastructure.logger#
'getLogger(...)'
- nemo_automodel._transformers.infrastructure._apply_peft_and_lower_precision(
- model,
- tp_size,
- autopipeline,
- peft_config,
- quantization_config,
- fp8_config,
- qat_quantizer,
- )#
- nemo_automodel._transformers.infrastructure._shard_pp(autopipeline, model, loss_fn, parallelize_fn)#
- nemo_automodel._transformers.infrastructure._shard_ep_fsdp(
- model,
- model_wrapper,
- parallelize_fn,
- mesh: nemo_automodel.components.distributed.mesh.MeshContext,
- )#
Apply EP + FSDP sharding (non-PP path).
- nemo_automodel._transformers.infrastructure._instantiate_distributed(
- config: nemo_automodel.components.distributed.config.DistributedConfig,
- mesh: nemo_automodel.components.distributed.mesh.MeshContext,
- )#
Instantiate the appropriate distributed manager from config.
- Parameters:
config – Distributed config (FSDP2Config, MegatronFSDPConfig, or DDPConfig).
mesh – MeshContext holding device_mesh and moe_mesh references.
- Returns:
The instantiated manager, or None if config is None.
- Raises:
ValueError β If device_mesh is required but not provided.
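A hedged sketch of the documented None path. Here mesh_ctx stands in for a MeshContext built elsewhere; its constructor is not documented in this section.

```python
# Documented behavior: a None config yields None rather than a manager.
# Assumption: mesh_ctx is a previously constructed MeshContext.
from nemo_automodel._transformers.infrastructure import _instantiate_distributed

manager = _instantiate_distributed(config=None, mesh=mesh_ctx)
assert manager is None
```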
- nemo_automodel._transformers.infrastructure._instantiate_pipeline(
- config: Optional[nemo_automodel.components.distributed.pipelining.config.PipelineConfig],
- mesh: nemo_automodel.components.distributed.mesh.MeshContext,
- device: Optional[torch.device] = None,
- )#
Instantiate AutoPipeline from config.
- Parameters:
config – Pipeline config. If None or pp_size <= 1, returns None.
mesh – MeshContext holding device_mesh, moe_mesh, and axis names.
device – Target device for pipeline computation.
- Returns:
AutoPipeline instance, or None if pipeline parallelism is not enabled.
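A hedged sketch of the documented early exit: with no pipeline config (or pp_size <= 1), no AutoPipeline is built and callers fall through to the EP/FSDP path. As above, mesh_ctx is an assumed, previously built MeshContext.

```python
# Documented behavior: config=None disables pipeline parallelism entirely.
import torch

from nemo_automodel._transformers.infrastructure import _instantiate_pipeline

pipeline = _instantiate_pipeline(config=None, mesh=mesh_ctx, device=torch.device("cuda"))
assert pipeline is None
```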
- nemo_automodel._transformers.infrastructure._instantiate_qat(
- config: Optional[nemo_automodel.components.quantization.qat.QATConfig],
- )#
- nemo_automodel._transformers.infrastructure.parallelize_for_pp(
- model: torch.nn.Module,
- *,
- model_wrapper: Optional[Union[nemo_automodel.components.distributed.fsdp2.FSDP2Manager, nemo_automodel.components.distributed.megatron_fsdp.MegatronFSDPManager, nemo_automodel.components.distributed.ddp.DDPManager]] = None,
- **kwargs,
- )#
Parallelize model for pipeline parallelism (non-MoE case).
This function adapts the pipeline parallelism interface to use model_wrapper.parallelize(). For MoE models, use parallelize_model from nemo_automodel.components.moe.parallelizer directly.
- Parameters:
model – The model to parallelize.
model_wrapper – Distributed manager instance.
**kwargs – Additional arguments (world_mesh, moe_mesh, axis names) passed by AutoPipeline but unused for non-MoE parallelization.
- Returns:
The parallelized model.
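A hedged sketch of the adapter in use. Per the docstring above, the call forwards to model_wrapper.parallelize() and ignores the mesh kwargs AutoPipeline supplies in the non-MoE case; fsdp2_manager is an assumed FSDP2Manager built earlier.

```python
# Assumption: model and fsdp2_manager were created beforehand.
from nemo_automodel._transformers.infrastructure import parallelize_for_pp

model = parallelize_for_pp(
    model,
    model_wrapper=fsdp2_manager,
    # world_mesh=..., moe_mesh=...,  # passed by AutoPipeline, unused here
)
```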
- nemo_automodel._transformers.infrastructure.instantiate_infrastructure(
- *,
- distributed_config: Optional[nemo_automodel.components.distributed.config.DistributedConfig] = None,
- pipeline_config: Optional[nemo_automodel.components.distributed.pipelining.config.PipelineConfig] = None,
- qat_config: Optional[nemo_automodel.components.quantization.qat.QATConfig] = None,
- moe_config: Optional[nemo_automodel.components.moe.config.MoEParallelizerConfig] = None,
- activation_checkpointing: bool = False,
- device: Optional[torch.device] = None,
- mesh: Optional[nemo_automodel.components.distributed.mesh.MeshContext] = None,
- device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
- moe_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
- ep_size: int = 1,
- )#
Instantiate infrastructure objects from config classes.
This function converts config objects into the runtime objects needed by apply_model_infrastructure. It provides a cleaner, more HuggingFace-like API where users pass config objects instead of constructing runtime objects directly.
- Parameters:
distributed_config – Distributed training config (FSDP2Config, MegatronFSDPConfig, or DDPConfig).
pipeline_config – Pipeline parallelism config.
qat_config – Quantization-aware training config.
moe_config – MoE parallelizer config (for expert-parallel models).
activation_checkpointing – Enable activation checkpointing for transformer blocks. Defaults to False.
device – Target device for the model.
mesh – MeshContext holding device meshes, sizes, and axis names. If None, built from the legacy device_mesh/moe_mesh params.
device_mesh – (deprecated) Device mesh for distributed operations.
moe_mesh – (deprecated) Optional MoE mesh for expert parallelism.
ep_size – (deprecated) Expert parallelism size. Ignored when mesh is provided.
- Returns:
A tuple (model_wrapper, autopipeline, parallelize_fn, qat_quantizer):
- model_wrapper: Distributed manager instance (or None)
- autopipeline: AutoPipeline instance (or None)
- parallelize_fn: Parallelization function (or None), built for EP (MoE-specific parallelizer when ep_size > 1) or PP (via model_wrapper)
- qat_quantizer: QAT quantizer instance (or None)
- Return type:
tuple
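A hedged sketch of the degenerate single-GPU case. Every parameter is optional, and per the per-helper docs above, None configs yield None runtime objects; the assumption that a bare call needs no distributed initialization is illustrative.

```python
# With all configs left unset, no sharding, pipelining, or QAT is prepared.
from nemo_automodel._transformers.infrastructure import instantiate_infrastructure

model_wrapper, autopipeline, parallelize_fn, qat_quantizer = instantiate_infrastructure()
assert model_wrapper is None and autopipeline is None
```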
- nemo_automodel._transformers.infrastructure.apply_model_infrastructure(
- model,
- *,
- is_meta_device,
- device,
- model_wrapper=None,
- mesh=None,
- peft_config=None,
- quantization_config=None,
- fp8_config=None,
- qat_quantizer=None,
- loss_fn=None,
- autopipeline=None,
- parallelize_fn=None,
- compile_config=None,
- load_base_model=False,
- cache_dir=None,
- pretrained_model_name_or_path='',
- **_kwargs,
- )#
Apply sharding, PEFT, quantization, and checkpoint loading to a model.
This function contains the common post-init logic shared between from_pretrained and from_config methods. It can also be called directly for models built via custom builder functions (e.g., build_gpt2_model). It handles:
- PEFT and lower-precision application (LoRA, FP8, QAT)
- Loss function setup
- Pipeline parallelism or EP/FSDP sharding
- Device placement and compilation
- Checkpoint loading for meta-device models
- Parameters:
model – The model to apply infrastructure to.
is_meta_device – Whether the model was initialized on the meta device.
device – Target device for the model.
model_wrapper – Model wrapper (FSDP2Manager, DDPManager, etc.). Default: None
mesh – MeshContext with parallelism sizes (tp_size, cp_size, etc.) and mesh references. Default: None (treated as single-GPU defaults).
peft_config – PEFT/LoRA configuration dict. Default: None
quantization_config – Quantization configuration. Default: None
fp8_config – FP8 configuration. Default: None
qat_quantizer – QAT quantizer instance. Default: None
loss_fn – Loss function (may be replaced with MaskedCrossEntropy). Default: None
autopipeline – AutoPipeline instance for pipeline parallelism. Default: None
parallelize_fn – Function to apply parallelization (EP + FSDP2). Default: None
compile_config – Compilation configuration. Default: None
load_base_model – Whether to load base model weights (True for from_pretrained). Default: False
cache_dir – Cache directory for model weights. Default: None
pretrained_model_name_or_path – Model name or path for checkpoint loading. Default: ''
**_kwargs – Additional keyword arguments (ignored; allows passing extra kwargs).
- Returns:
The model with all infrastructure applied
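A hedged sketch for a model built outside from_pretrained/from_config, e.g. via a custom builder function as the docstring suggests. With no wrapper, pipeline, or quantization configured, the call reduces to loss setup and device placement; CPU placement here is an illustrative assumption.

```python
# Direct use on a custom-built model; nn.Linear stands in for the real model.
import torch
import torch.nn as nn

from nemo_automodel._transformers.infrastructure import apply_model_infrastructure

model = nn.Linear(16, 16)
model = apply_model_infrastructure(
    model,
    is_meta_device=False,
    device=torch.device("cpu"),
)
```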