nemo_automodel.components.distributed.parallel_styles

Module Contents

Classes

Name	Description
`ColwiseParallelLora`	Column-wise tensor parallel style for LoRA-aware modules.
`RowwiseParallelLora`	Row-wise tensor parallel style for LoRA-aware modules.
`SequenceParallelLora`	Sequence parallel style that replicates LoRA module parameters.
`TPLinear`	nn.Linear variant safe for torch.compile + DTensor tensor-parallel weights.

Functions

Name	Description
`_distribute_param`	-
`translate_to_lora`	Mutate a tensor-parallel plan to the matching LoRA-aware style.

API

class nemo_automodel.components.distributed.parallel_styles.ColwiseParallelLora()

Bases: ColwiseParallel

Column-wise tensor parallel style for LoRA-aware modules.

nemo_automodel.components.distributed.parallel_styles.ColwiseParallelLora._partition_embedding_fn(
    name,
    module,
    device_mesh
)

nemo_automodel.components.distributed.parallel_styles.ColwiseParallelLora._partition_linear_fn(
    name,
    module,
    device_mesh
)

class nemo_automodel.components.distributed.parallel_styles.RowwiseParallelLora()

Bases: RowwiseParallel

Row-wise tensor parallel style for LoRA-aware modules.

nemo_automodel.components.distributed.parallel_styles.RowwiseParallelLora._partition_embedding_fn(
    name,
    module,
    device_mesh
)

nemo_automodel.components.distributed.parallel_styles.RowwiseParallelLora._partition_linear_fn(
    name,
    module,
    device_mesh
)

class nemo_automodel.components.distributed.parallel_styles.SequenceParallelLora()

Bases: SequenceParallel

Sequence parallel style that replicates LoRA module parameters.

nemo_automodel.components.distributed.parallel_styles.SequenceParallelLora._replicate_module_fn(
    name: str,
    module: torch.nn.Module,
    device_mesh: torch.distributed.tensor.DeviceMesh
)

class nemo_automodel.components.distributed.parallel_styles.TPLinear()

Bases: Linear

nn.Linear variant safe for torch.compile + DTensor tensor-parallel weights.

F.linear decomposes to aten.view + aten.mm + aten.view for 3-D input. In AOT-autograd backward tracing the view on a sharded DTensor activation hits DTensor’s slow-path sharding propagation (no explicit rule for aten.view that changes the shard-dim index), which recurses infinitely.

torch.bmm is a native 3-D op whose backward is also bmm — no view is ever emitted. DTensor has explicit strategies for bmm covering the ColwiseParallel (Replicate x Shard(2) -> Shard(2)) and RowwiseParallel (Shard(2) x Shard(1) -> Partial) patterns.

Note: expand(b, -1, -1) dispatches through DTensor’s ShardingPropagator which caches via lru_cache keyed on DTensorSpec. With dynamic shapes, b = x.shape[0] is a SymInt, making DTensorSpec._hash_impl raise TypeError. This is handled by _patch_dtensor_spec_hash_for_symint() in parallelizer.py which falls back to a placement-only hash for SymInt shapes.

Usage: after TP weight sharding, convert an nn.Linear instance by setting linear.__class__ = TPLinear. This is the same class-swap trick used by translate_to_lora, and ensures torch.compile/dynamo sees the correct type(module).forward rather than nn.Linear.forward.

nemo_automodel.components.distributed.parallel_styles.TPLinear.forward(
    x
)

nemo_automodel.components.distributed.parallel_styles._distribute_param(
    _module,
    name,
    device_mesh,
    src_data_rank,
    placements
)

nemo_automodel.components.distributed.parallel_styles.translate_to_lora(
    plan
)

Mutate a tensor-parallel plan to the matching LoRA-aware style.