
# nemo_automodel.components.distributed.parallel_styles

## Module Contents

### Classes

| Name                                                                                                  | Description                                                                 |
| ----------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| [`ColwiseParallelLora`](#nemo_automodel-components-distributed-parallel_styles-ColwiseParallelLora)   | Column-wise tensor parallel style for LoRA-aware modules.                   |
| [`RowwiseParallelLora`](#nemo_automodel-components-distributed-parallel_styles-RowwiseParallelLora)   | Row-wise tensor parallel style for LoRA-aware modules.                      |
| [`SequenceParallelLora`](#nemo_automodel-components-distributed-parallel_styles-SequenceParallelLora) | Sequence parallel style that replicates LoRA module parameters.             |
| [`TPLinear`](#nemo_automodel-components-distributed-parallel_styles-TPLinear)                         | nn.Linear variant safe for torch.compile + DTensor tensor-parallel weights. |

### Functions

| Name                                                                                            | Description                                                     |
| ----------------------------------------------------------------------------------------------- | --------------------------------------------------------------- |
| [`_distribute_param`](#nemo_automodel-components-distributed-parallel_styles-_distribute_param) | -                                                               |
| [`translate_to_lora`](#nemo_automodel-components-distributed-parallel_styles-translate_to_lora) | Mutate a tensor-parallel plan to the matching LoRA-aware style. |

### API

```python
class nemo_automodel.components.distributed.parallel_styles.ColwiseParallelLora()
```

**Bases:** `ColwiseParallel`

Column-wise tensor parallel style for LoRA-aware modules.

```python
nemo_automodel.components.distributed.parallel_styles.ColwiseParallelLora._partition_embedding_fn(
    name,
    module,
    device_mesh
)
```

```python
nemo_automodel.components.distributed.parallel_styles.ColwiseParallelLora._partition_linear_fn(
    name,
    module,
    device_mesh
)
```

```python
class nemo_automodel.components.distributed.parallel_styles.RowwiseParallelLora()
```

**Bases:** `RowwiseParallel`

Row-wise tensor parallel style for LoRA-aware modules.

```python
nemo_automodel.components.distributed.parallel_styles.RowwiseParallelLora._partition_embedding_fn(
    name,
    module,
    device_mesh
)
```

```python
nemo_automodel.components.distributed.parallel_styles.RowwiseParallelLora._partition_linear_fn(
    name,
    module,
    device_mesh
)
```

```python
class nemo_automodel.components.distributed.parallel_styles.SequenceParallelLora()
```

**Bases:** `SequenceParallel`

Sequence parallel style that replicates LoRA module parameters.

```python
nemo_automodel.components.distributed.parallel_styles.SequenceParallelLora._replicate_module_fn(
    name: str,
    module: torch.nn.Module,
    device_mesh: torch.distributed.tensor.DeviceMesh
)
```

```python
class nemo_automodel.components.distributed.parallel_styles.TPLinear()
```

**Bases:** `Linear`

nn.Linear variant safe for torch.compile + DTensor tensor-parallel weights.

For 3-D input, `F.linear` decomposes to `aten.view` + `aten.mm` + `aten.view`.
During AOT-autograd backward tracing, the view on a sharded DTensor activation
hits DTensor's slow-path sharding propagation (there is no explicit rule for an
`aten.view` that changes the shard-dim index), which recurses infinitely.

`torch.bmm` is a native 3-D op whose backward is also `bmm`, so no view is ever
emitted. DTensor has explicit strategies for `bmm` covering the ColwiseParallel
(`Replicate x Shard(2) -> Shard(2)`) and RowwiseParallel
(`Shard(2) x Shard(1) -> Partial`) patterns.
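As an illustration of the `bmm` formulation described above, here is a minimal sketch on plain CPU tensors (no DTensor, no `torch.compile`). The helper name `bmm_linear` and the exact transpose/expand sequence are assumptions made for this example; the actual `TPLinear.forward` may be implemented differently. It only shows that a batched matmul over an expanded weight matches `F.linear` for 3-D input.

```python
import torch
import torch.nn.functional as F


def bmm_linear(x, weight, bias=None):
    """Hypothetical helper: compute x @ weight.T (+ bias) with torch.bmm.

    x:      (batch, seq, in_features)
    weight: (out_features, in_features), the nn.Linear layout
    """
    b = x.shape[0]
    # (out, in) -> (in, out) -> (1, in, out) -> (b, in, out); expand copies no data.
    w = weight.t().unsqueeze(0).expand(b, -1, -1)
    out = torch.bmm(x, w)  # (b, seq, out)
    if bias is not None:
        out = out + bias  # broadcasts over batch and sequence dims
    return out


torch.manual_seed(0)
x = torch.randn(2, 5, 8)
linear = torch.nn.Linear(8, 4)

ref = F.linear(x, linear.weight, linear.bias)
out = bmm_linear(x, linear.weight, linear.bias)
same = torch.allclose(ref, out, atol=1e-5)
```

In this layout the weight operand has shape `(b, in, out)`, so sharding its last dimension corresponds to the `Shard(2)` column-wise case and sharding its middle dimension to the `Shard(1)` row-wise case named above.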

Note: `expand(b, -1, -1)` dispatches through DTensor's `ShardingPropagator`, which
caches via `lru_cache` keyed on `DTensorSpec`. With dynamic shapes, `b = x.shape[0]`
is a `SymInt`, making `DTensorSpec._hash_impl` raise `TypeError`. This is handled by
`_patch_dtensor_spec_hash_for_symint()` in `parallelizer.py`, which falls back to a
placement-only hash for `SymInt` shapes.

Usage: after TP weight sharding, convert an `nn.Linear` instance by setting
`linear.__class__ = TPLinear`. This is the same `__class__`-swap trick used
by `translate_to_lora`, and it ensures torch.compile/dynamo sees the correct
`type(module).forward` rather than `nn.Linear.forward`.
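The `__class__` swap can be sketched as follows. `BmmLinear` is a hypothetical stand-in defined only for this example (it is not the real `TPLinear`), and the snippet runs on ordinary CPU tensors without any tensor-parallel sharding. It shows that reassigning `__class__` keeps the existing parameters while changing which `forward` is resolved through `type(module)`.

```python
import torch
from torch import nn


class BmmLinear(nn.Linear):
    """Hypothetical stand-in for TPLinear: same parameters, bmm-based forward."""

    def forward(self, x):
        if x.dim() != 3:
            # Fall back to the stock nn.Linear path for non-3-D input.
            return super().forward(x)
        w = self.weight.t().unsqueeze(0).expand(x.shape[0], -1, -1)
        out = torch.bmm(x, w)
        return out if self.bias is None else out + self.bias


torch.manual_seed(0)
linear = nn.Linear(8, 4)
x = torch.randn(2, 5, 8)

expected = linear(x)           # computed by nn.Linear.forward
weight_before = linear.weight  # keep a handle to check identity afterwards

# Swap the class in place: no new module is built and no parameter is copied.
linear.__class__ = BmmLinear

actual = linear(x)             # now dispatched to BmmLinear.forward
```

Because the swap happens on the instance itself, any references to the module held elsewhere (for example in a parent module's children) observe the new `forward` without being re-registered.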

```python
nemo_automodel.components.distributed.parallel_styles.TPLinear.forward(
    x
)
```

```python
nemo_automodel.components.distributed.parallel_styles._distribute_param(
    _module,
    name,
    device_mesh,
    src_data_rank,
    placements
)
```

```python
nemo_automodel.components.distributed.parallel_styles.translate_to_lora(
    plan
)
```

Mutate a tensor-parallel plan to the matching LoRA-aware style.