# nemo_automodel.components.distributed.tensor_utils

Tensor utilities for device transfers and memory management in distributed settings.

This module provides utilities for handling tensors across devices and distributed tensor types (such as `DTensor`) in distributed training. These include copying model state into CPU memory, optionally pinned for faster GPU transfers, and extracting the local shard of a `DTensor`.

## Module Contents

### Functions

| Name                                                                                             | Description                                                     |
| ------------------------------------------------------------------------------------------------ | --------------------------------------------------------------- |
| [`get_cpu_state_dict`](#nemo_automodel-components-distributed-tensor_utils-get_cpu_state_dict)   | Copy the tensors yielded by a state dict generator into CPU memory. |
| [`to_cpu`](#nemo_automodel-components-distributed-tensor_utils-to_cpu)                           | Move a tensor or distributed tensor to the CPU.                 |
| [`to_local_if_dtensor`](#nemo_automodel-components-distributed-tensor_utils-to_local_if_dtensor) | Returns the local shard of the given tensor if it is a DTensor. |

### API

```python
nemo_automodel.components.distributed.tensor_utils.get_cpu_state_dict(
    state_generator: typing.Iterable[tuple[str, typing.Union[torch.Tensor, torch.distributed.tensor.DTensor]]],
    pin_memory: bool = False
) -> dict[str, torch.Tensor]
```

Copy the tensors yielded by a state dict generator into CPU memory.

**Parameters:**

* `state_generator`: An iterable that yields `(key, tensor)` pairs from a model's state.
* `pin_memory`: Whether to allocate the CPU tensors in pinned (page-locked) memory for faster GPU transfers. Defaults to `False`.

**Returns:** `dict[str, torch.Tensor]`

A dictionary mapping parameter names to CPU tensors.
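To make the parameters above concrete, here is a minimal sketch of a function with the same contract. It is not the library's source: the name `cpu_state_dict_sketch` is hypothetical, and it assumes that a `DTensor` is gathered with `full_tensor()` before copying, and that pinning is attempted only when CUDA is available.

```python
from typing import Iterable, Union

import torch

try:  # public DTensor import path in recent PyTorch releases
    from torch.distributed.tensor import DTensor
except ImportError:  # older releases exposed it under a private module
    from torch.distributed._tensor import DTensor


def cpu_state_dict_sketch(
    state_generator: Iterable[tuple[str, Union[torch.Tensor, DTensor]]],
    pin_memory: bool = False,
) -> dict[str, torch.Tensor]:
    """Hypothetical stand-in for get_cpu_state_dict; not the library source."""
    cpu_state: dict[str, torch.Tensor] = {}
    for name, tensor in state_generator:
        if isinstance(tensor, DTensor):
            # Assumption: materialize the full tensor. full_tensor() is a
            # collective, so every rank must reach this line for the same key.
            tensor = tensor.full_tensor()
        tensor = tensor.detach()
        if pin_memory and torch.cuda.is_available():
            # Allocate a page-locked host buffer, then copy the data into it.
            cpu_tensor = torch.empty(
                tensor.shape, dtype=tensor.dtype, device="cpu", pin_memory=True
            )
            cpu_tensor.copy_(tensor)
        else:
            # copy=True returns a fresh tensor even if the input is already on CPU.
            cpu_tensor = tensor.to("cpu", copy=True)
        cpu_state[name] = cpu_tensor
    return cpu_state


# Usage: feed (name, tensor) pairs from a regular module's state dict.
model = torch.nn.Linear(4, 2)
cpu_sd = cpu_state_dict_sketch(model.state_dict().items())
```

Because the input is a generator of pairs rather than a full dictionary, the caller can stream parameters one at a time instead of holding a second device-side copy of the whole state.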

```python
nemo_automodel.components.distributed.tensor_utils.to_cpu(
    v
)
```

Move a tensor or distributed tensor to the CPU.

This function takes an input tensor, which can be either a `DTensor` (distributed tensor)
or a standard `Tensor`, and ensures that it is moved to the CPU.

**Parameters:**

* `v`: The input value, which can be a `DTensor`, a `Tensor`, or any other object. If `v` is a `DTensor`, its device type is checked before it is moved to the CPU.

**Returns:** `torch.Tensor | Any`

The corresponding CPU tensor if `v` is a `DTensor` or `Tensor`; otherwise `v`, unchanged.

**Raises:**

* `ValueError`: If `v` is a `DTensor` but its device is neither 'cuda' nor 'cpu'.
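The dispatch described above can be sketched as follows. The name `to_cpu_sketch` is hypothetical, and the way a `DTensor` is moved (gathering it with `full_tensor()` and then calling `.cpu()`) is an assumption made for illustration; only the three-way branching and the `ValueError` condition come from the description.

```python
import torch

try:  # public DTensor import path in recent PyTorch releases
    from torch.distributed.tensor import DTensor
except ImportError:  # older releases exposed it under a private module
    from torch.distributed._tensor import DTensor


def to_cpu_sketch(v):
    """Hypothetical stand-in for to_cpu; not the library source."""
    if isinstance(v, DTensor):
        # Only CUDA- and CPU-backed DTensors are accepted, per the Raises section.
        if v.device.type not in ("cuda", "cpu"):
            raise ValueError(f"Unsupported DTensor device: {v.device}")
        # Assumption: gather the full tensor, then move it to host memory.
        return v.full_tensor().cpu()
    if isinstance(v, torch.Tensor):
        return v.cpu()
    # Anything else (ints, dicts, optimizer metadata, ...) passes through as-is.
    return v
```

The pass-through branch makes a function like this safe to map over heterogeneous containers, for example `{k: to_cpu_sketch(v) for k, v in state.items()}`, where some values are tensors and others are plain Python objects.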

```python
nemo_automodel.components.distributed.tensor_utils.to_local_if_dtensor(
    tensor: typing.Union[torch.Tensor, torch.distributed.tensor.DTensor]
) -> torch.Tensor
```

Return the local shard of `tensor` if it is a `DTensor`; otherwise, return `tensor` unchanged.

Taken and modified from: [https://github.com/NVIDIA/Megatron-LM/blob/605f618f237cda8fa80132bc2ccff933512d5a0d/megatron/core/utils.py#L746](https://github.com/NVIDIA/Megatron-LM/blob/605f618f237cda8fa80132bc2ccff933512d5a0d/megatron/core/utils.py#L746)
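A minimal sketch of this behavior, modeled on the pattern in the linked Megatron-LM helper, is shown below. The name `to_local_if_dtensor_sketch` is hypothetical, and the `torch.no_grad()` wrapper is an assumption carried over from that pattern rather than something confirmed for this module.

```python
from typing import Union

import torch

try:  # public DTensor import path in recent PyTorch releases
    from torch.distributed.tensor import DTensor
except ImportError:  # older releases exposed it under a private module
    from torch.distributed._tensor import DTensor


def to_local_if_dtensor_sketch(tensor: Union[torch.Tensor, DTensor]) -> torch.Tensor:
    """Hypothetical stand-in for to_local_if_dtensor; not the library source."""
    # Assumption: no_grad, because the shard is used for inspection or
    # copying rather than as part of an autograd graph.
    with torch.no_grad():
        # to_local() returns this rank's shard without any communication.
        return tensor.to_local() if isinstance(tensor, DTensor) else tensor
```

Unlike `full_tensor()`, which all-gathers the shards from every rank, `to_local()` is purely local. That makes it cheap, but the result covers only this rank's portion of the data.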