nemo_automodel.components.distributed.tensor_utils#

Tensor utilities for device transfers and memory management in distributed settings.

This module provides utilities for handling tensor operations across different devices and distributed tensor types, with optimizations for performance in distributed training scenarios.

Module Contents#

Functions#

get_cpu_state_dict

Copy the state dict generator to CPU memory.

to_cpu

Move a tensor or distributed tensor to the CPU.

to_local_if_dtensor

Returns the local shard of the given tensor if it is a DTensor.

API#

nemo_automodel.components.distributed.tensor_utils.get_cpu_state_dict(
state_generator: Iterable[tuple[str, Union[torch.Tensor, torch.distributed.tensor.DTensor]]],
pin_memory: bool = False,
) → dict[str, torch.Tensor][source]#

Copy the state dict generator to CPU memory.

Parameters:
  • state_generator (Iterable[tuple[str, Union[torch.Tensor, DTensor]]]) – An iterable that yields (key, tensor) pairs from a model state.

  • pin_memory (bool, optional) – Whether to allocate the CPU tensors in pinned memory for faster GPU transfer. Defaults to False.

Returns:

A dictionary mapping parameter names to CPU tensors.

Return type:

dict[str, torch.Tensor]
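The copy-to-CPU behavior described above can be sketched as follows. `get_cpu_state_dict_sketch` is a hypothetical stand-in for illustration, not the library implementation; the `.to_local()` duck-typing check is an assumption used to keep the sketch self-contained without importing DTensor.

```python
import torch

def get_cpu_state_dict_sketch(state_generator, pin_memory=False):
    """Copy (name, tensor) pairs from a generator into a CPU state dict."""
    state_dict = {}
    for name, tensor in state_generator:
        # A DTensor exposes its local shard via .to_local(); plain tensors pass through.
        if hasattr(tensor, "to_local"):
            tensor = tensor.to_local()
        if pin_memory:
            # Pinned (page-locked) host memory enables faster async device-to-host
            # copies later; allocating it requires a CUDA-enabled build.
            cpu_tensor = torch.empty(tensor.shape, dtype=tensor.dtype, pin_memory=True)
            cpu_tensor.copy_(tensor)
        else:
            cpu_tensor = tensor.detach().cpu()
        state_dict[name] = cpu_tensor
    return state_dict

model = torch.nn.Linear(4, 2)
cpu_sd = get_cpu_state_dict_sketch(model.state_dict().items())
print(sorted(cpu_sd))  # ['bias', 'weight']
```

Pinned memory trades a slower one-time allocation for faster subsequent host-to-device transfers, which is why it is exposed as an opt-in flag rather than the default.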

nemo_automodel.components.distributed.tensor_utils.to_cpu(v)[source]#

Move a tensor or distributed tensor to the CPU.

This function takes an input tensor, which can be either a DTensor (distributed tensor) or a standard Tensor, and ensures that it is moved to the CPU.

Parameters:

v (DTensor | Tensor | any) – The input value, which can be a DTensor, Tensor, or any other object. If DTensor, it checks the device and moves the tensor accordingly.

Returns:

The corresponding CPU tensor if v is a DTensor or Tensor, otherwise returns v unchanged.

Return type:

Tensor | any

Raises:

ValueError – If v is a DTensor but its device is neither 'cuda' nor 'cpu'.

Example:

t = torch.tensor([1, 2, 3], device='cuda')
to_cpu(t)  # Moves tensor to CPU
tensor([1, 2, 3])

dt = DTensor(torch.tensor([4, 5, 6], device='cuda'))
to_cpu(dt)  # Moves DTensor to CPU
tensor([4, 5, 6])
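The dispatch logic behind `to_cpu` can be sketched as below. `to_cpu_sketch` is a hypothetical illustration, not the library code; it uses a duck-typed `.to_local()` check in place of an `isinstance(v, DTensor)` test, and omits the device validation that raises `ValueError` in the real function.

```python
import torch

def to_cpu_sketch(v):
    """Move a tensor (or a DTensor's local shard) to CPU; pass anything else through."""
    # Hedged DTensor check: production code would use isinstance(v, DTensor).
    if hasattr(v, "to_local"):
        v = v.to_local()
    if isinstance(v, torch.Tensor):
        return v.cpu()
    # Non-tensor values are returned unchanged.
    return v

t = torch.arange(3)
print(to_cpu_sketch(t).device.type)  # cpu
print(to_cpu_sketch("not a tensor"))  # not a tensor
```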

nemo_automodel.components.distributed.tensor_utils.to_local_if_dtensor(
tensor: Union[torch.Tensor, torch.distributed.tensor.DTensor],
) → torch.Tensor[source]#

Returns the local shard of the given tensor if it is a DTensor.

Taken and modified from: https://github.com/NVIDIA/Megatron-LM/blob/605f618f237cda8fa80132bc2ccff933512d5a0d/megatron/core/utils.py#L746
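The pattern from the Megatron-LM utility can be sketched as follows. `to_local_if_dtensor_sketch` is a hypothetical illustration; the duck-typed `.to_local()` check is an assumption standing in for an `isinstance(tensor, DTensor)` test so the sketch runs without a distributed setup.

```python
import torch

def to_local_if_dtensor_sketch(tensor):
    """Return the local shard of a DTensor; return a plain tensor unchanged."""
    with torch.no_grad():
        # Hedged check: real code would test isinstance(tensor, DTensor).
        return tensor.to_local() if hasattr(tensor, "to_local") else tensor

x = torch.ones(2, 2)
print(to_local_if_dtensor_sketch(x) is x)  # True
```

A plain tensor is returned as the same object, so the helper is safe to call unconditionally in code paths that may receive either type.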