nemo_automodel.components.distributed.tensor_utils
Tensor utilities for device transfers and memory management in distributed settings.
This module provides utilities for handling tensor operations across different devices and distributed tensor types, with optimizations for performance in distributed training scenarios.
Module Contents
Functions
API
Copy the tensors yielded by a state dict generator to CPU memory.
Parameters:
An iterable that yields (key, tensor) pairs from a model state.
Whether to allocate the CPU tensors in pinned memory for faster GPU transfer. Defaults to False.
Returns:
dict[str, torch.Tensor]: A dictionary mapping parameter names to CPU tensors.
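The copy described above can be sketched roughly as follows. This is a minimal illustration that assumes only PyTorch; the helper name `copy_state_to_cpu` and its parameter names are hypothetical and are not this module's actual signature. Pinned memory is requested only when CUDA is available, since pinning has no benefit without a GPU.

```python
import torch


def copy_state_to_cpu(state_generator, pin_memory=False):
    """Hypothetical sketch: copy (key, tensor) pairs into CPU tensors."""
    # Assumption: only pin when a CUDA device exists; pinning without one
    # is pointless and may raise on some builds.
    use_pin = pin_memory and torch.cuda.is_available()
    cpu_state = {}
    for key, tensor in state_generator:
        # Allocate the destination buffer on the CPU (optionally page-locked).
        buf = torch.empty(
            tensor.shape, dtype=tensor.dtype, device="cpu", pin_memory=use_pin
        )
        # A non-blocking copy is only asynchronous for pinned destinations.
        buf.copy_(tensor.detach(), non_blocking=use_pin)
        cpu_state[key] = buf
    if use_pin:
        # Wait for the asynchronous device-to-host copies to finish.
        torch.cuda.synchronize()
    return cpu_state


model = torch.nn.Linear(4, 2)
cpu_state = copy_state_to_cpu(model.state_dict().items())
```

Passing an iterable of pairs rather than a full dictionary lets the caller stream parameters one at a time, so the whole state never needs to be held twice on the source device.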
Move a tensor or distributed tensor to the CPU.
This function takes an input tensor, which can be either a DTensor (distributed tensor)
or a standard Tensor, and ensures that it is moved to the CPU.
Parameters:
The input value, which can be a DTensor, Tensor, or
any other object. If DTensor, it checks the device and
moves the tensor accordingly.
Returns:
Tensor | any: The corresponding CPU tensor if v is a DTensor or Tensor,
otherwise returns v unchanged.
Raises:
ValueError: If v is a DTensor but its device is neither 'cuda' nor 'cpu'.
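The dispatch described above (DTensor, plain Tensor, anything else) might look like the sketch below. The name `move_to_cpu` is hypothetical, and the handling of a CUDA DTensor (gathering the full tensor before copying it to the host) is an assumption for illustration; the documentation only states that the device is checked and the tensor moved accordingly.

```python
import torch

try:
    from torch.distributed.tensor import DTensor  # public path in newer PyTorch
except ImportError:
    from torch.distributed._tensor import DTensor  # older PyTorch releases


def move_to_cpu(v):
    """Hypothetical sketch: return a CPU tensor for tensors, else v unchanged."""
    if isinstance(v, DTensor):
        if v.device.type == "cuda":
            # Assumption: materialize the full (unsharded) tensor, then copy.
            return v.full_tensor().cpu()
        if v.device.type == "cpu":
            # Already on the CPU: return the local data as a plain tensor.
            return v.to_local()
        raise ValueError(f"Unsupported DTensor device: {v.device.type}")
    if isinstance(v, torch.Tensor):
        return v.cpu()
    # Non-tensor values (ints, strings, None, ...) pass through untouched.
    return v


plain = move_to_cpu(torch.ones(3))
passthrough = move_to_cpu("not a tensor")
```

The DTensor branch needs an initialized process group to exercise, so the usage lines above cover only the plain-tensor and pass-through cases.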
Returns the local shard of the given tensor if it is a DTensor.
Taken and modified from: https://github.com/NVIDIA/Megatron-LM/blob/605f618f237cda8fa80132bc2ccff933512d5a0d/megatron/core/utils.py#L746
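A minimal sketch of this behavior, assuming only PyTorch's public `DTensor.to_local()` method. The helper name `local_shard` is hypothetical.

```python
import torch

try:
    from torch.distributed.tensor import DTensor  # public path in newer PyTorch
except ImportError:
    from torch.distributed._tensor import DTensor  # older PyTorch releases


def local_shard(tensor):
    """Hypothetical sketch: unwrap a DTensor to its local shard."""
    if isinstance(tensor, DTensor):
        # to_local() returns this rank's shard as a regular torch.Tensor.
        return tensor.to_local()
    # Plain tensors are returned as-is, without copying.
    return tensor


x = torch.arange(6)
same = local_shard(x)
```

For a plain tensor the input object itself is returned, so code can call the helper unconditionally without paying for a copy.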