# nemo_automodel.components.distributed.grad_utils

## Module Contents

### Functions

| Name                                                                                                     | Description                                                |
| -------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------- |
| [`clip_grad_by_total_norm_`](#nemo_automodel-components-distributed-grad_utils-clip_grad_by_total_norm_) | Clips gradient of an iterable of parameters by total norm. |
| [`get_grad_norm`](#nemo_automodel-components-distributed-grad_utils-get_grad_norm)                       | Calculate the norm of gradients.                           |

### API

```python
nemo_automodel.components.distributed.grad_utils.clip_grad_by_total_norm_(
    parameters: typing.Union[list[typing.Union[torch.Tensor, torch.distributed.tensor.DTensor]], typing.Union[torch.Tensor, torch.distributed.tensor.DTensor]],
    max_grad_norm: typing.Union[int, float],
    total_norm: float,
    dtype: torch.dtype = torch.float32
)
```

Clips the gradients of an iterable of parameters by total norm.

Taken and modified from: [https://github.com/NVIDIA/Megatron-LM/blob/a695b2bd2a0ca9ca63385a48c41a1c5a033cdd1e/megatron/core/optimizer/clip\_grads.py#L138](https://github.com/NVIDIA/Megatron-LM/blob/a695b2bd2a0ca9ca63385a48c41a1c5a033cdd1e/megatron/core/optimizer/clip_grads.py#L138)

Note that the gradients are modified in place.

**Parameters:**

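| Name            | Description                                                                                            |
| --------------- | ------------------------------------------------------------------------------------------------------ |
| `parameters`    | An iterable of Tensors or DTensors, or a single Tensor or DTensor, whose gradients will be normalized. |
| `max_grad_norm` | Maximum norm of the gradients.                                                                         |
| `total_norm`    | The pre-computed total norm of the gradients to use for scaling.                                       |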
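The scaling step can be sketched in plain PyTorch. This is a minimal single-process illustration under stated assumptions, not the library's implementation: the helper name `scale_grads_by_total_norm_` and the `eps` stabilizer value are hypothetical, and DTensor handling and the `dtype` argument are left out.

```python
import torch


def scale_grads_by_total_norm_(params, max_grad_norm, total_norm, eps=1e-6):
    """Hypothetical helper: scale .grad in place when total_norm exceeds max_grad_norm."""
    # eps (an assumed value) guards against division by zero.
    clip_coeff = max_grad_norm / (total_norm + eps)
    if clip_coeff < 1.0:
        for p in params:
            if p.grad is not None:
                p.grad.mul_(clip_coeff)  # in place: the gradient tensor itself changes
    return clip_coeff


w = torch.nn.Parameter(torch.zeros(2))
w.grad = torch.tensor([3.0, 4.0])  # L2 norm of this gradient is 5.0

coeff = scale_grads_by_total_norm_([w], max_grad_norm=1.0, total_norm=5.0)
clipped_norm = w.grad.norm().item()  # close to max_grad_norm after scaling
```

Because the total norm is passed in rather than recomputed, the caller decides how it is obtained, for example from a norm that was already reduced across ranks.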

```python
nemo_automodel.components.distributed.grad_utils.get_grad_norm(
    parameters: typing.Union[list[typing.Union[torch.Tensor, torch.distributed.tensor.DTensor]], typing.Union[torch.Tensor, torch.distributed.tensor.DTensor]],
    dp_cp_group: torch.distributed.ProcessGroup,
    tp_group: torch.distributed.ProcessGroup,
    norm_type: typing.Union[int, float] = 2,
    dtype: torch.dtype = torch.float32
) -> float
```

Calculate the norm of gradients.

Taken and modified from: [https://github.com/NVIDIA/Megatron-LM/blob/a695b2bd2a0ca9ca63385a48c41a1c5a033cdd1e/megatron/core/optimizer/clip\_grads.py#L51](https://github.com/NVIDIA/Megatron-LM/blob/a695b2bd2a0ca9ca63385a48c41a1c5a033cdd1e/megatron/core/optimizer/clip_grads.py#L51)

**Parameters:**

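| Name          | Description                                                                                                  |
| ------------- | ------------------------------------------------------------------------------------------------------------ |
| `parameters`  | An iterable of Tensors or DTensors, or a single Tensor or DTensor, whose gradient norm will be calculated.   |
| `dp_cp_group` | Process group for data parallel and context parallel communication.                                         |
| `tp_group`    | Process group for tensor parallel communication.                                                             |
| `norm_type`   | Type of the p-norm to use. Can be `'inf'` for the infinity norm.                                             |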

**Returns:** `float`

Total norm of the gradients (viewed as a single vector).
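To illustrate what "viewed as a single vector" means, here is a minimal single-process sketch in plain PyTorch. The helper name `local_grad_norm` is hypothetical and this is not the library's code. In a distributed run the per-rank partial results would additionally need to be reduced across `dp_cp_group` and `tp_group`, and that step is omitted here.

```python
import math

import torch


def local_grad_norm(params, norm_type=2.0):
    """Hypothetical helper: p-norm of all gradients treated as one flat vector."""
    grads = [p.grad.detach().to(torch.float32) for p in params if p.grad is not None]
    if not grads:
        return 0.0
    if norm_type == math.inf:
        # Infinity norm: the largest absolute gradient entry across all tensors.
        return max(g.abs().max().item() for g in grads)
    # Sum |g|^p over every tensor, then take the p-th root once at the end.
    total = sum(g.abs().pow(norm_type).sum().item() for g in grads)
    return total ** (1.0 / norm_type)


a = torch.nn.Parameter(torch.zeros(2))
b = torch.nn.Parameter(torch.zeros(1))
a.grad = torch.tensor([3.0, 0.0])
b.grad = torch.tensor([-4.0])

l2_norm = local_grad_norm([a, b])             # sqrt(3^2 + 0^2 + 4^2)
inf_norm = local_grad_norm([a, b], math.inf)  # max(|3|, |0|, |-4|)
```

Taking the root only once, after summing over all tensors, is what makes the result equal to the norm of the concatenated gradient vector rather than a combination of per-tensor norms.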