nemo_automodel.components.distributed.grad_utils
Module Contents
Functions
API
Clips the gradients of an iterable of parameters by their total norm.
Taken and modified from: https://github.com/NVIDIA/Megatron-LM/blob/a695b2bd2a0ca9ca63385a48c41a1c5a033cdd1e/megatron/core/optimizer/clip_grads.py#L138
Note that the gradients are modified in place.
Parameters:
An iterable of Tensors or DTensors, or a single Tensor or DTensor that will have gradients normalized.
Maximum norm of the gradients.
The pre-computed total norm of the gradients to use for scaling.
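The scaling step described above can be illustrated with a minimal pure-Python sketch. It uses plain lists of floats in place of Tensors or DTensors, and it assumes the clipping rule of the Megatron-LM routine this function was adapted from: gradients are multiplied by `max_grad_norm / (total_norm + eps)` only when that coefficient is below 1. The function name `clip_by_total_norm_sketch` and the `eps` value are hypothetical and are not part of this module's API.

```python
def clip_by_total_norm_sketch(grads, max_grad_norm, total_norm, eps=1.0e-6):
    """Scale gradients in place so their total norm is at most max_grad_norm.

    grads: list of lists of floats (stand-ins for per-parameter gradients).
    total_norm: pre-computed norm of all gradients viewed as one vector.
    eps: small constant guarding against division by zero (assumed value).
    """
    clip_coeff = max_grad_norm / (total_norm + eps)
    # Only shrink gradients; never scale them up.
    if clip_coeff < 1.0:
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coeff
    return clip_coeff


# Two "parameters" whose combined gradient vector (3, 0, 0, 4) has 2-norm 5.
example_grads = [[3.0, 0.0], [0.0, 4.0]]
clip_by_total_norm_sketch(example_grads, max_grad_norm=1.0, total_norm=5.0)
```

Because the total norm is passed in rather than recomputed, the caller can compute it once and reuse it for both logging and clipping.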
Calculate the norm of gradients.
Taken and modified from: https://github.com/NVIDIA/Megatron-LM/blob/a695b2bd2a0ca9ca63385a48c41a1c5a033cdd1e/megatron/core/optimizer/clip_grads.py#L51
Parameters:
An iterable of Tensors or DTensors, or a single Tensor or DTensor that will have gradient norm calculated.
Process group for data parallel communication.
Process group for context parallel communication.
Process group for tensor parallel communication.
Type of the p-norm to use. Can be 'inf' for the infinity norm.
Returns: float
Total norm of the gradients, computed as if all gradients were concatenated into a single vector.
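As a rough illustration of how a total norm can be assembled from sharded gradients, the sketch below simulates several ranks with plain Python lists. It assumes the reduction pattern used in the Megatron-LM routine this function was adapted from: each rank computes a local contribution, contributions are combined with a sum for a finite p-norm or a max for the infinity norm, and the p-th root is taken at the end. The helper names `local_contribution` and `combine_contributions` are hypothetical, and the list of per-rank values stands in for the collective communication over the process groups.

```python
import math


def local_contribution(grads, norm_type=2.0):
    """Per-rank piece of the norm: sum of |g|^p, or max |g| for the inf norm."""
    flat = [abs(x) for grad in grads for x in grad]
    if norm_type == math.inf:
        return max(flat, default=0.0)
    return sum(x ** norm_type for x in flat)


def combine_contributions(contributions, norm_type=2.0):
    """Stand-in for an all-reduce across ranks, followed by the final root."""
    if norm_type == math.inf:
        return max(contributions)
    return sum(contributions) ** (1.0 / norm_type)


# Two simulated ranks, each holding one shard of the gradients.
rank_grads = [[[3.0]], [[4.0]]]
total_l2 = combine_contributions(
    [local_contribution(g, 2.0) for g in rank_grads], 2.0
)
total_inf = combine_contributions(
    [local_contribution(g, math.inf) for g in rank_grads], math.inf
)
```

Deferring the root until after the reduction is what makes the result equal to the norm of the full, unsharded gradient vector.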