nemo_automodel.loss.chunked_ce
Module Contents
Functions
compute_cross_entropy | Computes the cross-entropy loss between logits and targets.
chunked_cross_entropy | Computes cross-entropy loss in chunks to handle long sequences more efficiently.
Data
_compiled_compute_cross_entropy
API
- nemo_automodel.loss.chunked_ce._compiled_compute_cross_entropy
None
- nemo_automodel.loss.chunked_ce.compute_cross_entropy(logits: torch.Tensor, targets: torch.Tensor, ignore_index=-100)
Computes the cross-entropy loss between logits and targets.
- Parameters:
logits (torch.Tensor) – Model predictions of shape (sequence_length, num_classes).
targets (torch.Tensor) – Ground-truth labels of shape (sequence_length,).
ignore_index (int, optional) – Target value that is ignored when computing the loss. Defaults to -100.
- Returns:
The sum of cross-entropy losses over the sequence.
- Return type:
torch.Tensor
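The summed loss described above can be illustrated with a minimal sketch. It assumes the behaviour matches `torch.nn.functional.cross_entropy` with `reduction="sum"`; the helper name `summed_cross_entropy` is hypothetical and is not part of `nemo_automodel`.

```python
import torch
import torch.nn.functional as F


def summed_cross_entropy(logits, targets, ignore_index=-100):
    # Hypothetical stand-in for illustration only, not the library's code.
    # Sums the per-token losses and skips targets equal to ignore_index.
    return F.cross_entropy(
        logits.float(), targets, ignore_index=ignore_index, reduction="sum"
    )


torch.manual_seed(0)
logits = torch.randn(6, 10)                        # (sequence_length, num_classes)
targets = torch.tensor([1, 4, -100, 7, -100, 2])   # (sequence_length,)
loss_sum = summed_cross_entropy(logits, targets)

# Manual check: negative log-probability of each non-ignored target, summed.
valid = targets != -100
log_probs = torch.log_softmax(logits, dim=-1)
picked = log_probs[valid].gather(1, targets[valid].unsqueeze(1))
manual = -picked.sum()
```

Because the result is a sum rather than a mean, a caller that wants a per-token average has to divide by the number of non-ignored targets.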
- nemo_automodel.loss.chunked_ce.chunked_cross_entropy(logits, targets, mask=None, chunk_len=32, compile=True, ignore_index=-100)
Computes cross-entropy loss in chunks to handle long sequences more efficiently.
- Parameters:
logits (torch.Tensor) – Model output logits of shape (sequence_length, num_classes).
targets (torch.Tensor) – Ground-truth labels of shape (sequence_length,).
mask (torch.Tensor, optional) – Boolean mask indicating valid positions (1) and positions to ignore (0). Defaults to None.
chunk_len (int, optional) – The size of each chunk. The sequence will be split along the first dimension in chunks of this length. Defaults to 32.
compile (bool, optional) – If True, uses the compiled compute_cross_entropy function. Defaults to True.
ignore_index (int, optional) – Target value that is ignored when computing the loss. Defaults to -100.
- Returns:
The average cross-entropy loss across the valid tokens in the sequence.
- Return type:
torch.Tensor
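The chunking idea documented above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the library's source: the name `chunked_ce_sketch` is hypothetical, masked positions are assumed to be mapped to `ignore_index`, and the `compile` option is left out so the example runs without `torch.compile`.

```python
import torch
import torch.nn.functional as F


def chunked_ce_sketch(logits, targets, mask=None, chunk_len=32, ignore_index=-100):
    # Hypothetical sketch for illustration only.
    if mask is not None:
        # Assumption: positions where mask == 0 are excluded via ignore_index.
        targets = targets.masked_fill(~mask.bool(), ignore_index)

    total = torch.zeros((), dtype=torch.float32, device=logits.device)
    # Split along the first (sequence) dimension; the last chunk may be shorter.
    for logit_chunk, target_chunk in zip(
        logits.split(chunk_len, dim=0), targets.split(chunk_len, dim=0)
    ):
        total = total + F.cross_entropy(
            logit_chunk.float(),
            target_chunk,
            ignore_index=ignore_index,
            reduction="sum",
        )

    # Average over valid tokens; clamp avoids division by zero if all are ignored.
    num_valid = (targets != ignore_index).sum().clamp(min=1)
    return total / num_valid


torch.manual_seed(0)
seq_len, vocab = 100, 50
logits = torch.randn(seq_len, vocab)
targets = torch.randint(0, vocab, (seq_len,))
mask = torch.rand(seq_len) > 0.3          # True marks a valid position

loss = chunked_ce_sketch(logits, targets, mask=mask, chunk_len=32)

# Reference: unchunked mean over the same valid positions.
reference = F.cross_entropy(
    logits, targets.masked_fill(~mask, -100), ignore_index=-100
)
```

Summing per chunk and dividing once at the end gives the same average as the unchunked computation, while only one chunk of logits is upcast to float32 at a time.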