nemo_automodel.components.loss.chunked_ce#

Module Contents#

Classes#

ChunkedCrossEntropy

Chunked cross-entropy loss.

Functions#

compute_cross_entropy

Computes the cross-entropy loss between logits and targets.

Data#

_compiled_compute_cross_entropy

API#

nemo_automodel.components.loss.chunked_ce._compiled_compute_cross_entropy#

None

nemo_automodel.components.loss.chunked_ce.compute_cross_entropy(
logits: torch.Tensor,
targets: torch.Tensor,
ignore_index=-100,
)[source]#

Computes the cross-entropy loss between logits and targets.

Parameters:
  • logits (torch.Tensor) – Model predictions of shape (sequence_length, num_classes).

  • targets (torch.Tensor) – Ground-truth labels of shape (sequence_length,).

  • ignore_index (int, optional) – Target value that is ignored when computing the loss. Defaults to -100.

Returns:

The sum of cross-entropy losses over the sequence.

Return type:

torch.Tensor
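
A minimal usage sketch; the tensor shapes and the padded positions below are illustrative assumptions rather than values taken from the library:

```python
import torch

from nemo_automodel.components.loss.chunked_ce import compute_cross_entropy

# Illustrative shapes: a sequence of 8 tokens over a 32-entry vocabulary.
seq_len, num_classes = 8, 32
logits = torch.randn(seq_len, num_classes)
targets = torch.randint(0, num_classes, (seq_len,))

# Positions labelled with ignore_index (-100), e.g. padding, are excluded
# from the loss.
targets[-2:] = -100

loss = compute_cross_entropy(logits, targets)  # 0-dim tensor: summed loss
```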

class nemo_automodel.components.loss.chunked_ce.ChunkedCrossEntropy(
chunk_len: int = 32,
compile: bool = True,
ignore_index: int = -100,
)[source]#

Chunked cross-entropy loss.

Initialization

Parameters:
  • chunk_len (int, optional) – The size of each chunk. The sequence will be split along the first dimension into chunks of this length (see the sketch after this parameter list). Defaults to 32.

  • compile (bool, optional) – If True, uses the compiled compute_cross_entropy function. Defaults to True.

  • ignore_index (int, optional) – Target value that is ignored when computing the loss. Defaults to -100.
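
The chunking strategy can be pictured with the sketch below. This is a conceptual illustration of splitting into chunk_len-sized pieces and accumulating a summed loss, not the module's internal implementation:

```python
import torch
import torch.nn.functional as F

# Conceptual illustration only: flatten batch and sequence, split along the
# first dimension into chunks of chunk_len, and accumulate the summed loss
# chunk by chunk so the cross-entropy intermediates stay small.
chunk_len, vocab_size = 32, 1024
flat_logits = torch.randn(100, vocab_size)          # (batch * seq_len, vocab_size)
flat_labels = torch.randint(0, vocab_size, (100,))  # (batch * seq_len,)

total = torch.zeros(())
for lg, lb in zip(flat_logits.split(chunk_len), flat_labels.split(chunk_len)):
    total = total + F.cross_entropy(lg, lb, reduction="sum", ignore_index=-100)
```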

__call__(
logits: torch.Tensor,
labels: torch.Tensor,
mask: Optional[torch.Tensor] = None,
) → torch.Tensor[source]#

Computes the cross-entropy loss in fixed-size chunks so that long sequences can be processed without evaluating the loss over the entire flattened sequence at once.

Parameters:
  • logits (torch.Tensor) – Model output logits of shape [batch_size, seq_len, vocab_size].

  • labels (torch.Tensor) – Ground-truth labels of shape [batch_size, seq_len].

  • mask (torch.Tensor, optional) – Boolean mask indicating valid positions (1) and positions to ignore (0). Defaults to None.

Returns:

The sum of cross-entropy losses over the sequence.

Return type:

torch.Tensor
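
A hedged end-to-end sketch; the shapes are illustrative, and compile=False is chosen only to keep the example light-weight (the default compiles the inner compute_cross_entropy):

```python
import torch

from nemo_automodel.components.loss.chunked_ce import ChunkedCrossEntropy

loss_fn = ChunkedCrossEntropy(chunk_len=32, compile=False)

# Illustrative shapes for a small batch of model outputs.
batch_size, seq_len, vocab_size = 2, 128, 1024
logits = torch.randn(batch_size, seq_len, vocab_size)
labels = torch.randint(0, vocab_size, (batch_size, seq_len))

# Optional boolean mask: True/1 positions contribute to the loss, False/0
# positions (e.g. padding) are ignored.
mask = torch.ones(batch_size, seq_len, dtype=torch.bool)
mask[:, -16:] = False

loss = loss_fn(logits, labels, mask=mask)  # 0-dim tensor: summed loss over valid positions
```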