nemo_automodel.loss.chunked_ce

Module Contents

Functions

compute_cross_entropy

Computes the cross-entropy loss between logits and targets.

chunked_cross_entropy

Computes cross-entropy loss in chunks to handle long sequences more efficiently.

Data

_compiled_compute_cross_entropy

API

nemo_automodel.loss.chunked_ce._compiled_compute_cross_entropy

Value: None

nemo_automodel.loss.chunked_ce.compute_cross_entropy(
logits: torch.Tensor,
targets: torch.Tensor,
ignore_index=-100,
)

Computes the cross-entropy loss between logits and targets.

Parameters:
  • logits (torch.Tensor) – Model predictions of shape (sequence_length, num_classes).

  • targets (torch.Tensor) – Ground-truth labels of shape (sequence_length,).

  • ignore_index (int, optional) – Target value that is ignored when computing the loss. Defaults to -100.

Returns:

The sum of the per-position cross-entropy losses over the sequence, excluding positions whose target equals ignore_index.

Return type:

torch.Tensor
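The behavior described above (a summed rather than averaged loss, with ignore_index positions skipped) can be approximated in plain PyTorch. This is a minimal sketch and not the library's implementation: summed_cross_entropy is a hypothetical stand-in name, and the cast to float32 is an assumption.

```python
import torch
import torch.nn.functional as F


def summed_cross_entropy(logits, targets, ignore_index=-100):
    """Hypothetical stand-in: sum of per-position losses, skipping ignore_index."""
    # Casting to float32 is an assumption (a common choice for low-precision logits).
    return F.cross_entropy(
        logits.float(), targets, ignore_index=ignore_index, reduction="sum"
    )


torch.manual_seed(0)
logits = torch.randn(6, 10)                       # (sequence_length, num_classes)
targets = torch.tensor([1, 4, -100, 7, 0, -100])  # two positions are ignored

total = summed_cross_entropy(logits, targets)     # scalar tensor (a sum)
num_valid = (targets != -100).sum()               # 4 valid positions
mean_loss = total / num_valid                     # caller-side normalization
```

Returning a sum leaves normalization to the caller, which is what makes it possible to add up partial results from several chunks before dividing once by the number of valid tokens.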

nemo_automodel.loss.chunked_ce.chunked_cross_entropy(
logits,
targets,
mask=None,
chunk_len=32,
compile=True,
ignore_index=-100,
)

Computes cross-entropy loss in chunks to handle long sequences more efficiently.

Parameters:
  • logits (torch.Tensor) – Model output logits of shape (sequence_length, num_classes).

  • targets (torch.Tensor) – Ground-truth labels of shape (sequence_length,).

  • mask (torch.Tensor, optional) – Boolean mask indicating valid positions (1) and positions to ignore (0). Defaults to None.

  • chunk_len (int, optional) – The size of each chunk. The sequence is split along the first dimension into chunks of this length. Defaults to 32.

  • compile (bool, optional) – If True, uses the compiled compute_cross_entropy function. Defaults to True.

  • ignore_index (int, optional) – Target value that is ignored when computing the loss. Defaults to -100.

Returns:

The average cross-entropy loss across the valid tokens in the sequence.

Return type:

torch.Tensor
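The steps implied by the parameters above (apply the mask, split along the first dimension, sum the per-chunk losses, divide by the number of valid tokens) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: chunked_ce_sketch is a hypothetical name, the torch.compile path controlled by compile is omitted, and clamping the divisor to at least 1 is an assumption made to avoid dividing by zero.

```python
import torch
import torch.nn.functional as F


def chunked_ce_sketch(logits, targets, mask=None, chunk_len=32, ignore_index=-100):
    """Hypothetical sketch of a chunked, mask-aware mean cross-entropy."""
    if mask is not None:
        # Positions where mask is 0 are replaced by ignore_index (out of place).
        targets = targets.masked_fill(mask == 0, ignore_index)

    total = logits.new_zeros((), dtype=torch.float32)
    # torch.Tensor.split yields chunks of chunk_len rows; the last may be shorter.
    for logits_chunk, targets_chunk in zip(
        logits.split(chunk_len, dim=0), targets.split(chunk_len, dim=0)
    ):
        total = total + F.cross_entropy(
            logits_chunk.float(),
            targets_chunk,
            ignore_index=ignore_index,
            reduction="sum",
        )

    # Assumption: clamp so a fully masked sequence yields 0 instead of NaN.
    num_valid = (targets != ignore_index).sum().clamp(min=1)
    return total / num_valid


torch.manual_seed(0)
logits = torch.randn(100, 50)             # (sequence_length, num_classes)
targets = torch.randint(0, 50, (100,))    # (sequence_length,)
mask = torch.ones(100, dtype=torch.bool)
mask[:10] = False                         # ignore the first 10 positions

loss = chunked_ce_sketch(logits, targets, mask=mask, chunk_len=32)

# Unchunked reference: mean over the non-ignored positions.
reference = F.cross_entropy(
    logits, targets.masked_fill(~mask, -100), ignore_index=-100
)
```

Because each chunk contributes a sum and the division happens once at the end, the result should match the unchunked mean up to floating-point rounding, regardless of chunk_len.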