nemo_automodel.components.loss.chunked_ce
Module Contents

Classes

ChunkedCrossEntropy – Chunked cross-entropy loss.

Functions

compute_cross_entropy – Computes the cross-entropy loss between logits and targets.

Data

_compiled_compute_cross_entropy

API
- nemo_automodel.components.loss.chunked_ce._compiled_compute_cross_entropy
None
- nemo_automodel.components.loss.chunked_ce.compute_cross_entropy(logits: torch.Tensor, targets: torch.Tensor, ignore_index=-100)
Computes the cross-entropy loss between logits and targets.
- Parameters:
logits (torch.Tensor) – Model predictions of shape (sequence_length, num_classes).
targets (torch.Tensor) – Ground-truth labels of shape (sequence_length,).
ignore_index (int, optional) – Target value that is ignored when computing the loss. Defaults to -100.
- Returns:
The sum of cross-entropy losses over the sequence.
- Return type:
torch.Tensor
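A minimal usage sketch, assuming the package is installed and the import path shown above; the tensor shapes follow the documented (sequence_length, num_classes) / (sequence_length,) contract:

```python
import torch

from nemo_automodel.components.loss.chunked_ce import compute_cross_entropy

# Toy data: 8 tokens over a 16-class vocabulary.
logits = torch.randn(8, 16)
targets = torch.randint(0, 16, (8,))
targets[-2:] = -100  # positions equal to ignore_index are excluded from the loss

loss = compute_cross_entropy(logits, targets)  # scalar tensor: summed per-token loss
```

Because the return value is a sum rather than a mean, callers that want an average loss would typically divide by the number of non-ignored tokens themselves.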
- class nemo_automodel.components.loss.chunked_ce.ChunkedCrossEntropy(chunk_len: int = 32, compile: bool = True, ignore_index: int = -100)

Chunked cross-entropy loss.

Initialization
- Parameters:
chunk_len (int, optional) – The size of each chunk. The sequence will be split along the first dimension into chunks of this length (see the sketch after this parameter list). Defaults to 32.
compile (bool, optional) – If True, uses the compiled compute_cross_entropy function. Defaults to True.
ignore_index (int, optional) – Target value that is ignored when computing the loss. Defaults to -100.
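For intuition only, here is a rough sketch of the chunking idea behind chunk_len. This is not the class's actual implementation, and the flattening of the batch dimension is an assumption made for illustration: the logits and labels are split into chunks of chunk_len rows, each chunk's summed loss is computed, and the partial sums are accumulated.

```python
import torch
import torch.nn.functional as F


def chunked_sum_ce(logits, labels, chunk_len=32, ignore_index=-100):
    # Flatten batch and sequence, then evaluate the loss one chunk at a time so
    # that only chunk_len rows of logits feed the cross-entropy kernel at once.
    flat_logits = logits.reshape(-1, logits.size(-1))
    flat_labels = labels.reshape(-1)
    total = flat_logits.new_zeros(())
    for lg, tg in zip(flat_logits.split(chunk_len, dim=0), flat_labels.split(chunk_len, dim=0)):
        total = total + F.cross_entropy(lg, tg, ignore_index=ignore_index, reduction="sum")
    return total
```

With compile=True the class uses the compiled compute_cross_entropy variant for the per-chunk computation, per the parameter description above; the sketch omits that detail.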
- __call__(logits: torch.Tensor, labels: torch.Tensor, mask: Optional[torch.Tensor] = None)
Computes cross-entropy loss in chunks to handle long sequences more efficiently.
- Parameters:
logits (torch.Tensor) – Model output logits of shape [batch_size, seq_len, vocab_size].
labels (torch.Tensor) – Ground-truth labels of shape [batch_size, seq_len].
mask (torch.Tensor, optional) – Boolean mask indicating valid positions (1) and positions to ignore (0). Defaults to None.
- Returns:
The sum of cross-entropy losses over the sequence.
- Return type:
torch.Tensor
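A usage sketch based on the constructor and __call__ signatures documented above; the shapes and the all-ones mask are illustrative, and compile=False is chosen here only to avoid torch.compile warm-up in a quick test:

```python
import torch

from nemo_automodel.components.loss.chunked_ce import ChunkedCrossEntropy

loss_fn = ChunkedCrossEntropy(chunk_len=32, compile=False)

batch_size, seq_len, vocab_size = 2, 128, 1000
logits = torch.randn(batch_size, seq_len, vocab_size)
labels = torch.randint(0, vocab_size, (batch_size, seq_len))
mask = torch.ones(batch_size, seq_len, dtype=torch.bool)  # 1 = valid position, 0 = ignore

loss = loss_fn(logits, labels, mask)  # scalar tensor: summed cross-entropy over valid positions
```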