nemo_automodel.components.models.nemotron_parse.nemotron_parse_loss
Module Contents
Classes
API
Bases: Module
Cross-entropy loss with coordinate token weighting for NemotronParse.
This loss function computes cross-entropy across all prediction heads and applies a configurable weight to coordinate tokens (label IDs >= class_token_start_idx). When num_heads > 1, it shifts the labels separately for each head to support multi-task output predictions.
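The page does not spell out how per-head label shifting works. The sketch below shows one plausible reading, stated purely as an assumption: head k is scored against the labels moved k positions to the left, with the vacated tail filled by the ignore value. The helper name shift_labels_for_head and its arguments are hypothetical and are not part of this module.

```python
def shift_labels_for_head(labels, head_idx, ignore_index=-100):
    """Return the labels shifted left by `head_idx` positions.

    Hypothetical helper: assumes head 0 uses the labels unchanged and each
    extra head targets a token one step further ahead. Positions that run
    past the end of the sequence are filled with `ignore_index` so they
    would contribute nothing to the loss.
    """
    pad = min(head_idx, len(labels))
    return list(labels[head_idx:]) + [ignore_index] * pad


labels = [5, 6, 7, 8]
per_head = [shift_labels_for_head(labels, k) for k in range(3)]
```

Under this assumption, padding with the ignore value keeps every head's label sequence the same length as the logits, so one masking rule covers all heads.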
Parameters:
Weight multiplier for coordinate tokens. Tokens with label IDs >= class_token_start_idx will have their loss multiplied by this factor. Default: 10.0
Token index threshold for coordinate tokens. Tokens with label IDs >= this value are considered coordinate/class tokens and receive higher loss weight. Default: 50000
Number of prediction heads (main + extra). Must match the model’s num_extra_heads + 1. Default: 1
Label value to ignore in loss computation. Default: -100
Loss reduction strategy ("sum" or "mean"). Default: "sum"
Cast logits to fp32 for numerical stability. Default: True
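To make the weighting described by these parameters concrete, here is a plain-Python sketch of the arithmetic, not the module's actual implementation. The function name and the argument names coordinate_weight and ignore_index are assumptions chosen for illustration; class_token_start_idx and reduction follow the names used on this page. Dividing by the number of non-ignored tokens for "mean" is also an assumption.

```python
import math


def weighted_cross_entropy(logits, labels, coordinate_weight=10.0,
                           class_token_start_idx=50000, ignore_index=-100,
                           reduction="sum"):
    """Cross-entropy over a flat token sequence with up-weighted coordinate tokens.

    logits: one list of vocabulary scores per token.
    labels: one integer label per token.
    """
    total = 0.0
    valid = 0
    for scores, label in zip(logits, labels):
        if label == ignore_index:
            continue  # ignored positions add nothing to the loss
        # Numerically stable log-sum-exp of the scores.
        peak = max(scores)
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        token_loss = log_z - scores[label]
        # Labels at or above the threshold get the larger weight.
        weight = coordinate_weight if label >= class_token_start_idx else 1.0
        total += weight * token_loss
        valid += 1
    if reduction == "mean":
        # Assumption: average over non-ignored tokens.
        return total / valid if valid else 0.0
    return total


# Tiny vocabulary of 3 tokens; label 2 plays the role of a coordinate token.
uniform = [[0.0, 0.0, 0.0]] * 3
loss_sum = weighted_cross_entropy(uniform, [0, 2, -100], class_token_start_idx=2)
```

With uniform scores, every token has loss log(3), so the summed result is log(3) for the ordinary token plus 10 * log(3) for the coordinate token. The ignored position is skipped.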
Compute loss with coordinate token weighting.
Parameters:
Model logits with shape [batch_size, seq_len, vocab_size]
Ground truth labels with shape [batch_size, seq_len]
Decoder input embeddings. Currently unused but kept for API compatibility. Default: None
Total number of valid tokens for normalization across gradient accumulation steps. If provided, loss is normalized by this value instead of the actual token count. Only supported with reduction="sum". Default: None
Returns: torch.Tensor
Computed loss value as a scalar tensor.
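The token-count argument above exists because micro-batches in a gradient accumulation window usually hold different numbers of valid tokens. The sketch below illustrates the arithmetic only and makes no claim about how the module applies it internally. The function names and the example numbers are hypothetical.

```python
def accumulate_with_global_count(micro_batches):
    """Divide each summed loss by the token count of the whole window."""
    total_tokens = sum(n for _, n in micro_batches)
    return sum(loss_sum / total_tokens for loss_sum, _ in micro_batches)


def average_of_local_means(micro_batches):
    """Average the per-micro-batch means, ignoring how many tokens each holds."""
    return sum(loss_sum / n for loss_sum, n in micro_batches) / len(micro_batches)


# Each entry is (summed loss, number of valid tokens) for one micro-batch.
micro_batches = [(6.0, 3), (4.0, 1)]

global_norm = accumulate_with_global_count(micro_batches)  # 10.0 / 4 tokens
local_norm = average_of_local_means(micro_batches)         # (2.0 + 4.0) / 2
```

Normalizing by the global count gives every token the same influence regardless of which micro-batch it landed in. Averaging the local means lets the single token of the second micro-batch count as much as all three tokens of the first.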