# nemo_automodel.components.models.nemotron_parse.nemotron_parse_loss

## Module Contents

### Classes

| Name                                                                                                          | Description                                                           |
| ------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- |
| [`NemotronParseLoss`](#nemo_automodel-components-models-nemotron_parse-nemotron_parse_loss-NemotronParseLoss) | Cross-entropy loss with coordinate token weighting for NemotronParse. |

### API

```python
class nemo_automodel.components.models.nemotron_parse.nemotron_parse_loss.NemotronParseLoss(
    coordinate_weight: float = 10.0,
    class_token_start_idx: int = 50000,
    num_heads: int = 1,
    ignore_index: int = -100,
    reduction: str = 'sum',
    fp32_upcast: bool = True
)
```

**Bases:** `Module`

Cross-entropy loss with coordinate token weighting for NemotronParse.

This loss function computes cross-entropy across prediction heads, with configurable
weighting for coordinate tokens (tokens whose label IDs are >= class\_token\_start\_idx).
When num\_heads > 1, it applies per-head label shifting for multi-task output predictions.

**Parameters:**

`coordinate_weight` (`float`): Weight multiplier for coordinate tokens. The loss for tokens
with label IDs >= `class_token_start_idx` is multiplied by this factor.
Default: 10.0

`class_token_start_idx` (`int`): Token index threshold for coordinate tokens. Tokens
with label IDs >= this value are treated as coordinate/class tokens and receive
the higher loss weight. Default: 50000

`num_heads` (`int`): Number of prediction heads (main + extra). Must equal the model's
`num_extra_heads` + 1. Default: 1

`ignore_index` (`int`): Label value to ignore in the loss computation. Default: -100

`reduction` (`str`): Loss reduction strategy (`"sum"` or `"mean"`). Default: `"sum"`

`fp32_upcast` (`bool`): Whether to cast logits to fp32 for numerical stability. Default: True
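
The coordinate weighting described above can be illustrated with a minimal, framework-free sketch. This is not the library's implementation: the helper `weighted_token_losses` is hypothetical, it works on plain Python lists for a single flattened sequence rather than on tensors, and it uses a toy vocabulary of four tokens with the threshold lowered to 3 so the numbers are easy to check by hand.

```python
import math


def weighted_token_losses(logits, labels, coordinate_weight=10.0,
                          class_token_start_idx=50000, ignore_index=-100):
    """Hypothetical helper: per-token cross-entropy, scaled for coordinate tokens."""
    losses = []
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # ignored positions contribute nothing to the loss
        # Numerically stable log-sum-exp over the vocabulary dimension.
        row_max = max(row)
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        nll = log_z - row[label]
        # Labels at or above the threshold count as coordinate tokens.
        weight = coordinate_weight if label >= class_token_start_idx else 1.0
        losses.append(weight * nll)
    return losses


# Toy setup (assumed values): vocabulary of 4 tokens, uniform logits,
# and a threshold of 3 so that only label 3 is a "coordinate" token.
logits = [[0.0, 0.0, 0.0, 0.0]] * 3
labels = [1, 3, -100]
losses = weighted_token_losses(logits, labels,
                               coordinate_weight=10.0,
                               class_token_start_idx=3)
total = sum(losses)  # corresponds to reduction="sum"
```

With uniform logits every unweighted token loss is `log(4)`, so the regular token contributes `log(4)`, the coordinate token contributes `10 * log(4)`, and the ignored position is skipped.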

```python
nemo_automodel.components.models.nemotron_parse.nemotron_parse_loss.NemotronParseLoss.forward(
    logits: torch.Tensor,
    labels: torch.Tensor,
    decoder_inputs_embeds: typing.Optional[torch.Tensor] = None,
    num_label_tokens: typing.Optional[int] = None
) -> torch.Tensor
```

Compute loss with coordinate token weighting.

**Parameters:**

`logits` (`torch.Tensor`): Model logits with shape `[batch_size, seq_len, vocab_size]`.

`labels` (`torch.Tensor`): Ground truth labels with shape `[batch_size, seq_len]`.

`decoder_inputs_embeds` (`Optional[torch.Tensor]`): Decoder input embeddings.
Currently unused but kept for API compatibility. Default: None

`num_label_tokens` (`Optional[int]`): Total number of valid tokens used for normalization
across gradient accumulation steps. If provided, the loss is normalized by this
value instead of the actual token count. Only supported with `reduction="sum"`.
Default: None
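
To see why a shared normalizer matters under gradient accumulation, the arithmetic can be sketched with plain numbers. The micro-batch sums and token counts below are made-up values, not output from the library; the sketch only assumes what the description states, namely that each summed loss is divided by the total valid-token count across accumulation steps.

```python
# Assumed per-micro-batch results: summed loss and number of valid tokens.
micro_batch_sums = [6.0, 4.0]
micro_batch_token_counts = [3, 1]

# Total valid tokens across all accumulation steps.
num_label_tokens = sum(micro_batch_token_counts)

# Each micro-batch loss is divided by the shared total, then accumulated.
accumulated = sum(s / num_label_tokens for s in micro_batch_sums)

# Reference: mean over all valid tokens as if they were in a single batch.
global_mean = sum(micro_batch_sums) / num_label_tokens

# For contrast: averaging per-micro-batch means over-weights small batches.
naive = sum(s / n for s, n in zip(micro_batch_sums, micro_batch_token_counts))
naive /= len(micro_batch_sums)
```

Here `accumulated` equals `global_mean` (2.5), while `naive` comes out at 3.0 because the single-token micro-batch is given the same weight as the three-token one.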

**Returns:** `torch.Tensor`

Computed loss value as a scalar tensor.