> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/automodel/llms.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/automodel/_mcp/server.

# nemo_automodel.components.loss.loss

Typed loss configs + builder (TorchTitan-style).

Each loss config is a plain dataclass exposing its full parameter surface as
named fields (no opaque `**kwargs`) and a `build()` method that constructs
the loss module directly — lazy imports keep optional kernel deps out of module
load.  Reading the dataclass tells you exactly what you can configure.

:func:`build_loss_config` normalizes any of these into a :class:`LossConfig`,
and :func:`build_loss_module` is the single build entry point — a thin wrapper that
returns `build_loss_config(...).build()`.  Both dispatch on the argument:

* a typed :class:`LossConfig` instance — the Automodel-native path; per-loss
  construction delegates to `config.build()`.
* a registry name or a known loss *class* (see :data:`LOSS_CONFIG_REGISTRY`,
  e.g. `MaskedCrossEntropy` → :class:`MaskedCrossEntropyConfig`) — upgraded to
  its typed config so the YAML recipe path gets the same field validation.
* any other loss *class* / *callable* plus arbitrary `**loss_kwargs` — the
  escape hatch (wrapped in :class:`LossFromFactoryConfig`) for external
  integrations (e.g. veRL).  Adding a new typed config never forces the caller
  to change.

## Module Contents

### Classes

| Name                                                                                        | Description                                                                  |
| ------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------- |
| [`FusedLinearCEConfig`](#nemo_automodel-components-loss-loss-FusedLinearCEConfig)           | `FusedLinearCrossEntropy`.                                                   |
| [`KDLossConfig`](#nemo_automodel-components-loss-loss-KDLossConfig)                         | `KDLoss` (knowledge distillation).                                           |
| [`LossConfig`](#nemo_automodel-components-loss-loss-LossConfig)                             | Base loss config.  Subclasses expose their full field surface and            |
| [`LossFromFactoryConfig`](#nemo_automodel-components-loss-loss-LossFromFactoryConfig)       | Escape hatch for external integrations (e.g. veRL) and the YAML recipe path. |
| [`MaskedCrossEntropyConfig`](#nemo_automodel-components-loss-loss-MaskedCrossEntropyConfig) | `MaskedCrossEntropy`.                                                        |
| [`TEParallelCEConfig`](#nemo_automodel-components-loss-loss-TEParallelCEConfig)             | `TEParallelCrossEntropy`.                                                    |

### Functions

| Name                                                                          | Description                                                              |
| ----------------------------------------------------------------------------- | ------------------------------------------------------------------------ |
| [`build_loss_config`](#nemo_automodel-components-loss-loss-build_loss_config) | Normalize a loss `loss` target plus `kwargs` into a :class:`LossConfig`. |
| [`build_loss_module`](#nemo_automodel-components-loss-loss-build_loss_module) | Build a loss function.                                                   |

### Data

[`LOSS_CONFIG_REGISTRY`](#nemo_automodel-components-loss-loss-LOSS_CONFIG_REGISTRY)

[`__all__`](#nemo_automodel-components-loss-loss-__all__)

### API

```python
class nemo_automodel.components.loss.loss.FusedLinearCEConfig(
    ignore_index: int = -100,
    logit_softcapping: float = 0.0,
    reduction: str = 'sum'
)
```

Dataclass

**Bases:** [LossConfig](#nemo_automodel-components-loss-loss-LossConfig)

`FusedLinearCrossEntropy`.

```python
nemo_automodel.components.loss.loss.FusedLinearCEConfig.build() -> torch.nn.Module
```

```python
class nemo_automodel.components.loss.loss.KDLossConfig(
    ignore_index: int = -100,
    temperature: float = 1.0,
    fp32_upcast: bool = True,
    tp_group: typing.Any = None,
    chunk_size: int = 0
)
```

Dataclass

**Bases:** [LossConfig](#nemo_automodel-components-loss-loss-LossConfig)

`KDLoss` (knowledge distillation).

```python
nemo_automodel.components.loss.loss.KDLossConfig.build() -> torch.nn.Module
```

```python
class nemo_automodel.components.loss.loss.LossConfig()
```

Dataclass

Base loss config.  Subclasses expose their full field surface and
implement :meth:`build`.

```python
nemo_automodel.components.loss.loss.LossConfig.build() -> torch.nn.Module
```

Construct the loss module.

```python
class nemo_automodel.components.loss.loss.LossFromFactoryConfig(
    factory: collections.abc.Callable[..., torch.nn.Module] | None = None,
    kwargs: dict[str, typing.Any] = dict()
)
```

Dataclass

**Bases:** [LossConfig](#nemo_automodel-components-loss-loss-LossConfig)

Escape hatch for external integrations (e.g. veRL) and the YAML recipe path.

Rather than exposing typed fields, it wraps a loss class/callable and the
`**kwargs` to construct it, keeping the factory path on the same
`config.build()` contract as the typed configs so callers never have to
special-case it.  The factory is called as `factory(**kwargs)`.

```python
nemo_automodel.components.loss.loss.LossFromFactoryConfig.build() -> torch.nn.Module
```

```python
class nemo_automodel.components.loss.loss.MaskedCrossEntropyConfig(
    fp32_upcast: bool = True,
    ignore_index: int = -100,
    reduction: str = 'sum'
)
```

Dataclass

**Bases:** [LossConfig](#nemo_automodel-components-loss-loss-LossConfig)

`MaskedCrossEntropy`.

```python
nemo_automodel.components.loss.loss.MaskedCrossEntropyConfig.build() -> torch.nn.Module
```

```python
class nemo_automodel.components.loss.loss.TEParallelCEConfig(
    ignore_index: int = -100,
    reduction: str = 'sum',
    tp_group: typing.Any = None
)
```

Dataclass

**Bases:** [LossConfig](#nemo_automodel-components-loss-loss-LossConfig)

`TEParallelCrossEntropy`.

```python
nemo_automodel.components.loss.loss.TEParallelCEConfig.build() -> torch.nn.Module
```

```python
nemo_automodel.components.loss.loss.build_loss_config(
    loss: nemo_automodel.components.loss.loss.LossConfig | str | collections.abc.Callable[..., torch.nn.Module],
    loss_kwargs: typing.Any = {}
) -> nemo_automodel.components.loss.loss.LossConfig
```

Normalize a loss `loss` target plus `kwargs` into a :class:`LossConfig`.

The single normalization entry point shared by the recipe layer (which
resolves a YAML `_target_` to a Python object) and :func:`build_loss_module`.
It dispatches on `loss`:

* a typed :class:`LossConfig` instance — returned as-is; `**loss_kwargs`
  must be empty (hyperparameters live on the config).
* a :class:`LossConfig` subclass (the class) — rejected; pass an instance.
* a registry name (see :data:`LOSS_CONFIG_REGISTRY`, e.g.
  `"MaskedCrossEntropy"`) — built from the matching typed config.
* a loss class / callable plus `**loss_kwargs` — if it is a registered loss
  class and the kwargs fit the config's fields, it is upgraded to that typed
  config; otherwise it is wrapped in a :class:`LossFromFactoryConfig`.  The
  caller resolves any dotted path to a callable; the component never does
  dotted-path resolution.

**Returns:** `LossConfig`

class:`LossConfig` ready to `build()`.

```python
nemo_automodel.components.loss.loss.build_loss_module(
    loss: nemo_automodel.components.loss.loss.LossConfig | collections.abc.Callable[..., torch.nn.Module],
    loss_kwargs: typing.Any = {}
) -> torch.nn.Module
```

Build a loss function.

Thin dispatcher: it normalizes `loss` to a :class:`LossConfig` via
:func:`build_loss_config` and returns `config.build()`.  Dispatches on
`loss`:

* **Typed config** (:class:`LossConfig` instance) — the Automodel-native
  path.  Hyperparameters come from the config; `**loss_kwargs` must be
  empty.
* **Loss class / callable** (e.g. `MaskedCrossEntropy`) plus
  `**loss_kwargs` — the integration / YAML escape hatch.  The caller
  resolves any dotted path to a callable; the component never does string
  resolution.

**Parameters:**

Typed :class:`LossConfig` instance, or a loss class/callable to
construct with `**loss_kwargs`.

Constructor kwargs for the class/callable form.  Must be
empty when `loss` is a typed config.

**Returns:** `nn.Module`

Instantiated loss function.

```python
nemo_automodel.components.loss.loss.LOSS_CONFIG_REGISTRY: dict[str, type[LossConfig]] = {'MaskedCrossEntropy': MaskedCrossEntropyConfig, 'FusedLinearCrossEntropy': Fuse...
```

```python
nemo_automodel.components.loss.loss.__all__ = ['LOSS_CONFIG_REGISTRY', 'FusedLinearCEConfig', 'KDLossConfig', 'LossConfig', 'L...
```