> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/automodel/llms.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/automodel/_mcp/server.

# nemo_automodel.components.models.common.inbatch_neg_utils

Distributed in-batch negative utilities for bi-encoder contrastive training.

Architecture-agnostic helpers used by the bi-encoder trainer to expand the
negative pool with passages gathered across DP ranks. Backbones (Llama,
Ministral3, Qwen3, ...) do not import these directly; the trainer wires them
in around `BiEncoderModel.encode`.

## Module Contents

### Functions

| Name                                                                                                                                                    | Description                                                            |
| ------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------- |
| [`_all_gather_tensor`](#nemo_automodel-components-models-common-inbatch_neg_utils-_all_gather_tensor)                                                   | All-gather `t` along dim 0, preserving autograd only when needed.      |
| [`dist_gather_tensor`](#nemo_automodel-components-models-common-inbatch_neg_utils-dist_gather_tensor)                                                   | All-gather `t` along dim 0 across the default process group.           |
| [`dist_gather_tensor_with_dim1_padding`](#nemo_automodel-components-models-common-inbatch_neg_utils-dist_gather_tensor_with_dim1_padding)               | All-gather `t` after padding dim 1 to the maximum length across ranks. |
| [`mask_gathered_passages_same_doc_as_positive`](#nemo_automodel-components-models-common-inbatch_neg_utils-mask_gathered_passages_same_doc_as_positive) | In-place mask passages sharing a doc id with this row's positive.      |

### API

```python
nemo_automodel.components.models.common.inbatch_neg_utils._all_gather_tensor(
    t: torch.Tensor,
    preserve_grad: bool = False
) -> torch.Tensor
```

All-gather `t` along dim 0, preserving autograd only when needed.

```python
nemo_automodel.components.models.common.inbatch_neg_utils.dist_gather_tensor(
    t: typing.Optional[torch.Tensor],
    preserve_grad: bool = False
) -> typing.Optional[torch.Tensor]
```

All-gather `t` along dim 0 across the default process group.

When `preserve_grad` is true, tensors that require gradients use an
autograd-aware gather so distributed in-batch-negative losses can send
passage gradients back to the owning rank. Otherwise, remote slices are
detached and only the local slice keeps gradient flow. Non-gradient tensors,
such as masks or IDs, always use a regular detached gather.
Returns `t` unchanged when distributed is not available, not initialized,
or world size is 1.

```python
nemo_automodel.components.models.common.inbatch_neg_utils.dist_gather_tensor_with_dim1_padding(
    t: typing.Optional[torch.Tensor],
    padding_value: int | float | bool = 0,
    preserve_grad: bool = False
) -> typing.Optional[torch.Tensor]
```

All-gather `t` after padding dim 1 to the maximum length across ranks.

```python
nemo_automodel.components.models.common.inbatch_neg_utils.mask_gathered_passages_same_doc_as_positive(
    scores: torch.Tensor,
    passage_doc_ids: torch.Tensor,
    train_n_passages: int,
    rank: int,
    local_batch_size: int
) -> None
```

In-place mask passages sharing a doc id with this row's positive.

After all-gather, each query's positive sits at column `i * train_n_passages`
of the gathered passage tensor. For each local query row, set scores to
`finfo(dtype).min` on any other column whose `passage_doc_ids` matches
the positive's id, so duplicates of the positive elsewhere in the global
batch are not treated as negatives. The true positive column is left intact.

**Parameters:**

`[local_batch_size, B_global * train_n_passages]` (already
sliced to the local rank's query rows).

`[B_global * train_n_passages]` int64 doc ids for
every gathered passage.

Number of passages per query (1 positive + negatives).

Caller's DP rank.

Number of queries per rank.