# nemo_automodel.components.models.llama_bidirectional.export_onnx

Export a HuggingFace encoder / embedding checkpoint to ONNX.

The export wraps the bare transformer with a pooling layer (average pooling by default) and optional L2
normalisation (enabled by default) so that the ONNX model produces ready-to-use embeddings.
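The two post-processing steps can be illustrated for a single sequence with a minimal pure-Python sketch. It assumes that average pooling skips padded positions via the attention mask, which is the usual convention for mean pooling; the helper names `masked_mean_pool` and `l2_normalize` are illustrative and are not part of this module, whose `_Pooling` layer operates on tensors.

```python
import math
from typing import List


def masked_mean_pool(hidden: List[List[float]], mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padded positions are skipped."""
    dim = len(hidden[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden, mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    # Guard against an all-padding sequence to avoid dividing by zero.
    return [t / max(count, 1) for t in total]


def l2_normalize(vec: List[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


# One sequence of 3 tokens with hidden size 2; the third token is padding.
hidden = [[3.0, 0.0], [1.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = masked_mean_pool(hidden, mask)  # [2.0, 2.0]: the padded token is ignored
embedding = l2_normalize(pooled)         # about [0.7071, 0.7071]
```

Because the vectors come out unit-length, a dot product between two embeddings is directly their cosine similarity, which is why downstream consumers can use the ONNX output without further processing.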

Usage (standalone):
`python -m nemo_automodel.components.models.llama_bidirectional.export_onnx --model-path /path/to/hf_checkpoint --output-dir /path/to/onnx_output [--pooling avg] [--normalize] [--opset 17] [--dtype fp32]`

Usage (from Python):
`from nemo_automodel.components.models.llama_bidirectional.export_onnx import export_to_onnx`

`onnx_path = export_to_onnx("/path/to/hf_checkpoint", "/path/to/onnx_output")`

## Module Contents

### Classes

| Name                                                                                                                   | Description                                                           |
| ---------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- |
| [`EmbeddingModelForExport`](#nemo_automodel-components-models-llama_bidirectional-export_onnx-EmbeddingModelForExport) | Wraps a base transformer with pooling + optional L2 normalisation.    |
| [`_Pooling`](#nemo_automodel-components-models-llama_bidirectional-export_onnx-_Pooling)                               | Pooling layer that reduces \[batch, seq, hidden] -> \[batch, hidden]. |

### Functions

| Name                                                                                                 | Description                                                 |
| ---------------------------------------------------------------------------------------------------- | ----------------------------------------------------------- |
| [`_parse_args`](#nemo_automodel-components-models-llama_bidirectional-export_onnx-_parse_args)       | Parse the command-line arguments for the standalone export. |
| [`export_to_onnx`](#nemo_automodel-components-models-llama_bidirectional-export_onnx-export_to_onnx) | Export a HuggingFace embedding model to ONNX.               |
| [`main`](#nemo_automodel-components-models-llama_bidirectional-export_onnx-main)                     | Command-line entry point for the standalone export.         |
| [`verify_onnx`](#nemo_automodel-components-models-llama_bidirectional-export_onnx-verify_onnx)       | Run a quick onnxruntime sanity check on the exported model. |

### Data

[`logger`](#nemo_automodel-components-models-llama_bidirectional-export_onnx-logger)

### API

```python
class nemo_automodel.components.models.llama_bidirectional.export_onnx.EmbeddingModelForExport(
    base_model: torch.nn.Module,
    pooling: nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling,
    normalize: bool = True
)
```

**Bases:** `Module`

Wraps a base transformer with pooling + optional L2 normalisation.

The `forward` signature is `(input_ids, attention_mask) -> embeddings`,
which is the contract expected by downstream ONNX / TensorRT consumers.

```python
nemo_automodel.components.models.llama_bidirectional.export_onnx.EmbeddingModelForExport.forward(
    input_ids: torch.Tensor,
    attention_mask: torch.Tensor
) -> torch.Tensor
```
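The composition this wrapper performs can be mirrored in plain Python to show the data flow behind the `(input_ids, attention_mask) -> embeddings` contract. This is a hypothetical sketch: `EmbeddingWrapperSketch`, `toy_base` and `first_token` are illustrative stand-ins, not this module's code. The real wrapper calls a HuggingFace transformer on whole batched tensors, while this version loops over sequences for readability.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


class EmbeddingWrapperSketch:
    """Hypothetical analogue of the export wrapper: base model -> pooling -> optional L2 norm."""

    def __init__(
        self,
        base_model: Callable[[List[int]], Matrix],
        pooling: Callable[[Matrix, List[int]], List[float]],
        normalize: bool = True,
    ) -> None:
        self.base_model = base_model
        self.pooling = pooling
        self.normalize = normalize

    def forward(self, input_ids: List[List[int]], attention_mask: List[List[int]]) -> Matrix:
        embeddings = []
        for ids, mask in zip(input_ids, attention_mask):
            hidden = self.base_model(ids)        # [seq, hidden]
            pooled = self.pooling(hidden, mask)  # [hidden]
            if self.normalize:
                # Fall back to 1.0 so an all-zero vector is returned unchanged.
                norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
                pooled = [v / norm for v in pooled]
            embeddings.append(pooled)
        return embeddings                        # [batch, hidden]


def toy_base(ids: List[int]) -> Matrix:
    """Toy 'transformer': token id i becomes the hidden vector [i, 1.0]."""
    return [[float(i), 1.0] for i in ids]


def first_token(hidden: Matrix, mask: List[int]) -> List[float]:
    """Toy pooling: keep the first position."""
    return hidden[0]


wrapper = EmbeddingWrapperSketch(toy_base, first_token, normalize=True)
out = wrapper.forward([[3, 5], [0, 0]], [[1, 1], [1, 0]])
```

Keeping pooling and normalisation inside `forward` means the exported graph already emits final embeddings, so an ONNX or TensorRT consumer only has to supply token ids and a mask.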

```python
class nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling(
    pool_type: str = 'avg'
)
```

**Bases:** `Module`

Pooling layer that reduces \[batch, seq, hidden] -> \[batch, hidden].

```python
nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling.forward(
    last_hidden_states: torch.Tensor,
    attention_mask: torch.Tensor
) -> torch.Tensor
```
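The three documented pool types (`"avg"`, `"cls"`, `"last"`) can be illustrated with a pure-Python sketch of the `[batch, seq, hidden] -> [batch, hidden]` reduction. The function `pool` is hypothetical and is not this module's `_Pooling`. In particular, the `"last"` branch assumes right padding, so that the final real token sits at index `sum(mask) - 1`. With left padding, the last position would be used instead.

```python
from typing import List


def pool(
    last_hidden_states: List[List[List[float]]],
    attention_mask: List[List[int]],
    pool_type: str = "avg",
) -> List[List[float]]:
    """Reduce [batch, seq, hidden] -> [batch, hidden] (illustrative sketch)."""
    if pool_type not in ("avg", "cls", "last"):
        raise ValueError(f"unsupported pool_type: {pool_type!r}")
    out = []
    for hidden, mask in zip(last_hidden_states, attention_mask):
        if pool_type == "cls":
            # First position of the sequence.
            out.append(list(hidden[0]))
        elif pool_type == "last":
            # Assumes right padding: the last real token is at index sum(mask) - 1.
            out.append(list(hidden[max(sum(mask), 1) - 1]))
        else:
            # Mean over the positions whose mask entry is 1.
            kept = [vec for vec, m in zip(hidden, mask) if m]
            if not kept:
                out.append([0.0] * len(hidden[0]))
                continue
            out.append([sum(col) / len(kept) for col in zip(*kept)])
    return out


batch_hidden = [[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]]  # [batch=1, seq=3, hidden=2]
batch_mask = [[1, 1, 0]]                               # third position is padding
```

With this input, `"avg"` gives `[[2.0, 3.0]]`, `"cls"` gives `[[1.0, 2.0]]`, and `"last"` gives `[[3.0, 4.0]]`. The padded vector `[9.0, 9.0]` never contributes.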

```python
nemo_automodel.components.models.llama_bidirectional.export_onnx._parse_args() -> argparse.Namespace
```

```python
nemo_automodel.components.models.llama_bidirectional.export_onnx.export_to_onnx(
    model_path: str,
    output_dir: str,
    tokenizer_path: str | None = None,
    pooling: str = 'avg',
    normalize: bool = True,
    opset: int = 17,
    export_dtype: str = 'fp32',
    verify: bool = True
) -> str
```

Export a HuggingFace embedding model to ONNX.

**Parameters:**

- `model_path` (`str`): Path to the HuggingFace model directory (must contain
  `config.json` and weight files).
- `output_dir` (`str`): Directory where `model.onnx` and `tokenizer/` will
  be written.
- `tokenizer_path` (`str | None`, default `None`): Path to load the tokenizer from. Set it when
  the checkpoint directory does not contain tokenizer files. Defaults to
  `model_path` when not specified.
- `pooling` (`str`, default `"avg"`): Pooling strategy applied to the transformer's hidden
  states. One of `"avg"`, `"cls"`, `"last"`.
- `normalize` (`bool`, default `True`): If `True`, L2-normalise the pooled embeddings.
- `opset` (`int`, default `17`): ONNX opset version.
- `export_dtype` (`str`, default `"fp32"`): Export precision: `"fp32"`, `"fp16"`, or `"bf16"`.
- `verify` (`bool`, default `True`): Run a quick onnxruntime round-trip after export.

**Returns:** `str`

Absolute path to the exported `model.onnx`.
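The documented defaults and outputs can be summarised by a small pre-flight helper. `plan_export` is hypothetical and is not part of this module. It only restates what is described above: the tokenizer falls back to `model_path`, outputs go to `model.onnx` and `tokenizer/` under `output_dir`, and the returned path is absolute. It also validates the string options against the documented choices. It does not load or export a model.

```python
import os
from typing import Dict, Optional

VALID_POOLING = ("avg", "cls", "last")
VALID_DTYPES = ("fp32", "fp16", "bf16")


def plan_export(
    model_path: str,
    output_dir: str,
    tokenizer_path: Optional[str] = None,
    pooling: str = "avg",
    export_dtype: str = "fp32",
) -> Dict[str, str]:
    """Hypothetical pre-flight check restating export_to_onnx's documented defaults."""
    if pooling not in VALID_POOLING:
        raise ValueError(f"pooling must be one of {VALID_POOLING}, got {pooling!r}")
    if export_dtype not in VALID_DTYPES:
        raise ValueError(f"export_dtype must be one of {VALID_DTYPES}, got {export_dtype!r}")
    out_dir = os.path.abspath(output_dir)
    return {
        # The tokenizer is read from the checkpoint directory unless a path is given.
        "tokenizer_source": tokenizer_path if tokenizer_path is not None else model_path,
        # The exported graph and the saved tokenizer both live under output_dir.
        "onnx_path": os.path.join(out_dir, "model.onnx"),
        "tokenizer_dir": os.path.join(out_dir, "tokenizer"),
    }


plan = plan_export("/path/to/hf_checkpoint", "onnx_output")
```

Here `plan["onnx_path"]` is an absolute path ending in `model.onnx`, matching the documented return value, even though `"onnx_output"` was given as a relative directory.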

```python
nemo_automodel.components.models.llama_bidirectional.export_onnx.main()
```

```python
nemo_automodel.components.models.llama_bidirectional.export_onnx.verify_onnx(
    onnx_path: str,
    tokenizer
) -> None
```

Run a quick onnxruntime sanity check on the exported model.
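This page does not document which checks `verify_onnx` performs. A sketch of one possible sanity check follows. `embedding_problems` and `run_onnx_embeddings` are hypothetical helpers. The runner assumes that the graph inputs are named `input_ids` and `attention_mask` (after `forward()`'s arguments) and that the graph has a single `[batch, hidden]` output. It also assumes a HuggingFace tokenizer that accepts `return_tensors="np"`. `onnxruntime` is imported lazily, so only the pure-Python check runs without it.

```python
import math
from typing import List, Sequence


def embedding_problems(
    embeddings: Sequence[Sequence[float]],
    batch_size: int,
    expect_unit_norm: bool = True,
    tol: float = 1e-3,
) -> List[str]:
    """Return problems found in a [batch, hidden] output; an empty list means it looks sane."""
    problems = []
    if len(embeddings) != batch_size:
        problems.append(f"expected {batch_size} rows, got {len(embeddings)}")
    for i, row in enumerate(embeddings):
        values = [float(v) for v in row]
        if not all(math.isfinite(v) for v in values):
            problems.append(f"row {i} contains NaN or inf")
            continue
        if expect_unit_norm:
            # With normalize=True every row should have (close to) unit L2 norm.
            norm = math.sqrt(sum(v * v for v in values))
            if abs(norm - 1.0) > tol:
                problems.append(f"row {i} has L2 norm {norm:.4f}, expected 1.0")
    return problems


def run_onnx_embeddings(onnx_path: str, tokenizer, texts: List[str]):
    """Feed tokenized text through the exported graph with onnxruntime on CPU."""
    import onnxruntime as ort  # imported lazily; requires the onnxruntime package

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    enc = tokenizer(texts, padding=True, return_tensors="np")
    # Assumption: graph input names match forward()'s argument names.
    feeds = {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]}
    # Assumption: the first (and only) output holds the [batch, hidden] embeddings.
    return session.run(None, feeds)[0]
```

A typical call would be `embedding_problems(run_onnx_embeddings(path, tokenizer, ["hello", "world"]), batch_size=2)`. For `fp16` or `bf16` exports, a looser `tol` is likely needed because of reduced precision.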

```python
nemo_automodel.components.models.llama_bidirectional.export_onnx.logger = logging.getLogger(__name__)
```