nemo_automodel.components.models.llama_bidirectional.export_onnx

Export a HuggingFace encoder / embedding checkpoint to ONNX.

The export wraps the bare transformer with average-pooling and L2 normalisation so that the ONNX model produces ready-to-use embeddings.

Usage (standalone):

    python -m nemo_automodel.components.models.llama_bidirectional.export_onnx \
        --model-path /path/to/hf_checkpoint \
        --output-dir /path/to/onnx_output \
        [--pooling avg] [--normalize] [--opset 17] [--dtype fp32]

Usage (from Python):

    from nemo_automodel.components.models.llama_bidirectional.export_onnx import export_to_onnx

    onnx_path = export_to_onnx("/path/to/hf_checkpoint", "/path/to/onnx_output")

Module Contents

Classes

Name	Description
`EmbeddingModelForExport`	Wraps a base transformer with pooling + optional L2 normalisation.
`_Pooling`	Pooling layer that reduces [batch, seq, hidden] -> [batch, hidden].

Functions

Name	Description
`_parse_args`	-
`export_to_onnx`	Export a HuggingFace embedding model to ONNX.
`main`	-
`verify_onnx`	Run a quick onnxruntime sanity check on the exported model.

Data

logger

API

class nemo_automodel.components.models.llama_bidirectional.export_onnx.EmbeddingModelForExport(
    base_model: torch.nn.Module,
    pooling: nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling,
    normalize: bool = True
)

Bases: Module

Wraps a base transformer with pooling + optional L2 normalisation.

The forward signature is (input_ids, attention_mask) -> embeddings, which is the contract expected by downstream ONNX / TensorRT consumers.

nemo_automodel.components.models.llama_bidirectional.export_onnx.EmbeddingModelForExport.forward(
    input_ids: torch.Tensor,
    attention_mask: torch.Tensor
) -> torch.Tensor
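
The optional L2 normalisation applied after pooling can be illustrated without PyTorch. This is a minimal sketch, not the module's implementation: `l2_normalize` and `postprocess` are hypothetical helper names, the pooled output is assumed to be a list of [hidden] rows, and the `eps` clamp is an assumption borrowed from the behaviour of `torch.nn.functional.normalize`.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length.

    eps guards against division by zero for an all-zero vector
    (assumed here; the real module may handle this differently).
    """
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


def postprocess(pooled_rows, normalize=True):
    """Apply the optional normalisation to pooled [batch, hidden] rows."""
    return [l2_normalize(row) if normalize else list(row) for row in pooled_rows]


# Two pooled rows with hidden size 2.
pooled = [[3.0, 4.0], [0.0, 2.0]]
embeddings = postprocess(pooled)
raw = postprocess(pooled, normalize=False)
```

With normalisation enabled, every row has unit length, so a dot product between two embeddings equals their cosine similarity.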

class nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling(
    pool_type: str = 'avg'
)

Bases: Module

Pooling layer that reduces [batch, seq, hidden] -> [batch, hidden].

nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling.forward(
    last_hidden_states: torch.Tensor,
    attention_mask: torch.Tensor
) -> torch.Tensor
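
The reduction from [batch, seq, hidden] to [batch, hidden] can be sketched in plain Python for the three documented strategies ("avg", "cls", "last"). This is an illustration under stated assumptions, not the code of `_Pooling`: the function name `pool` is hypothetical, inputs are nested lists rather than tensors, and "last" assumes right-padded sequences, so the last real token sits at index `sum(mask) - 1`.

```python
def pool(last_hidden_states, attention_mask, pool_type="avg"):
    """Reduce [batch][seq][hidden] nested lists to [batch][hidden]."""
    pooled = []
    for tokens, mask in zip(last_hidden_states, attention_mask):
        if pool_type == "avg":
            # Mean over non-padded tokens only; padded positions contribute 0.
            count = max(sum(mask), 1)
            hidden = len(tokens[0])
            row = [
                sum(tok[h] * m for tok, m in zip(tokens, mask)) / count
                for h in range(hidden)
            ]
        elif pool_type == "cls":
            # Hidden state of the first token.
            row = list(tokens[0])
        elif pool_type == "last":
            # Hidden state of the last non-padded token (right padding assumed).
            row = list(tokens[max(sum(mask) - 1, 0)])
        else:
            raise ValueError(f"unknown pool_type: {pool_type!r}")
        pooled.append(row)
    return pooled


# One sequence of three tokens; the third token is padding.
hidden_states = [[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]]
mask = [[1, 1, 0]]

avg_out = pool(hidden_states, mask, "avg")
cls_out = pool(hidden_states, mask, "cls")
last_out = pool(hidden_states, mask, "last")
```

Note that the padded token's values (9.0) never reach the "avg" or "last" outputs, which is the reason the attention mask is passed alongside the hidden states.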

nemo_automodel.components.models.llama_bidirectional.export_onnx._parse_args() -> argparse.Namespace
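
The command-line flags shown in the standalone usage could be parsed along the following lines. This is a sketch under assumptions, not the actual body of `_parse_args`: the flag names come from the usage line, while the types, defaults, and choices are inferred from the `export_to_onnx` signature, and treating `--normalize` as a `store_true` switch is a guess. `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build an illustrative parser mirroring the documented CLI flags."""
    parser = argparse.ArgumentParser(
        description="Export an embedding checkpoint to ONNX (illustrative sketch)."
    )
    parser.add_argument("--model-path", required=True,
                        help="HuggingFace model directory")
    parser.add_argument("--output-dir", required=True,
                        help="Directory for the exported files")
    parser.add_argument("--pooling", choices=["avg", "cls", "last"], default="avg")
    # Assumption: a plain on/off switch; the real flag may behave differently.
    parser.add_argument("--normalize", action="store_true")
    parser.add_argument("--opset", type=int, default=17)
    parser.add_argument("--dtype", choices=["fp32", "fp16", "bf16"], default="fp32")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args([
    "--model-path", "/path/to/hf_checkpoint",
    "--output-dir", "/path/to/onnx_output",
    "--opset", "17",
])
```

argparse converts dashes in flag names to underscores, so the values are read back as `args.model_path`, `args.output_dir`, and so on.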

nemo_automodel.components.models.llama_bidirectional.export_onnx.export_to_onnx(
    model_path: str,
    output_dir: str,
    tokenizer_path: str | None = None,
    pooling: str = 'avg',
    normalize: bool = True,
    opset: int = 17,
    export_dtype: str = 'fp32',
    verify: bool = True
) -> str

Export a HuggingFace embedding model to ONNX.

Parameters:

model_path

str

Path to the HuggingFace model directory (must contain config.json and weight files).

output_dir

str

Directory where model.onnx and tokenizer/ will be written.

tokenizer_path

str | None, defaults to None

Path to load the tokenizer from. Defaults to model_path when not specified. Useful when the checkpoint directory does not contain tokenizer files.

pooling

str, defaults to 'avg'

Pooling strategy applied on top of transformer hidden states. One of "avg", "cls", "last".

normalize

bool, defaults to True

If True, L2-normalise the pooled embeddings.

opset

int, defaults to 17

ONNX opset version (default 17).

export_dtype

str, defaults to 'fp32'

Export precision: "fp32", "fp16", or "bf16".

verify

bool, defaults to True

Run a quick onnxruntime round-trip after export.

Returns: str

Absolute path to the exported model.onnx.

nemo_automodel.components.models.llama_bidirectional.export_onnx.main()

nemo_automodel.components.models.llama_bidirectional.export_onnx.verify_onnx(
    onnx_path: str,
    tokenizer
) -> None

Run a quick onnxruntime sanity check on the exported model.

nemo_automodel.components.models.llama_bidirectional.export_onnx.logger = logging.getLogger(__name__)