nemo_automodel.components.models.llama_bidirectional.export_onnx


Export a HuggingFace encoder / embedding checkpoint to ONNX.

The export wraps the bare transformer with a pooling layer (average pooling by default) and optional L2 normalisation, so that the ONNX model produces ready-to-use embeddings.

Usage (standalone):

    python -m nemo_automodel.components.models.llama_bidirectional.export_onnx \
        --model-path /path/to/hf_checkpoint \
        --output-dir /path/to/onnx_output \
        [--pooling avg] [--normalize] [--opset 17] [--dtype fp32]
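The flags in the standalone usage could be parsed along the following lines. This is a hypothetical sketch, not the module's actual `_parse_args`: the flag names and the defaults for `--pooling`, `--opset`, and `--dtype` come from the usage line, the choice lists come from the `export_to_onnx` parameter descriptions, and treating `--normalize` as a plain on/off switch is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI flags."""
    parser = argparse.ArgumentParser(
        description="Export a HuggingFace embedding checkpoint to ONNX."
    )
    parser.add_argument("--model-path", required=True, help="HF checkpoint directory")
    parser.add_argument("--output-dir", required=True, help="Where the ONNX output is written")
    parser.add_argument("--pooling", default="avg", choices=["avg", "cls", "last"])
    # Assumption: a plain on/off switch; the real flag may behave differently.
    parser.add_argument("--normalize", action="store_true")
    parser.add_argument("--opset", type=int, default=17)
    parser.add_argument("--dtype", default="fp32", choices=["fp32", "fp16", "bf16"])
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--model-path", "/path/to/hf_checkpoint", "--output-dir", "/path/to/onnx_output"]
)
print(args.pooling, args.opset, args.dtype)
```

Restricting `--pooling` and `--dtype` with `choices` makes argparse reject unsupported values before any model loading starts.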

Usage (from Python):

    from nemo_automodel.components.models.llama_bidirectional.export_onnx import export_to_onnx

    onnx_path = export_to_onnx("/path/to/hf_checkpoint", "/path/to/onnx_output")

Module Contents

Classes

| Name | Description |
| --- | --- |
| EmbeddingModelForExport | Wraps a base transformer with pooling + optional L2 normalisation. |
| _Pooling | Pooling layer that reduces [batch, seq, hidden] -> [batch, hidden]. |

Functions

| Name | Description |
| --- | --- |
| _parse_args | - |
| export_to_onnx | Export a HuggingFace embedding model to ONNX. |
| main | - |
| verify_onnx | Run a quick onnxruntime sanity check on the exported model. |

Data

logger

API

class nemo_automodel.components.models.llama_bidirectional.export_onnx.EmbeddingModelForExport(
base_model: torch.nn.Module,
pooling: nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling,
normalize: bool = True
)

Bases: Module

Wraps a base transformer with pooling + optional L2 normalisation.

The forward signature is (input_ids, attention_mask) -> embeddings, which is the contract expected by downstream ONNX / TensorRT consumers.

nemo_automodel.components.models.llama_bidirectional.export_onnx.EmbeddingModelForExport.forward(
input_ids: torch.Tensor,
attention_mask: torch.Tensor
) -> torch.Tensor
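The optional L2 normalisation step scales each pooled embedding to unit length, so a dot product between two outputs equals their cosine similarity. A minimal NumPy sketch follows; it stands in for the torch operation the wrapper would use, and the helper name `l2_normalize` and the `eps` guard are assumptions, not taken from the module.

```python
import numpy as np


def l2_normalize(embeddings: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row of a [batch, hidden] array to unit L2 norm."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # eps keeps an all-zero row from triggering a division by zero.
    return embeddings / np.maximum(norms, eps)


pooled = np.array([[3.0, 4.0], [0.0, 2.0]])
unit = l2_normalize(pooled)
print(unit)           # rows [0.6, 0.8] and [0.0, 1.0]
print(unit @ unit.T)  # pairwise cosine similarities
```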
class nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling(
pool_type: str = 'avg'
)

Bases: Module

Pooling layer that reduces [batch, seq, hidden] -> [batch, hidden].

nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling.forward(
last_hidden_states: torch.Tensor,
attention_mask: torch.Tensor
) -> torch.Tensor
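To make the [batch, seq, hidden] -> [batch, hidden] reduction concrete, here is a minimal NumPy sketch of the three documented strategies. It is not the module's implementation, which operates on torch tensors: the function name `pool` is hypothetical, and the sketch assumes right-padded sequences, that "cls" takes the first token, and that "last" takes the final non-padding token.

```python
import numpy as np


def pool(last_hidden_states: np.ndarray, attention_mask: np.ndarray, pool_type: str = "avg") -> np.ndarray:
    """Reduce [batch, seq, hidden] to [batch, hidden] (illustrative only)."""
    mask = attention_mask.astype(last_hidden_states.dtype)
    if pool_type == "avg":
        # Zero out padding positions, then divide by the number of real tokens.
        summed = (last_hidden_states * mask[:, :, None]).sum(axis=1)
        counts = np.maximum(mask.sum(axis=1, keepdims=True), 1.0)
        return summed / counts
    if pool_type == "cls":
        # Assumption: the first position holds the summary token.
        return last_hidden_states[:, 0]
    if pool_type == "last":
        # Assumption: right padding, so the last real token sits at index (length - 1).
        last_idx = np.maximum(attention_mask.sum(axis=1) - 1, 0)
        return last_hidden_states[np.arange(last_hidden_states.shape[0]), last_idx]
    raise ValueError(f"unknown pool_type: {pool_type!r}")


hidden = np.arange(12, dtype=np.float32).reshape(1, 3, 4)  # one sequence, 3 tokens, hidden size 4
mask = np.array([[1, 1, 0]])                               # third token is padding
print(pool(hidden, mask, "avg"))   # mean of the first two token vectors
print(pool(hidden, mask, "last"))  # vector of the second token
```

Masking before averaging matters because padded positions would otherwise pull the mean towards whatever the transformer emits for padding tokens.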
nemo_automodel.components.models.llama_bidirectional.export_onnx._parse_args() -> argparse.Namespace
nemo_automodel.components.models.llama_bidirectional.export_onnx.export_to_onnx(
model_path: str,
output_dir: str,
tokenizer_path: str | None = None,
pooling: str = 'avg',
normalize: bool = True,
opset: int = 17,
export_dtype: str = 'fp32',
verify: bool = True
) -> str

Export a HuggingFace embedding model to ONNX.

Parameters:

model_path
str

Path to the HuggingFace model directory (must contain config.json and weight files).

output_dir
str

Directory where model.onnx and tokenizer/ will be written.

tokenizer_path
str | None. Defaults to None

Path to load the tokenizer from. Defaults to model_path when not specified. Useful when the checkpoint directory does not contain tokenizer files.

pooling
str. Defaults to 'avg'

Pooling strategy applied on top of transformer hidden states. One of "avg", "cls", "last".

normalize
bool. Defaults to True

If True, L2-normalise the pooled embeddings.

opset
int. Defaults to 17

ONNX opset version (default 17).

export_dtype
str. Defaults to 'fp32'

Export precision. One of "fp32", "fp16", "bf16".

verify
bool. Defaults to True

Run a quick onnxruntime round-trip after export.

Returns: str

Absolute path to the exported model.onnx.
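Based on the descriptions of `output_dir` and the return value, the export should leave `model.onnx` and a `tokenizer/` directory under the output directory. The helper below is a hypothetical convenience for locating those artifacts after an export; it is not part of the module and only encodes the layout stated above.

```python
from pathlib import Path


def expected_artifacts(output_dir: str) -> dict[str, Path]:
    """Hypothetical helper: absolute paths where the documented outputs should land."""
    root = Path(output_dir).resolve()
    return {"model": root / "model.onnx", "tokenizer": root / "tokenizer"}


paths = expected_artifacts("/path/to/onnx_output")
print(paths["model"])
# After a real export, checks like these could confirm the layout:
#   paths["model"].is_file() and paths["tokenizer"].is_dir()
```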

nemo_automodel.components.models.llama_bidirectional.export_onnx.main()
nemo_automodel.components.models.llama_bidirectional.export_onnx.verify_onnx(
onnx_path: str,
tokenizer
) -> None

Run a quick onnxruntime sanity check on the exported model.
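The reference does not spell out what the sanity check covers. As an illustration of what such a check might assert on the model's output, the NumPy sketch below validates an embedding batch. The function name and the specific checks (2-D shape, finite values, unit norm when normalisation is enabled) are assumptions rather than a description of `verify_onnx`.

```python
import numpy as np


def check_embeddings(embeddings: np.ndarray, batch_size: int, normalized: bool = True, atol: float = 1e-3) -> None:
    """Raise AssertionError if an embedding batch looks malformed (illustrative only)."""
    assert embeddings.ndim == 2, f"expected [batch, hidden], got shape {embeddings.shape}"
    assert embeddings.shape[0] == batch_size, "batch dimension mismatch"
    assert np.isfinite(embeddings).all(), "embeddings contain NaN or inf"
    if normalized:
        # Compute norms in float64 so fp16 outputs do not fail on rounding alone.
        norms = np.linalg.norm(embeddings.astype(np.float64), axis=1)
        assert np.allclose(norms, 1.0, atol=atol), "rows are not unit length"


# Stand-in for real model output: random rows scaled to unit norm.
rng = np.random.default_rng(0)
fake = rng.normal(size=(2, 8))
fake /= np.linalg.norm(fake, axis=1, keepdims=True)
check_embeddings(fake, batch_size=2)
```

The loose default tolerance is a deliberate choice for reduced-precision exports, where norms can drift slightly from 1.0.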

nemo_automodel.components.models.llama_bidirectional.export_onnx.logger = logging.getLogger(__name__)