nemo_automodel.components.models.llama_bidirectional.export_onnx
Export a HuggingFace encoder / embedding checkpoint to ONNX.
The export wraps the bare transformer with average-pooling and L2 normalisation so that the ONNX model produces ready-to-use embeddings.
Usage (standalone):
    python -m nemo_automodel.components.models.llama_bidirectional.export_onnx --model-path /path/to/hf_checkpoint --output-dir /path/to/onnx_output [--pooling avg] [--normalize] [--opset 17] [--dtype fp32]
Usage (from Python):
    from nemo_automodel.components.models.llama_bidirectional.export_onnx import export_to_onnx
    onnx_path = export_to_onnx("/path/to/hf_checkpoint", "/path/to/onnx_output")
Module Contents
Classes
Functions
Data
API
Bases: Module
Wraps a base transformer with pooling + optional L2 normalisation.
The forward signature is (input_ids, attention_mask) -> embeddings, which is the contract expected by downstream ONNX / TensorRT consumers.
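The wrapper contract described above can be illustrated with a minimal PyTorch sketch. The names ToyEncoder and EmbeddingWrapperSketch are hypothetical and are not this module's classes; the sketch assumes the base model exposes last_hidden_state and that average pooling is used.

```python
import torch
from torch import nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a HuggingFace base transformer."""

    def __init__(self, vocab_size: int = 100, hidden: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)

    def forward(self, input_ids, attention_mask):
        # Real HF models return an output object; a dict with the same key suffices here.
        return {"last_hidden_state": self.embed(input_ids)}


class EmbeddingWrapperSketch(nn.Module):
    """Illustrative wrapper: (input_ids, attention_mask) -> embeddings."""

    def __init__(self, base: nn.Module, normalize: bool = True):
        super().__init__()
        self.base = base
        self.normalize = normalize

    def forward(self, input_ids, attention_mask):
        out = self.base(input_ids=input_ids, attention_mask=attention_mask)
        hidden = out["last_hidden_state"]                      # [batch, seq, hidden]
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)   # [batch, seq, 1]
        # Masked mean: padded positions contribute nothing to the sum or the count.
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        if self.normalize:
            pooled = torch.nn.functional.normalize(pooled, p=2, dim=-1)
        return pooled                                          # [batch, hidden]


torch.manual_seed(0)
wrapper = EmbeddingWrapperSketch(ToyEncoder()).eval()
input_ids = torch.tensor([[1, 2, 3, 0], [4, 5, 0, 0]])
attention_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])
with torch.no_grad():
    embeddings = wrapper(input_ids, attention_mask)
```

Keeping pooling and normalisation inside the module means they are traced into the exported graph, so a consumer only needs to supply the two integer tensors.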
Bases: Module
Pooling layer that reduces [batch, seq, hidden] -> [batch, hidden].
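As a rough illustration of the [batch, seq, hidden] -> [batch, hidden] reduction, the sketch below implements the three strategy names in NumPy so the arithmetic stays visible. The function name pool is hypothetical, and the "last" branch assumes right-padded sequences; the module's actual layer may handle padding differently.

```python
import numpy as np


def pool(hidden: np.ndarray, mask: np.ndarray, strategy: str = "avg") -> np.ndarray:
    """Reduce hidden [batch, seq, hidden] to [batch, hidden] using mask [batch, seq]."""
    if strategy == "avg":
        m = mask[..., None].astype(hidden.dtype)
        # Mean over real tokens only; guard against all-padding rows.
        return (hidden * m).sum(axis=1) / np.maximum(m.sum(axis=1), 1.0)
    if strategy == "cls":
        # First position of every sequence.
        return hidden[:, 0, :]
    if strategy == "last":
        # Assumption: right padding, so the last real token sits at index (length - 1).
        last = np.maximum(mask.sum(axis=1) - 1, 0)
        return hidden[np.arange(hidden.shape[0]), last, :]
    raise ValueError(f"unknown pooling strategy: {strategy!r}")


hidden = np.arange(12, dtype=np.float32).reshape(2, 3, 2)
mask = np.array([[1, 1, 1], [1, 1, 0]])

avg = pool(hidden, mask, "avg")    # rows: [2, 3] and [7, 8]
cls = pool(hidden, mask, "cls")    # rows: [0, 1] and [6, 7]
last = pool(hidden, mask, "last")  # rows: [4, 5] and [8, 9]
```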
Export a HuggingFace embedding model to ONNX.
Parameters:
Path to the HuggingFace model directory (must contain config.json and weight files).
Directory where model.onnx and tokenizer/ will be written.
Path to load the tokenizer from. Defaults to model_path when not specified. Useful when the checkpoint directory does not contain tokenizer files.
Pooling strategy applied on top of transformer hidden states. One of "avg", "cls", "last".
If True, L2-normalise the pooled embeddings.
ONNX opset version (default 17).
Export precision: "fp32", "fp16", or "bf16".
Run a quick onnxruntime round-trip after export.
Returns: str
Absolute path to the exported model.onnx.
Run a quick onnxruntime sanity check on the exported model.
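A sanity check of this kind might look like the sketch below. It is not the module's implementation: the function names are hypothetical, the graph input names input_ids and attention_mask are assumed from the forward signature documented above, and the token ids are arbitrary dummy values.

```python
import numpy as np


def embeddings_look_sane(emb, batch: int, normalized: bool = True, atol: float = 1e-3) -> bool:
    """Check shape, finiteness and (optionally) unit L2 norm of an embedding matrix."""
    emb = np.asarray(emb, dtype=np.float32)
    if emb.ndim != 2 or emb.shape[0] != batch:
        return False
    if not np.isfinite(emb).all():
        return False
    if normalized:
        return bool(np.allclose(np.linalg.norm(emb, axis=-1), 1.0, atol=atol))
    return True


def run_onnx_sanity_check(onnx_path: str, normalized: bool = True) -> bool:
    """Feed a tiny dummy batch through the exported model with onnxruntime."""
    import onnxruntime as ort  # imported lazily so the checker above works without it

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    input_ids = np.array([[1, 2, 3, 0], [4, 5, 0, 0]], dtype=np.int64)
    attention_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]], dtype=np.int64)
    # Assumption: the exported graph names its inputs after the forward arguments.
    feeds = {"input_ids": input_ids, "attention_mask": attention_mask}
    outputs = session.run(None, feeds)
    return embeddings_look_sane(outputs[0], batch=2, normalized=normalized)
```

For fp16 exports a looser tolerance on the norm check is likely needed, since reduced precision shifts the computed norms slightly.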