nemo_automodel.components.models.llama_bidirectional.export_onnx#

Export a HuggingFace biencoder / embedding checkpoint to ONNX.

The resulting ONNX graph maps: (input_ids, attention_mask) -> embeddings [batch, hidden_dim]

The export wraps the bare transformer with average-pooling and L2 normalisation so that the ONNX model produces ready-to-use embeddings.

Usage (standalone): python -m nemo_automodel.components.models.llama_bidirectional.export_onnx --model-path /path/to/hf_checkpoint --output-dir /path/to/onnx_output [--pooling avg] [--normalize] [--opset 17] [--dtype fp32]

Usage (from Python): from nemo_automodel.components.models.llama_bidirectional.export_onnx import export_to_onnx; onnx_path = export_to_onnx("/path/to/hf_checkpoint", "/path/to/onnx_output")

Module Contents#

Classes#

_Pooling

Pooling layer that reduces [batch, seq, hidden] -> [batch, hidden].

EmbeddingModelForExport

Wraps a base transformer with pooling + optional L2 normalisation.

Functions#

export_to_onnx

Export a HuggingFace embedding model to ONNX.

verify_onnx

Run a quick onnxruntime sanity check on the exported model.

_parse_args

main

Data#

API#

nemo_automodel.components.models.llama_bidirectional.export_onnx.logger#

'getLogger(...)'

class nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling(pool_type: str = 'avg')#

Bases: torch.nn.Module

Pooling layer that reduces [batch, seq, hidden] -> [batch, hidden].

Initialization

forward(
last_hidden_states: torch.Tensor,
attention_mask: torch.Tensor,
) → torch.Tensor#
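The "avg" strategy is masked mean pooling: hidden states are summed over the non-padded positions indicated by the attention mask, then divided by the number of valid tokens. A minimal pure-Python sketch of that arithmetic (the real layer operates on `torch.Tensor` batches; the function name here is illustrative, not part of the module):

```python
def masked_avg_pool(last_hidden_states, attention_mask):
    """Reduce [batch, seq, hidden] -> [batch, hidden] by averaging
    hidden states over positions where the attention mask is 1.

    Padded positions (mask == 0) contribute nothing to the sum and
    are excluded from the denominator.
    """
    pooled = []
    for states, mask in zip(last_hidden_states, attention_mask):
        hidden_dim = len(states[0])
        acc = [0.0] * hidden_dim
        count = 0
        for vec, m in zip(states, mask):
            if m:  # only non-padded tokens participate
                count += 1
                for i, v in enumerate(vec):
                    acc[i] += v
        # guard against an all-padding row
        pooled.append([a / max(count, 1) for a in acc])
    return pooled
```

For one sequence of three tokens where the last is padding, `masked_avg_pool([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]], [[1, 1, 0]])` averages only the first two token vectors, yielding `[[2.0, 3.0]]`.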
class nemo_automodel.components.models.llama_bidirectional.export_onnx.EmbeddingModelForExport(
base_model: torch.nn.Module,
pooling: nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling,
normalize: bool = True,
)#

Bases: torch.nn.Module

Wraps a base transformer with pooling + optional L2 normalisation.

The forward signature is (input_ids, attention_mask) -> embeddings, which is the contract expected by downstream ONNX / TensorRT consumers.

Initialization

forward(
input_ids: torch.Tensor,
attention_mask: torch.Tensor,
) → torch.Tensor#
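When `normalize=True`, the wrapper's final step scales each pooled vector to unit Euclidean length, so downstream consumers can compute cosine similarity as a plain dot product. A small sketch of that step in plain Python (the wrapper itself applies the equivalent tensor operation; `l2_normalize` is an illustrative name, not a module symbol):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm.

    After normalisation, the dot product of two embeddings equals
    their cosine similarity. The `or 1.0` guards against dividing
    a zero vector by zero.
    """
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

For example, `l2_normalize([3.0, 4.0])` returns `[0.6, 0.8]`, whose squared components sum to 1.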
nemo_automodel.components.models.llama_bidirectional.export_onnx.export_to_onnx(
model_path: str,
output_dir: str,
*,
tokenizer_path: str | None = None,
pooling: str = 'avg',
normalize: bool = True,
opset: int = 17,
export_dtype: str = 'fp32',
verify: bool = True,
) → str#

Export a HuggingFace embedding model to ONNX.

Parameters:
  • model_path – Path to the HuggingFace model directory (must contain config.json and weight files).

  • output_dir – Directory where model.onnx and tokenizer/ will be written.

  • tokenizer_path – Path to load the tokenizer from. Defaults to model_path when not specified. Useful when the checkpoint directory does not contain tokenizer files.

  • pooling – Pooling strategy applied on top of transformer hidden states. One of "avg", "cls", "last".

  • normalize – If True, L2-normalise the pooled embeddings.

  • opset – ONNX opset version (default 17).

  • export_dtype – Export precision — "fp32" or "fp16".

  • verify – Run a quick onnxruntime round-trip after export.

Returns:

Absolute path to the exported model.onnx.

nemo_automodel.components.models.llama_bidirectional.export_onnx.verify_onnx(onnx_path: str, tokenizer) → None#

Run a quick onnxruntime sanity check on the exported model.

nemo_automodel.components.models.llama_bidirectional.export_onnx._parse_args() → argparse.Namespace#
nemo_automodel.components.models.llama_bidirectional.export_onnx.main()#