nemo_automodel.components.models.llama_bidirectional.export_onnx#

Export a HuggingFace biencoder / embedding checkpoint to ONNX.

The resulting ONNX graph maps: (input_ids, attention_mask) -> embeddings [batch, hidden_dim]

The export wraps the bare transformer with average-pooling and L2 normalisation so that the ONNX model produces ready-to-use embeddings.

Usage (standalone): python -m nemo_automodel.components.models.llama_bidirectional.export_onnx --model-path /path/to/hf_checkpoint --output-dir /path/to/onnx_output [--pooling avg] [--normalize] [--opset 17] [--dtype fp32]

Usage (from Python): from nemo_automodel.components.models.llama_bidirectional.export_onnx import export_to_onnx; onnx_path = export_to_onnx("/path/to/hf_checkpoint", "/path/to/onnx_output")

Module Contents#

Classes#

_Pooling

Pooling layer that reduces [batch, seq, hidden] -> [batch, hidden].

EmbeddingModelForExport

Wraps a base transformer with pooling + optional L2 normalisation.

Functions#

export_to_onnx

Export a HuggingFace embedding model to ONNX.

verify_onnx

Run a quick onnxruntime sanity check on the exported model.

_parse_args

main

Data#

API#

nemo_automodel.components.models.llama_bidirectional.export_onnx.logger#

'getLogger(...)'

class nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling(pool_type: str = 'avg')#

Bases: torch.nn.Module

Pooling layer that reduces [batch, seq, hidden] -> [batch, hidden].

Initialization

forward(
last_hidden_states: torch.Tensor,
attention_mask: torch.Tensor,
) → torch.Tensor#
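The "avg" strategy is masked mean pooling: hidden states are summed over the non-padded positions indicated by the attention mask, then divided by the number of valid tokens. A minimal pure-Python sketch of that arithmetic (the real layer operates on `torch.Tensor` batches; the function name here is illustrative, not part of the module):

```python
def masked_avg_pool(last_hidden_states, attention_mask):
    """Reduce [batch, seq, hidden] -> [batch, hidden] by averaging
    hidden states over positions where the attention mask is 1.

    Padded positions (mask == 0) contribute nothing to the sum and
    are excluded from the denominator.
    """
    pooled = []
    for states, mask in zip(last_hidden_states, attention_mask):
        hidden_dim = len(states[0])
        acc = [0.0] * hidden_dim
        count = 0
        for vec, m in zip(states, mask):
            if m:  # only non-padded tokens participate
                count += 1
                for i, v in enumerate(vec):
                    acc[i] += v
        # guard against an all-padding row
        pooled.append([a / max(count, 1) for a in acc])
    return pooled
```

For one sequence of three tokens where the last is padding, `masked_avg_pool([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]], [[1, 1, 0]])` averages only the first two token vectors, yielding `[[2.0, 3.0]]`.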
class nemo_automodel.components.models.llama_bidirectional.export_onnx.EmbeddingModelForExport(
base_model: torch.nn.Module,
pooling: nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling,
normalize: bool = True,
)#

Bases: torch.nn.Module

Wraps a base transformer with pooling + optional L2 normalisation.

The forward signature is (input_ids, attention_mask) -> embeddings, which is the contract expected by downstream ONNX / TensorRT consumers.

Initialization

forward(
input_ids: torch.Tensor,
attention_mask: torch.Tensor,
) → torch.Tensor#
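When `normalize=True`, the wrapper's final step scales each pooled vector to unit Euclidean length, so downstream consumers can compute cosine similarity as a plain dot product. A small sketch of that step in plain Python (the wrapper itself applies the equivalent tensor operation; `l2_normalize` is an illustrative name, not a module symbol):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm.

    After normalisation, the dot product of two embeddings equals
    their cosine similarity. The `or 1.0` guards against dividing
    a zero vector by zero.
    """
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

For example, `l2_normalize([3.0, 4.0])` returns `[0.6, 0.8]`, whose squared components sum to 1.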
nemo_automodel.components.models.llama_bidirectional.export_onnx.export_to_onnx(
model_path: str,
output_dir: str,
*,
tokenizer_path: str | None = None,
pooling: str = 'avg',
normalize: bool = True,
opset: int = 17,
export_dtype: str = 'fp32',
verify: bool = True,
) → str#

Export a HuggingFace embedding model to ONNX.

Parameters:
  • model_path – Path to the HuggingFace model directory (must contain config.json and weight files).

  • output_dir – Directory where model.onnx and tokenizer/ will be written.

  • tokenizer_path – Path to load the tokenizer from. Defaults to model_path when not specified. Useful when the checkpoint directory does not contain tokenizer files.

  • pooling – Pooling strategy applied on top of transformer hidden states. One of "avg", "cls", "last".

  • normalize – If True, L2-normalise the pooled embeddings.

  • opset – ONNX opset version (default 17).

  • export_dtype – Export precision — "fp32" or "fp16".

  • verify – Run a quick onnxruntime round-trip after export.

Returns:

Absolute path to the exported model.onnx.

nemo_automodel.components.models.llama_bidirectional.export_onnx.verify_onnx(onnx_path: str, tokenizer) → None#

Run a quick onnxruntime sanity check on the exported model.

nemo_automodel.components.models.llama_bidirectional.export_onnx._parse_args() → argparse.Namespace#
nemo_automodel.components.models.llama_bidirectional.export_onnx.main()#