nemo_automodel.components.models.llama_bidirectional.export_onnx#
Export a HuggingFace biencoder / embedding checkpoint to ONNX.
The resulting ONNX graph maps: (input_ids, attention_mask) -> embeddings [batch, hidden_dim]
The export wraps the bare transformer with average-pooling and L2 normalisation so that the ONNX model produces ready-to-use embeddings.
Usage (standalone): python -m nemo_automodel.components.models.llama_bidirectional.export_onnx --model-path /path/to/hf_checkpoint --output-dir /path/to/onnx_output [--pooling avg] [--normalize] [--opset 17] [--dtype fp32]
Usage (from Python): from nemo_automodel.components.models.llama_bidirectional.export_onnx import export_to_onnx; onnx_path = export_to_onnx("/path/to/hf_checkpoint", "/path/to/onnx_output")
Module Contents#
Classes#
- `_Pooling` – Pooling layer that reduces [batch, seq, hidden] -> [batch, hidden].
- `EmbeddingModelForExport` – Wraps a base transformer with pooling + optional L2 normalisation.
Functions#
- `export_to_onnx` – Export a HuggingFace embedding model to ONNX.
- `verify_onnx` – Run a quick onnxruntime sanity check on the exported model.
Data#
API#
- nemo_automodel.components.models.llama_bidirectional.export_onnx.logger#
'getLogger(…)'
- class nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling(pool_type: str = 'avg')#
Bases: torch.nn.Module

Pooling layer that reduces [batch, seq, hidden] -> [batch, hidden].
Initialization
- forward(last_hidden_states: torch.Tensor, attention_mask: torch.Tensor)#
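The `avg` strategy has to ignore padding positions, which the attention mask encodes. A minimal sketch of such a mask-aware average pool (a standalone function for illustration, not the library's `_Pooling` implementation; the name `masked_avg_pool` is assumed):

```python
import torch

def masked_avg_pool(last_hidden_states: torch.Tensor,
                    attention_mask: torch.Tensor) -> torch.Tensor:
    """Average hidden states over non-padding positions.

    last_hidden_states: [batch, seq, hidden]
    attention_mask:     [batch, seq], 1 for real tokens, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden_states.dtype)  # [batch, seq, 1]
    summed = (last_hidden_states * mask).sum(dim=1)                   # [batch, hidden]
    counts = mask.sum(dim=1).clamp(min=1e-9)                          # avoid divide-by-zero
    return summed / counts
```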
- class nemo_automodel.components.models.llama_bidirectional.export_onnx.EmbeddingModelForExport(
- base_model: torch.nn.Module,
- pooling: nemo_automodel.components.models.llama_bidirectional.export_onnx._Pooling,
- normalize: bool = True,
)#
Bases: torch.nn.Module

Wraps a base transformer with pooling + optional L2 normalisation.
The `forward` signature is `(input_ids, attention_mask) -> embeddings`, which is the contract expected by downstream ONNX / TensorRT consumers.

Initialization
- forward(input_ids: torch.Tensor, attention_mask: torch.Tensor)#
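Conceptually, the wrapper's forward pass chains base model, pooling, and normalisation. A hedged sketch, under the assumption that the base model returns an object exposing `last_hidden_state` (the class name `EmbeddingWrapper` and its internals are illustrative, not the library's exact code):

```python
import torch
import torch.nn.functional as F

class EmbeddingWrapper(torch.nn.Module):
    # Illustrative stand-in for EmbeddingModelForExport; details are assumptions.
    def __init__(self, base_model, pooling, normalize=True):
        super().__init__()
        self.base_model = base_model
        self.pooling = pooling
        self.normalize = normalize

    def forward(self, input_ids, attention_mask):
        hidden = self.base_model(input_ids=input_ids,
                                 attention_mask=attention_mask).last_hidden_state
        emb = self.pooling(hidden, attention_mask)  # [batch, hidden]
        if self.normalize:
            emb = F.normalize(emb, p=2, dim=-1)     # unit-length embeddings
        return emb
```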
- nemo_automodel.components.models.llama_bidirectional.export_onnx.export_to_onnx(
- model_path: str,
- output_dir: str,
- *,
- tokenizer_path: str | None = None,
- pooling: str = 'avg',
- normalize: bool = True,
- opset: int = 17,
- export_dtype: str = 'fp32',
- verify: bool = True,
)#
Export a HuggingFace embedding model to ONNX.
- Parameters:
  - model_path – Path to the HuggingFace model directory (must contain `config.json` and weight files).
  - output_dir – Directory where `model.onnx` and `tokenizer/` will be written.
  - tokenizer_path – Path to load the tokenizer from. Defaults to model_path when not specified. Useful when the checkpoint directory does not contain tokenizer files.
  - pooling – Pooling strategy applied on top of transformer hidden states. One of `"avg"`, `"cls"`, `"last"`.
  - normalize – If True, L2-normalise the pooled embeddings.
  - opset – ONNX opset version (default 17).
  - export_dtype – Export precision, `"fp32"` or `"fp16"`.
  - verify – Run a quick onnxruntime round-trip after export.
- Returns:
  Absolute path to the exported `model.onnx`.
- nemo_automodel.components.models.llama_bidirectional.export_onnx.verify_onnx(onnx_path: str, tokenizer) -> None#
Run a quick onnxruntime sanity check on the exported model.
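One invariant such a check can assert: when `normalize` is enabled, the exported graph should emit a `[batch, hidden]` array of unit-length rows. A small helper capturing that invariant (illustrative, not the module's `verify_onnx` implementation):

```python
import numpy as np

def looks_like_unit_embeddings(embeddings: np.ndarray, atol: float = 1e-3) -> bool:
    """True if the array has shape [batch, hidden] and every row has unit
    L2 norm, which is what the exported graph should emit when normalize=True."""
    if embeddings.ndim != 2:
        return False
    norms = np.linalg.norm(embeddings, axis=-1)
    return bool(np.allclose(norms, 1.0, atol=atol))
```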
- nemo_automodel.components.models.llama_bidirectional.export_onnx._parse_args() -> argparse.Namespace#
- nemo_automodel.components.models.llama_bidirectional.export_onnx.main()#