ONNX Conversion and Deployment#
The Open Neural Network Exchange (ONNX) is an open standard for exchanging deep learning models. It is also the preferred format that TensorRT-RTX uses to import model architectures. This section discusses how ONNX model files can be built from scratch and how they can be exported from popular deep learning frameworks.
Note
The ONNX parser used by TensorRT-RTX supports opsets 9 through 22 (inclusive); within that range, not all operators and precisions are supported.
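One way to verify that an existing model targets a supported opset is to inspect its declared opset imports. A minimal sketch, assuming a model file named model.onnx (a placeholder name):

import onnx

model = onnx.load('model.onnx')  # placeholder file name
for opset in model.opset_import:
    # An empty domain string denotes the default ai.onnx operator set
    print(opset.domain or 'ai.onnx', opset.version)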
Building ONNX Models from Scratch#
ONNX files can be defined programmatically via a Python API, which is the best choice for users who have trained their models with a framework that does not offer built-in ONNX support. An ONNX model consists of a graph of operator nodes, tensors describing the graph's inputs and outputs, and weight initializers, all represented as protocol buffers.
The following example code builds a simple multilayer perceptron (MLP) with random placeholder weights and serializes it to an ONNX file.
import numpy as np
import onnx
import onnx.helper as helper
import onnx.numpy_helper as numpy_helper
from onnx import TensorProto

nb_inputs, nb_hidden, nb_outputs = 10, 20, 1

# Create input and output tensors with a symbolic (dynamic) batch dimension
input = helper.make_tensor_value_info('input', TensorProto.FLOAT, ['batch', nb_inputs])
output = helper.make_tensor_value_info('output', TensorProto.FLOAT, ['batch', nb_outputs])

# Random placeholder weights and biases; in practice these come from training.
# With transB=1, Gemm expects weights stored as [out_features, in_features].
W1 = np.random.randn(nb_hidden, nb_inputs).astype(np.float32)
B1 = np.random.randn(nb_hidden).astype(np.float32)
W2 = np.random.randn(nb_outputs, nb_hidden).astype(np.float32)
B2 = np.random.randn(nb_outputs).astype(np.float32)
init_W1 = numpy_helper.from_array(W1, name='W1')
init_B1 = numpy_helper.from_array(B1, name='B1')
init_W2 = numpy_helper.from_array(W2, name='W2')
init_B2 = numpy_helper.from_array(B2, name='B2')

# Define operator nodes: Gemm -> Relu -> Gemm
gemm1 = helper.make_node('Gemm', ['input', 'W1', 'B1'], ['hidden1'], alpha=1.0, beta=1.0, transB=1)
relu1 = helper.make_node('Relu', ['hidden1'], ['relu1'])
gemm2 = helper.make_node('Gemm', ['relu1', 'W2', 'B2'], ['output'], alpha=1.0, beta=1.0, transB=1)

# Assemble graph
graph = helper.make_graph(nodes=[gemm1, relu1, gemm2],
                          name='MLP',
                          inputs=[input],
                          outputs=[output],
                          initializer=[init_W1, init_B1, init_W2, init_B2])

# Pin the opset to a version within the range supported by the parser
model = helper.make_model(graph,
                          producer_name='simple_mlp',
                          opset_imports=[helper.make_opsetid('', 13)])

# Validate and save
onnx.checker.check_model(model)
onnx.save(model, 'simple_mlp.onnx')
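Two details of the example are worth noting. The symbolic dimension 'batch' in the input and output tensors declares a dynamic batch size, so the model accepts inputs of varying batch size at runtime. Setting transB=1 on the Gemm nodes means the weight initializers are stored in [out_features, in_features] layout, the same convention used by PyTorch's torch.nn.Linear.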
Exporting ONNX Models from Frameworks#
Exporting ONNX models from PyTorch via the torch.onnx module is well documented in the PyTorch API documentation and tutorials.
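A minimal export sketch, assuming a small PyTorch model equivalent to the MLP above (the file name, tensor names, and opset version are illustrative choices, not requirements):

import torch
import torch.nn as nn

# A stand-in model matching the MLP dimensions from the previous example
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1))
model.eval()

# Trace the model with a dummy input and export it to ONNX,
# marking dimension 0 of the input and output as a dynamic batch axis
dummy_input = torch.randn(1, 10)
torch.onnx.export(model,
                  dummy_input,
                  'simple_mlp.onnx',
                  input_names=['input'],
                  output_names=['output'],
                  dynamic_axes={'input': {0: 'batch'}, 'output': {0: 'batch'}},
                  opset_version=17)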
The TensorFlow framework does not offer built-in ONNX support, but models can still be exported with the open-source tool tf2onnx.
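For example, a model saved in TensorFlow's SavedModel format can be converted from the command line; the input and output paths below are placeholders:

python -m tf2onnx.convert --saved-model path/to/saved_model --output model.onnx --opset 17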
Finally, Hugging Face provides the Optimum tool, which can serialize transformers model checkpoints into ONNX files.
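As an illustration, the Optimum command-line interface can export a checkpoint by name; the model identifier and output directory below are placeholders:

optimum-cli export onnx --model gpt2 gpt2_onnx/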