ONNX Conversion Guide#
TensorRT-RTX uses the Open Neural Network Exchange (ONNX) format as its primary model input. Before you can build a TensorRT-RTX engine, your model must be exported to an .onnx file. This guide covers how to export models from common training frameworks and how to build ONNX models programmatically.
Note
TensorRT-RTX supports ONNX opsets 9–22 (inclusive). Not all operators and precisions within these opsets are supported. For a complete list of supported operators, refer to the Operator Support reference.
Exporting from a Training Framework#
Most users train models in PyTorch, TensorFlow, or Hugging Face Transformers and then export them to ONNX. Choose the method that matches your framework.
PyTorch
Use torch.onnx.export() to convert a PyTorch model to ONNX:
import torch

model = ...  # Your trained PyTorch model
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"],
                  output_names=["output"],
                  dynamic_axes={"input": {0: "batch_size"},
                                "output": {0: "batch_size"}})
For detailed options and troubleshooting, refer to the PyTorch ONNX export documentation and tutorial.
TensorFlow
TensorFlow does not include built-in ONNX support. Use the open-source tf2onnx tool:
pip install tf2onnx
python -m tf2onnx.convert --saved-model ./saved_model --output model.onnx
Hugging Face Transformers
Use the Optimum library to export Hugging Face model checkpoints to ONNX:
pip install optimum[onnxruntime]
optimum-cli export onnx --model bert-base-uncased ./bert_onnx/
Building ONNX Models Programmatically#
If your framework does not support ONNX export, you can construct ONNX models directly using the ONNX Python API. This approach defines the model graph, operators, and weights using protocol buffers.
Example: Simple multilayer perceptron
import onnx
import onnx.helper as helper
import onnx.numpy_helper as numpy_helper
from onnx import TensorProto
nb_inputs, nb_hidden, nb_outputs = 10, 20, 1
# Create input and output tensors
input = helper.make_tensor_value_info('input', TensorProto.FLOAT, ['batch', nb_inputs])
output = helper.make_tensor_value_info('output', TensorProto.FLOAT, ['batch', nb_outputs])
# Assume weights and biases are NumPy arrays; with transB=1 below,
# W1 is [nb_hidden, nb_inputs] and W2 is [nb_outputs, nb_hidden]
W1, B1, W2, B2 = get_weights_biases(nb_inputs, nb_hidden, nb_outputs)
init_W1 = numpy_helper.from_array(W1, name='W1')
init_B1 = numpy_helper.from_array(B1, name='B1')
init_W2 = numpy_helper.from_array(W2, name='W2')
init_B2 = numpy_helper.from_array(B2, name='B2')
# Define operator nodes
gemm1 = helper.make_node('Gemm', ['input','W1','B1'], ['hidden1'], alpha=1.0, beta=1.0, transB=1)
relu1 = helper.make_node('Relu', ['hidden1'], ['relu1'])
gemm2 = helper.make_node('Gemm', ['relu1','W2','B2'], ['output'], alpha=1.0, beta=1.0, transB=1)
# Assemble graph
graph = helper.make_graph(nodes=[gemm1, relu1, gemm2],
                          name='MLP',
                          inputs=[input],
                          outputs=[output],
                          initializer=[init_W1, init_B1, init_W2, init_B2])
# Pin the opset to a version inside TensorRT-RTX's supported 9-22 range
model = helper.make_model(graph, producer_name='simple_mlp',
                          opset_imports=[helper.make_opsetid('', 17)])
# Validate and save
onnx.checker.check_model(model)
onnx.save(model, 'simple_mlp.onnx')
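The example above leaves `get_weights_biases` undefined. A minimal stand-in that generates random weights with the shapes the Gemm nodes expect (hypothetical; in a real workflow these arrays come from your trained checkpoint):

```python
import numpy as np

def get_weights_biases(nb_inputs, nb_hidden, nb_outputs):
    # Hypothetical helper: random float32 weights in place of trained ones.
    # With transB=1, Gemm weights are shaped [out_features, in_features].
    rng = np.random.default_rng(seed=0)
    W1 = rng.standard_normal((nb_hidden, nb_inputs)).astype(np.float32)
    B1 = rng.standard_normal(nb_hidden).astype(np.float32)
    W2 = rng.standard_normal((nb_outputs, nb_hidden)).astype(np.float32)
    B2 = rng.standard_normal(nb_outputs).astype(np.float32)
    return W1, B1, W2, B2
```

Keeping the initializers as float32 matters: ONNX Gemm requires floating-point inputs, and mismatched dtypes between the graph inputs and initializers will fail the checker.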
Next Steps#
After you have an .onnx file, proceed to build and run a TensorRT-RTX engine:
Deploy Your First Model — End-to-end walkthrough from ONNX to inference
Architecture Overview: Model Specification — ONNX vs. native API paths