Example: Quick Start
Downloading a Pre-built TensorRT Engine
This section shows an example of how to download a TensorRT engine using the TensorRT-Cloud CLI.
Check which engines are available.
trt-cloud catalog engines
Choose the engine model and version combination you would like, and download it.
trt-cloud catalog download --model=gemma_2b_it_trtllm --version=bs1_int4awq_RTX4090_windows
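For scripted workflows, the download step above can be wrapped in a small helper. This is an illustrative sketch, not part of the TensorRT-Cloud CLI itself; only the subcommand and the --model/--version flag names are taken from the command shown above, and the injectable `run` parameter is an assumption added so the wrapper can be exercised without the CLI installed.

```python
import subprocess

def download_engine(model: str, version: str, run=subprocess.run):
    """Download a prebuilt engine via the TensorRT-Cloud CLI.

    `model` and `version` map to the --model and --version flags shown
    above. `run` defaults to subprocess.run but can be swapped for a
    stub in environments where trt-cloud is not available.
    """
    cmd = ["trt-cloud", "catalog", "download",
           f"--model={model}", f"--version={version}"]
    return run(cmd, check=True)

# Hypothetical usage with a recorder in place of the real CLI:
recorded = []
download_engine("gemma_2b_it_trtllm", "bs1_int4awq_RTX4090_windows",
                run=lambda cmd, check: recorded.append(cmd))
print(recorded[0])
```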
Building a TensorRT Engine for ONNX Model
This section shows an example of how to build a TensorRT engine from an ONNX model using the TensorRT-Cloud CLI.
Prerequisites
Ensure you can log in to TensorRT-Cloud.
Steps
Download an ONNX model, for example, MobileNetV2.
wget https://github.com/onnx/models/raw/main/Computer_Vision/mobilenetv2_050_Opset18_timm/mobilenetv2_050_Opset18.onnx
Build the engine with TensorRT-Cloud.
trt-cloud build --onnx mobilenetv2_050_Opset18.onnx --gpu RTX4090 --os windows --trtexec-args="--fp16"
Unzip the downloaded file. It will contain the TensorRT engine saved as engine.trt. The engine.trt file can now be deployed using TensorRT 10.0 to run accelerated inference of MobileNetV2 on an RTX 4090 GPU on Windows.

View the engine metrics in metrics.json. Metrics are extracted from TensorRT build logs. Below are the first few lines of metrics.json generated for an example model.

{
    "throughput_qps": 782.646,
    "inference_gpu_memory_MiB": 5.0,
    "latency_ms": {
        "min": 0.911316,
        "max": 5.17419,
        "mean": 1.00846,
        "median": 0.97345,
        …
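The unzip-and-inspect step can be scripted with the Python standard library. The sketch below builds a stand-in archive (the real download contains an actual serialized engine, and metrics.json holds the full metrics set; the values here are copied from the excerpt above), then extracts it and reads a couple of metrics.

```python
import io
import json
import zipfile

# Stand-in for the archive returned by the build; contents are illustrative.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("engine.trt", b"\x00placeholder engine bytes")
    zf.writestr("metrics.json", json.dumps({
        "throughput_qps": 782.646,           # values from the excerpt above
        "inference_gpu_memory_MiB": 5.0,
        "latency_ms": {"min": 0.911316, "max": 5.17419,
                       "mean": 1.00846, "median": 0.97345},
    }))

# Unzip the build output and read the metrics.
with zipfile.ZipFile(buf) as zf:
    zf.extractall("build_output")

with open("build_output/metrics.json") as f:
    metrics = json.load(f)

print(f"throughput: {metrics['throughput_qps']:.1f} qps")
print(f"median latency: {metrics['latency_ms']['median']:.3f} ms")
```

The same pattern applies to the real archive: point `zipfile.ZipFile` at the downloaded file instead of the in-memory buffer.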