Example: Quick Start

This section shows an example of how to build a TensorRT engine from an ONNX model using the TensorRT-Cloud CLI.

  1. Download an ONNX model, for example, MobileNetV2.

    wget https://github.com/onnx/models/raw/main/Computer_Vision/mobilenetv2_050_Opset18_timm/mobilenetv2_050_Opset18.onnx
    
  2. Build the engine with TensorRT-Cloud.

    trt-cloud build --onnx mobilenetv2_050_Opset18.onnx --gpu RTX4090 --os windows --trtexec-args="--fp16 --bf16"
    
  3. Unzip the downloaded file. It contains the TensorRT engine saved as engine.trt, which can now be deployed with TensorRT 10.0 to run accelerated inference of MobileNetV2 on an RTX 4090 GPU on Windows (see the deployment sketch at the end of this section).

  4. View the engine metrics in metrics.json. Metrics are extracted from the TensorRT build logs. Below are the first few lines of metrics.json generated for an example model; a short sketch for reading these fields programmatically follows the listing.

    {
        "throughput_qps": 782.646,
        "inference_gpu_memory_MiB": 5.0,
        "latency_ms": {
            "min": 0.911316,
            "max": 5.17419,
            "mean": 1.00846,
            "median": 0.97345,
    …
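
The metrics file is plain JSON, so it can be consumed directly in scripts. The following is a minimal Python sketch for pulling out a few of the fields shown above; the key names (throughput_qps, latency_ms, inference_gpu_memory_MiB) come from the example output, while the script itself is illustrative rather than part of the TensorRT-Cloud CLI.

    import json

    # Load the metrics file extracted from the TensorRT build logs.
    with open("metrics.json") as f:
        metrics = json.load(f)

    print(f"Throughput:      {metrics['throughput_qps']:.1f} qps")
    print(f"Median latency:  {metrics['latency_ms']['median']:.3f} ms")
    print(f"GPU memory:      {metrics['inference_gpu_memory_MiB']} MiB")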
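
The built engine can then be loaded with the TensorRT runtime on the target machine. Below is a minimal sketch, assuming the TensorRT 10 Python bindings are installed on the deployment system; it only deserializes engine.trt and lists its I/O tensors, and is not part of the TensorRT-Cloud CLI itself.

    import tensorrt as trt

    # Deserialize the engine produced by the TensorRT-Cloud build.
    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open("engine.trt", "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())

    # Inspect the engine's input and output tensors.
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        mode = engine.get_tensor_mode(name)    # INPUT or OUTPUT
        shape = engine.get_tensor_shape(name)
        print(f"{mode.name:>6}  {name}  {shape}")

Running full inference additionally requires device buffers and an execution context (engine.create_execution_context() followed by execute_async_v3 on a CUDA stream), which is omitted here for brevity.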