Example: Quick Start

Downloading a Pre-built TensorRT Engine

This section shows how to download a pre-built TensorRT engine using the TensorRT-Cloud CLI.

  1. Check which engines are available.

    trt-cloud catalog engines
    
  2. Choose a model and version combination and download the corresponding engine.

    trt-cloud catalog download --model=gemma_2b_it_trtllm --version=bs1_int4awq_RTX4090_windows
    

Building a TensorRT Engine from an ONNX Model

This section shows how to build a TensorRT engine from an ONNX model using the TensorRT-Cloud CLI.

Prerequisites

  1. Ensure you can log in to TensorRT-Cloud.

Steps

  1. Download an ONNX model, for example, MobileNetV2.

    wget https://github.com/onnx/models/raw/main/Computer_Vision/mobilenetv2_050_Opset18_timm/mobilenetv2_050_Opset18.onnx
    
  2. Build the engine with TensorRT-Cloud.

    trt-cloud build --onnx mobilenetv2_050_Opset18.onnx --gpu RTX4090 --os windows --trtexec-args="--fp16"
    
  3. Unzip the downloaded file. It contains the TensorRT engine saved as engine.trt, which can now be deployed with TensorRT 10.0 to run accelerated inference of MobileNetV2 on an RTX 4090 GPU on Windows.
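
     A minimal sketch of this deployment step, assuming the TensorRT 10.0 Python package (tensorrt) is installed on the target machine and engine.trt is in the working directory:

    import tensorrt as trt

    # Deserialize the engine downloaded from TensorRT-Cloud.
    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open("engine.trt", "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())

    # Create an execution context and inspect the engine's I/O tensors.
    context = engine.create_execution_context()
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        print(name, engine.get_tensor_mode(name), engine.get_tensor_shape(name))

     Running inference additionally requires allocating device buffers and binding them to the execution context; refer to the TensorRT documentation for the full workflow.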

  4. View the engine metrics in metrics.json. Metrics are extracted from TensorRT build logs. Below are the first few lines of metrics.json generated for an example model.

    {
        "throughput_qps": 782.646,
        "inference_gpu_memory_MiB": 5.0,
        "latency_ms": {
            "min": 0.911316,
            "max": 5.17419,
            "mean": 1.00846,
            "median": 0.97345,
    …
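
     Since metrics.json is plain JSON, the reported metrics can also be consumed programmatically. A small sketch, using only the field names shown above:

    import json

    # Load the metrics that TensorRT-Cloud extracted from the build logs.
    with open("metrics.json") as f:
        metrics = json.load(f)

    print(f"Throughput:     {metrics['throughput_qps']:.1f} qps")
    print(f"Median latency: {metrics['latency_ms']['median']:.3f} ms")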