Example: Quick Start
Downloading a Pre-built TensorRT Engine
This section shows an example of downloading a TensorRT engine using the TensorRT-Cloud CLI.
Check which engines are available:

```shell
trt-cloud catalog engines
```
Choose the model and version combination you want, and download it:

```shell
trt-cloud catalog download --model=gemma_2b_it_trtllm --version=bs1_int4awq_RTX4090_windows
```
Building a TensorRT Engine for an ONNX Model
This section shows an example of how to build a TensorRT engine from an ONNX model using the TensorRT-Cloud CLI.
Prerequisites
Ensure you can log into TensorRT-Cloud.
Steps
Download an ONNX model, such as MobileNetV2:

```shell
wget https://github.com/onnx/models/raw/main/Computer_Vision/mobilenetv2_050_Opset18_timm/mobilenetv2_050_Opset18.onnx
```
Build the engine with TensorRT-Cloud:

```shell
trt-cloud build onnx --model mobilenetv2_050_Opset18.onnx --gpu RTX4090 --os windows --trtexec-args="--fp16"
```
Unzip the downloaded file. The TensorRT engine is saved as `engine.trt`, which can now be deployed using TensorRT 10.0 to run accelerated inference of MobileNetV2 on an RTX 4090 GPU on Windows.

View the engine metrics in `metrics.json`. Metrics are extracted from TensorRT build logs. Below are the first few lines of `metrics.json` generated for an example model:

```json
{
    "throughput_qps": 782.646,
    "inference_gpu_memory_MiB": 5.0,
    "latency_ms": {
        "min": 0.911316,
        "max": 5.17419,
        "mean": 1.00846,
        "median": 0.97345,
        …
```
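The metrics file can also be consumed programmatically, for example to log or compare build results. Below is a minimal Python sketch that summarizes the fields shown in the excerpt above; the key names (`throughput_qps`, `inference_gpu_memory_MiB`, `latency_ms`) are taken from that sample and may differ for other builds.

```python
import json

# Values taken from the metrics.json excerpt above (truncated fields omitted).
sample = json.loads("""
{
  "throughput_qps": 782.646,
  "inference_gpu_memory_MiB": 5.0,
  "latency_ms": {"min": 0.911316, "max": 5.17419,
                 "mean": 1.00846, "median": 0.97345}
}
""")

def summarize(metrics: dict) -> str:
    """One-line summary of an ONNX build's metrics.

    Assumes the key names shown in the excerpt above; other builds
    may report additional or different fields.
    """
    lat = metrics["latency_ms"]
    return (f"{metrics['throughput_qps']:.0f} qps, "
            f"{metrics['inference_gpu_memory_MiB']:.1f} MiB GPU memory, "
            f"median latency {lat['median']:.3f} ms")

print(summarize(sample))
```

In practice, the dict would be loaded from the unzipped build result with `json.load` rather than embedded inline.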
Building a TensorRT-LLM Engine
Prerequisites
Ensure you can log into TensorRT-Cloud.
Steps
Pick a Hugging Face repository to build an engine for, such as google/gemma-2b-it.
Build the engine with TensorRT-Cloud:

```shell
trt-cloud build llm --hf-repo="google/gemma-2b-it" --dtype="bfloat16" --gpu RTX4090 --os windows
```
Unzip the downloaded file. The TensorRT engine is saved in `build_result/engine` and can now be deployed.

View the engine metrics in `build_result/metrics.json`. For example:

```json
{
    "rouge1": 30.532889598883763,
    "rouge2": 10.519224834860456,
    "rougeL": 22.77946327498464,
    "rougeLsum": 25.958965209634254,
    "generation_tokens_per_second": 126.849,
    "gpu_peak_mem_gb": 7.783
}
```
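Because these metrics cover both accuracy (ROUGE scores) and performance, one common use is to gate engine builds on minimum acceptable values. The Python sketch below does this with purely illustrative thresholds; the floor values and the `engine_acceptable` helper are assumptions for this example, not official recommendations, and only the field names come from the sample above.

```python
import json

# Values from the metrics.json example above.
llm_metrics = json.loads("""
{
  "rouge1": 30.532889598883763,
  "rouge2": 10.519224834860456,
  "rougeL": 22.77946327498464,
  "rougeLsum": 25.958965209634254,
  "generation_tokens_per_second": 126.849,
  "gpu_peak_mem_gb": 7.783
}
""")

# Illustrative acceptance floors (assumptions, not official values):
# reject a build whose accuracy or throughput falls below them.
THRESHOLDS = {
    "rouge1": 29.0,
    "generation_tokens_per_second": 100.0,
}
MAX_GPU_MEM_GB = 8.0  # e.g. the engine must fit alongside other workloads

def engine_acceptable(metrics: dict) -> bool:
    """Return True if the built engine meets the illustrative floors above."""
    if metrics["gpu_peak_mem_gb"] > MAX_GPU_MEM_GB:
        return False
    return all(metrics[key] >= floor for key, floor in THRESHOLDS.items())

print(engine_acceptable(llm_metrics))  # → True
```

A check like this makes it easy to compare quantized builds (for example, the int4-AWQ catalog engine above) against a full-precision baseline before deploying.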