Segformer with TAO Deploy#

The gen_trt_engine action builds an optimized TensorRT engine for Segformer from an ONNX file produced by the Segformer export action. INT8 precision is not supported for Segformer.

Converting .onnx File into TensorRT Engine#

You can reuse the spec from the Segformer export action as a starting point.

trt_config#

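The gen_trt_engine parameter configures TensorRT engine generation: the input ONNX file, the output engine path, the input dimensions, and the nested tensorrt options. For example: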

gen_trt_engine:
  onnx_file: /path/to/onnx_file
  trt_engine: /path/to/trt_engine
  input_width: 512
  input_height: 512
  tensorrt:
    data_type: FP32
    workspace_size: 1024
    min_batch_size: 1
    opt_batch_size: 1
    max_batch_size: 1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| onnx_file | string | | The path to the ONNX model file to convert | |
| trt_engine | string | | The path where the generated TensorRT engine is written | |
| input_channel | unsigned int | 3 | The input channel size. Only the value 3 is supported. | 3 |
| input_width | unsigned int | 960 | The input width | >0 |
| input_height | unsigned int | 544 | The input height | >0 |
| batch_size | unsigned int | -1 | The batch size of the ONNX model | >=-1 |

tensorrt#

The tensorrt parameter sets the engine precision, the workspace size, and the batch sizes used for the optimization profile.

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| data_type | string | fp32 | The precision to be used for the TensorRT engine | fp32/fp16 |
| workspace_size | unsigned int | 1024 | The maximum workspace size for the TensorRT engine | >1024 |
| min_batch_size | unsigned int | 1 | The minimum batch size used for the optimization profile shape | >0 |
| opt_batch_size | unsigned int | 1 | The optimal batch size used for the optimization profile shape | >0 |
| max_batch_size | unsigned int | 1 | The maximum batch size used for the optimization profile shape | >0 |
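The constraints listed for gen_trt_engine and tensorrt can be checked before the spec is handed to the agent. The following is a minimal Python sketch and not part of TAO Deploy: the function name profile_shapes is hypothetical, the (batch, channel, height, width) shape layout and the case-insensitive handling of data_type are assumptions, and the min <= opt <= max ordering is the usual TensorRT optimization-profile requirement rather than something stated in the tables.

```python
from typing import Any, Dict, Tuple

# Precisions listed as supported for Segformer (INT8 is not supported).
ALLOWED_PRECISIONS = {"fp32", "fp16"}

Shape = Tuple[int, int, int, int]


def profile_shapes(gen_cfg: Dict[str, Any]) -> Dict[str, Shape]:
    """Validate a gen_trt_engine mapping and return min/opt/max input shapes.

    Missing keys fall back to the documented defaults. The returned shapes
    assume an NCHW input layout, which is an assumption of this sketch.
    """
    trt = gen_cfg.get("tensorrt", {})

    # The sample spec writes FP32 while the table lists fp32, so compare
    # case-insensitively (an assumption of this sketch).
    data_type = str(trt.get("data_type", "fp32")).lower()
    if data_type not in ALLOWED_PRECISIONS:
        raise ValueError(f"unsupported data_type: {data_type!r}")

    channels = gen_cfg.get("input_channel", 3)
    if channels != 3:
        raise ValueError("input_channel must be 3")

    height = gen_cfg.get("input_height", 544)
    width = gen_cfg.get("input_width", 960)
    if height <= 0 or width <= 0:
        raise ValueError("input_height and input_width must be > 0")

    batches = {
        name: trt.get(f"{name}_batch_size", 1) for name in ("min", "opt", "max")
    }
    if any(batch <= 0 for batch in batches.values()):
        raise ValueError("batch sizes must be > 0")
    if not batches["min"] <= batches["opt"] <= batches["max"]:
        raise ValueError("expected min_batch_size <= opt_batch_size <= max_batch_size")

    return {name: (batch, channels, height, width) for name, batch in batches.items()}


# The sample spec from this page, expressed as a Python mapping.
sample = {
    "input_width": 512,
    "input_height": 512,
    "tensorrt": {
        "data_type": "FP32",
        "workspace_size": 1024,
        "min_batch_size": 1,
        "opt_batch_size": 1,
        "max_batch_size": 1,
    },
}
print(profile_shapes(sample))
# {'min': (1, 3, 512, 512), 'opt': (1, 3, 512, 512), 'max': (1, 3, 512, 512)}
```

Raising on the first violated constraint keeps the sketch short; a fuller checker might collect all problems and report them together.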

Ask the agent to run the gen_trt_engine action against your spec. For example:

Build an FP16 TensorRT engine for Segformer from the exported ONNX at
``s3://my-bucket/segformer/model.onnx`` using ``trt-spec.yaml``. Write
the engine to ``s3://my-bucket/segformer/model.engine``. Run on the local Docker daemon.
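The prompt above refers to a trt-spec.yaml file. As a rough illustration of what that file might contain for an FP16 build, the Python sketch below renders spec text using only the fields documented on this page. The helper name render_trt_spec is hypothetical, and the paths are the placeholders from the sample spec. Whether the spec should carry local paths or the S3 locations named in the prompt is not stated here, so treat the values as assumptions.

```python
def render_trt_spec(
    onnx_file: str,
    trt_engine: str,
    data_type: str = "fp16",
    input_height: int = 512,
    input_width: int = 512,
    batch_size: int = 1,
    workspace_size: int = 1024,
) -> str:
    """Return YAML text for a gen_trt_engine spec.

    Only fields documented on this page are emitted. The same batch size
    is used for the min, opt, and max profile entries.
    """
    lines = [
        "gen_trt_engine:",
        f"  onnx_file: {onnx_file}",
        f"  trt_engine: {trt_engine}",
        f"  input_width: {input_width}",
        f"  input_height: {input_height}",
        "  tensorrt:",
        f"    data_type: {data_type}",
        f"    workspace_size: {workspace_size}",
        f"    min_batch_size: {batch_size}",
        f"    opt_batch_size: {batch_size}",
        f"    max_batch_size: {batch_size}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder paths taken from the sample spec on this page.
spec_text = render_trt_spec("/path/to/onnx_file", "/path/to/trt_engine")
print(spec_text)
```

Writing spec_text to trt-spec.yaml (for example with pathlib.Path.write_text) would produce a file that differs from the sample spec only in data_type.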

Running Evaluation through TensorRT Engine#

Evaluation through the TensorRT engine uses the same specification file as TAO evaluation and inference. A sample specification file:

model:
  input_height: 512
  input_width: 512
  backbone:
    type: "mit_b1"
dataset:
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
  test_dataset:
    img_dir: /data/images/val
    ann_dir: /data/masks/val
  input_type: "grayscale"
  data_root: /tlt-pytorch
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  batch_size: 1
  workers_per_gpu: 1
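To make the img_norm_cfg and palette entries in the sample spec more concrete, here is a small Python sketch of the arithmetic they most likely describe. It assumes the common per-channel convention (pixel - mean) / std and assumes that each palette entry ties a mask color to its label_id. Neither behavior is spelled out on this page, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Values copied from the sample spec above.
MEAN = (127.5, 127.5, 127.5)
STD = (127.5, 127.5, 127.5)
PALETTE = [
    {"seg_class": "foreground", "rgb": (0, 0, 0), "label_id": 0},
    {"seg_class": "background", "rgb": (255, 255, 255), "label_id": 1},
]


def normalize_pixel(
    pixel: Sequence[float],
    mean: Sequence[float] = MEAN,
    std: Sequence[float] = STD,
) -> Tuple[float, ...]:
    """Apply (value - mean) / std to each channel (assumed convention)."""
    return tuple((v - m) / s for v, m, s in zip(pixel, mean, std))


def rgb_to_label(palette: List[dict]) -> Dict[Tuple[int, int, int], int]:
    """Build a lookup from a mask color to its label_id."""
    return {tuple(entry["rgb"]): entry["label_id"] for entry in palette}


# With mean = std = 127.5, the 8-bit range [0, 255] maps onto [-1, 1].
print(normalize_pixel((0, 0, 0)))        # (-1.0, -1.0, -1.0)
print(normalize_pixel((255, 255, 255)))  # (1.0, 1.0, 1.0)

lookup = rgb_to_label(PALETTE)
print(lookup[(255, 255, 255)])           # 1
```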

Ask the agent to run the evaluate action against the engine you built. For example:

Evaluate the Segformer TensorRT engine at
``s3://my-bucket/segformer/model.engine`` against ``eval-spec.yaml``. Run
on local Docker.

Running Inference through TensorRT Engine#

Ask the agent to run the inference action against the engine you built. For example:

Run Segformer inference with the TensorRT engine at
``s3://my-bucket/segformer/model.engine`` using ``infer-spec.yaml``. Run
on your chosen backend.

Mask-overlaid visualizations are written to vis_overlay under the configured results directory, and raw mask predictions are written to mask_labels.
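After an inference run, a quick way to confirm that both output folders were populated is to list them. The Python sketch below does this for a results directory supplied by the caller. The function name collect_outputs is hypothetical, and the sketch assumes nothing about file names or formats inside the folders.

```python
from pathlib import Path
from typing import Dict, List

# Output folder names as described on this page.
OUTPUT_FOLDERS = ("vis_overlay", "mask_labels")


def collect_outputs(results_dir: str) -> Dict[str, List[str]]:
    """Return the sorted file names found in each inference output folder.

    A folder that does not exist is reported as an empty list.
    """
    root = Path(results_dir)
    found: Dict[str, List[str]] = {}
    for name in OUTPUT_FOLDERS:
        folder = root / name
        if folder.is_dir():
            found[name] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            found[name] = []
    return found
```

Calling collect_outputs with the configured results directory and comparing the lengths of the two lists gives a simple sanity check that every overlay has a corresponding mask file.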