EfficientDet (TF2) with TAO Deploy
To generate an optimized TensorRT engine for TF2 EfficientDet, pass the ONNX file
produced by the TF2 EfficientDet export action to the gen_trt_engine action. For more
information about training TF2 EfficientDet, refer to the
TF2 EfficientDet training documentation.
Each task is explained in detail in the following sections.
Converting ONNX File into TensorRT Engine
You can reuse the spec from the TF2 EfficientDet Exporting the model section as a starting point.
GenTrtEngine Config
The gen_trt_engine configuration contains the parameters for converting an .onnx model to a TensorRT engine, which can be used for deployment.
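+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| Field              | Description                                                    | Data Type and Constraints | Recommended/Typical Value |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| onnx_file          | The path to the exported .onnx model                           | string                    |                           |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| trt_engine         | The path where the generated engine will be stored             | string                    |                           |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| results_dir        | The directory where the output log is saved. If not specified, | string                    |                           |
|                    | the log is saved under the global $results_dir/gen_trt_engine  |                           |                           |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| tensorrt           | The TensorRT configuration                                     | Dict                      |                           |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+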
The tensorrt configuration specifies the TensorRT engine settings and the calibration requirements.
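+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| Field              | Description                                                    | Data Type and Constraints | Recommended/Typical Value |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| data_type          | The precision to be used for the TensorRT engine               | string                    | FP32                      |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| min_batch_size     | The minimum batch size used for the optimization profile shape | unsigned int              | 1                         |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| opt_batch_size     | The optimal batch size used for the optimization profile shape | unsigned int              | 1                         |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| max_batch_size     | The maximum batch size used for the optimization profile shape | unsigned int              | 1                         |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| max_workspace_size | The maximum workspace size for the TensorRT engine, in GB      | unsigned int              | 2                         |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| calibration        | The calibration configuration                                  | Dict                      |                           |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+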
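The constraints implied by the tensorrt fields can be checked before an engine build is started. The following is a minimal sketch, not part of TAO Deploy: check_tensorrt_config and workspace_bytes are hypothetical helper names, the defaults are taken from the Recommended/Typical Value column, the gigabyte unit is read as 2**30 bytes, and the rule that INT8 is the precision that needs a calibration block is an assumption.

```python
GIB = 1 << 30  # assumption: max_workspace_size counts binary gigabytes


def workspace_bytes(max_workspace_size_gb):
    """Convert the workspace size from the spec (in GB) to bytes."""
    return int(max_workspace_size_gb * GIB)


def check_tensorrt_config(cfg):
    """Hypothetical pre-flight check; returns a list of readable problems."""
    problems = []
    # Defaults follow the Recommended/Typical Value column of the table.
    min_bs = cfg.get("min_batch_size", 1)
    opt_bs = cfg.get("opt_batch_size", 1)
    max_bs = cfg.get("max_batch_size", 1)
    # An optimization profile needs min <= opt <= max for the batch dimension.
    if not 1 <= min_bs <= opt_bs <= max_bs:
        problems.append(
            f"expected 1 <= min <= opt <= max batch size, got {min_bs}/{opt_bs}/{max_bs}"
        )
    if cfg.get("max_workspace_size", 2) <= 0:
        problems.append("max_workspace_size must be positive")
    # Assumption: INT8 is the precision that consumes the calibration block.
    data_type = str(cfg.get("data_type", "fp32")).lower()
    if data_type == "int8" and not cfg.get("calibration"):
        problems.append("data_type int8 requires a calibration section")
    return problems


if __name__ == "__main__":
    cfg = {"data_type": "fp32", "max_workspace_size": 2}
    print(check_tensorrt_config(cfg))
    print(workspace_bytes(cfg["max_workspace_size"]))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a spec at once.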
The calibration configuration specifies the location of the calibration data and where to save the calibration cache file.
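+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| Field              | Description                                                    | Data Type and Constraints | Recommended/Typical Value |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| cal_image_dir      | The directory containing images to be used for calibration     | string                    | False                     |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| cal_cache_file     | The path to the calibration cache file                         | string                    | False                     |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| cal_batches        | The number of batches to be iterated for calibration           | unsigned int              | 10                        |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+
| cal_batch_size     | The number of images in each calibration batch                 | unsigned int              | 1                         |
+--------------------+----------------------------------------------------------------+---------------------------+---------------------------+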
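Calibration reads cal_batches batches of cal_batch_size images each, so the two fields together determine how many images are consumed: the table defaults give 10 x 1 = 10 images, and a configuration with cal_batches: 10 and cal_batch_size: 16 gives 160. The sketch below checks that a directory holds at least that many images. The helper names and the set of file extensions are assumptions for illustration, and how the tool behaves when fewer images are available is not stated here.

```python
from pathlib import Path

# Assumption: which file extensions count as calibration images.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def calibration_images_needed(cal_batches, cal_batch_size):
    """Total images consumed by calibration: batches times images per batch."""
    return cal_batches * cal_batch_size


def count_images(image_dir):
    """Count image files directly inside image_dir (no recursion)."""
    return sum(
        1
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


def has_enough_images(image_dir, cal_batches, cal_batch_size):
    """True when image_dir holds at least the number of images calibration reads."""
    needed = calibration_images_needed(cal_batches, cal_batch_size)
    return count_images(image_dir) >= needed
```

For example, has_enough_images("/data/raw-data/val2017", 10, 16) reports whether the directory contains at least 160 images.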
The following is a sample spec for TF2 EfficientDet:
dataset:
  augmentation:
    rand_hflip: True
    random_crop_min_scale: 0.1
    random_crop_max_scale: 2
  loader:
    prefetch_size: 4
    shuffle_file: False
    shuffle_buffer: 10000
    cycle_length: 32
    block_length: 16
  max_instances_per_image: 100
  skip_crowd_during_training: True
  num_classes: 91
  train_tfrecords:
    - '/data/train-*'
  val_tfrecords:
    - '/data/val-*'
  val_json_file: '/data/annotations/instances_val2017.json'
train:
  optimizer:
    name: 'sgd'
    momentum: 0.9
  lr_schedule:
    name: 'cosine'
    warmup_epoch: 5
    warmup_init: 0.0001
    learning_rate: 0.2
  amp: True
  checkpoint: ''
  num_examples_per_epoch: 100
  moving_average_decay: 0.999
  batch_size: 20
  checkpoint_interval: 5
  l2_weight_decay: 0.00004
  l1_weight_decay: 0.0
  clip_gradients_norm: 10.0
  image_preview: True
  qat: False
  random_seed: 42
  pruned_model_path: ''
  num_epochs: 20
model:
  name: 'efficientdet-d0'
  input_width: 512
  input_height: 512
  aspect_ratios: '[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]'
  anchor_scale: 4
  min_level: 3
  max_level: 7
  num_scales: 3
  freeze_bn: False
  freeze_blocks: []
evaluate:
  batch_size: 8
  num_samples: 500
  max_detections_per_image: 100
  label_map: "/data/coco_labels.yaml"
  trt_engine: "/output/efficientdet-d0.fp32.engine"
  checkpoint: '/weights/efficientdet-d0_100.tlt'
export:
  batch_size: 1
  dynamic_batch_size: True
  min_score_thresh: 0.4
  checkpoint: '/weights/efficientdet-d0_100.tlt'
  onnx_file: "/output/efficientdet-d0.onnx"
gen_trt_engine:
  onnx_file: "/output/efficientdet-d0.onnx"
  trt_engine: "/output/efficientdet-d0.fp32.engine"
  tensorrt:
    data_type: "fp32"
    max_workspace_size: 2 # in Gb
    calibration:
      cal_image_dir: "/data/raw-data/val2017"
      cal_cache_file: "EXPORTDIR/efficientdet-d0.cal"
      cal_batch_size: 16
      cal_batches: 10
inference:
  checkpoint: '/weights/efficientdet-d0_100.tlt'
  trt_engine: "/output/efficientdet-d0.fp32.engine"
  image_dir: "/data/test_samples"
  dump_label: False
  batch_size: 1
  min_score_thresh: 0.4
  label_map: "/data/coco_labels.yaml"
results_dir: '/results'
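In the sample spec, the same ONNX path appears under export and gen_trt_engine, and the same engine path appears under gen_trt_engine, evaluate, and inference. When a spec is edited by hand, these paths can drift apart. The sketch below is one way to catch that on a spec that has already been loaded into a dictionary; find_path_mismatches is a hypothetical helper, not a TAO API, and the dictionary carries only the keys relevant to the check.

```python
def find_path_mismatches(spec):
    """Hypothetical helper: list path pairs that should agree but do not."""
    gen = spec.get("gen_trt_engine", {})
    mismatches = []
    # The ONNX file written by export is the one gen_trt_engine should read.
    exported = spec.get("export", {}).get("onnx_file")
    if exported is not None and exported != gen.get("onnx_file"):
        mismatches.append(("export.onnx_file", "gen_trt_engine.onnx_file"))
    # The engine written by gen_trt_engine is the one evaluate/inference load.
    for section in ("evaluate", "inference"):
        engine = spec.get(section, {}).get("trt_engine")
        if engine is not None and engine != gen.get("trt_engine"):
            mismatches.append((f"{section}.trt_engine", "gen_trt_engine.trt_engine"))
    return mismatches


# Only the keys relevant to the check, copied from the sample spec.
spec = {
    "export": {"onnx_file": "/output/efficientdet-d0.onnx"},
    "gen_trt_engine": {
        "onnx_file": "/output/efficientdet-d0.onnx",
        "trt_engine": "/output/efficientdet-d0.fp32.engine",
    },
    "evaluate": {"trt_engine": "/output/efficientdet-d0.fp32.engine"},
    "inference": {"trt_engine": "/output/efficientdet-d0.fp32.engine"},
}
print(find_path_mismatches(spec))  # prints [] for the sample spec
```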
Ask the agent to run the gen_trt_engine action against your spec. For example:
Build an FP16 TensorRT engine for TF2 EfficientDet from the exported
ONNX at ``s3://my-bucket/effdet-tf2/efficientdet-d0.onnx`` using
``trt-spec.yaml``. Write the engine to
``s3://my-bucket/effdet-tf2/efficientdet-d0.engine``. Run on the local Docker daemon.
Running Evaluation through TensorRT Engine
Use the same specification file that you use for TAO evaluation.
Ask the agent to run the evaluate action against the engine you built. For example:
Evaluate the TF2 EfficientDet TensorRT engine at
``s3://my-bucket/effdet-tf2/efficientdet-d0.engine`` against
``eval-spec.yaml``. Run on local Docker.
Running Inference through TensorRT Engine
Use the same specification file that you use for TAO inference.
Ask the agent to run the inference action against the engine you built. For example:
Run TF2 EfficientDet inference with the TensorRT engine at
``s3://my-bucket/effdet-tf2/efficientdet-d0.engine`` using
``infer-spec.yaml``. Run on your chosen backend.
Annotated visualizations are written to images_annotated under the configured results
directory, and KITTI-format predictions are written to labels.
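The KITTI-format files written to labels can be post-processed with a few lines of Python. The sketch below assumes the standard KITTI object layout: a class name, then truncated, occluded, alpha, four bounding-box coordinates, seven 3D fields, and an optional trailing confidence score. The exact fields that TAO Deploy writes are not stated here, so verify against one of your own output files. KittiBox and parse_kitti_line are hypothetical names, and class names are assumed to contain no purely numeric words.

```python
from typing import NamedTuple, Optional


class KittiBox(NamedTuple):
    label: str
    x1: float
    y1: float
    x2: float
    y2: float
    score: Optional[float]  # None when the line carries no confidence value


def _is_float(token):
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_kitti_line(line):
    """Parse one KITTI label line into a KittiBox (assumed standard layout)."""
    tokens = line.split()
    # Count numeric tokens from the right so that multi-word class names
    # such as "traffic light" stay intact.
    n = 0
    while n < len(tokens) and _is_float(tokens[-1 - n]):
        n += 1
    # 14 numeric fields without a score, 15 with one; a label must remain.
    if n not in (14, 15) or n == len(tokens):
        raise ValueError(f"unexpected KITTI line: {line!r}")
    label = " ".join(tokens[:-n])
    nums = [float(t) for t in tokens[-n:]]
    # nums[0:3] are truncated, occluded, alpha; nums[3:7] is the 2D box.
    x1, y1, x2, y2 = nums[3:7]
    score = nums[14] if n == 15 else None
    return KittiBox(label, x1, y1, x2, y2, score)
```

Counting numeric fields from the end of the line, rather than splitting at the first space, keeps class names with spaces in one piece.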