RT-DETR#
RT-DETR is an object-detection model included in TAO. It supports the following tasks:
train
evaluate
inference
export
distill
Each task is explained in detail in the following sections.
Note
Throughout this documentation, you will see references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections.
For instructions on creating a dataset using the remote client, see the Creating a dataset section in the Remote Client documentation.
For instructions on creating an experiment using the remote client, see the Creating an experiment section in the Remote Client documentation.
The spec format is YAML for TAO Launcher and JSON for FTMS Client.
File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher and not for FTMS Client.
Data Input for RT-DETR#
RT-DETR expects directories of images for training or validation and annotated JSON files in COCO format.
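To illustrate the expected annotation layout, the sketch below builds a minimal COCO-style annotation dictionary and checks the three top-level keys (images, annotations, categories) that the COCO detection format defines. The helper name check_coco_annotations and the sample values are hypothetical and are not part of TAO; this is only a quick sanity check you might run before training.

```python
import json

REQUIRED_KEYS = ("images", "annotations", "categories")


def check_coco_annotations(coco):
    """Return a list of problems found in a COCO-style detection dict."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in coco]
    if problems:
        return problems
    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    for ann in coco["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
        # COCO boxes are [x, y, width, height] in pixels.
        if len(ann["bbox"]) != 4 or ann["bbox"][2] <= 0 or ann["bbox"][3] <= 0:
            problems.append(f"annotation {ann['id']}: invalid bbox")
    return problems


# Hypothetical sample data, for illustration only.
sample = {
    "images": [{"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "person"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1,
         "bbox": [100.0, 120.0, 50.0, 80.0], "area": 4000.0, "iscrowd": 0}
    ],
}

# Round-trip through JSON to mimic loading an annotation file from disk.
loaded = json.loads(json.dumps(sample))
print(check_coco_annotations(loaded))  # prints []
```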
Creating an Experiment Spec File#
The training experiment spec file for RT-DETR includes model, train, and dataset parameters.
Here is an example spec file for training an RT-DETR model with a resnet50 backbone on a COCO dataset.
Use the following command to get an experiment spec file for RT-DETR:
SPECS=$(tao-client rtdetr get-spec --action train --job_type experiment --id $EXPERIMENT_ID)
dataset:
  train_data_sources:
    - image_dir: /path/to/dataset/images/images
      json_file: /path/to/dataset/images/annotations.json
  val_data_sources:
    image_dir: /path/to/dataset/images_val/images
    json_file: /path/to/dataset/images_val/annotations.json
  test_data_sources:
    image_dir: /path/to/dataset/images_val/images
    json_file: /path/to/dataset/images_val/annotations.json
  infer_data_sources:
    image_dir:
      - /path/to/dataset/images_val/images
    classmap: /path/to/labels.txt
  batch_size: 16
  workers: 8
  remap_mscoco_category: false
  pin_memory: true
  dataset_type: serialized
  num_classes: 80
  eval_class_ids: null
  augmentation:
    multi_scales:
      - 480
      - 512
      - 544
      - 576
      - 608
      - 640
      - 672
      - 704
      - 736
      - 768
      - 800
    train_spatial_size:
      - 640
      - 640
    eval_spatial_size:
      - 640
      - 640
    distortion_prob: 0.8
    iou_crop_prob: 0.8
    preserve_aspect_ratio: false
model:
  backbone: resnet_50
  train_backbone: True
  pretrained_backbone_path: /path/to/pretrained/backbone.pth
  return_interm_indices: [1, 2, 3]
  dec_layers: 6
  enc_layers: 1
  num_queries: 300
train:
  optim:
    lr: 0.0002
    lr_backbone: 0.00002
    lr_linear_proj_mult: 0.1
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_decay: 0.1
    lr_steps: [40]
    optimizer: AdamW
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  clip_grad_norm: 0.1
  precision: fp32
  distributed_strategy: ddp
  activation_checkpoint: True
  num_gpus: 1
  gpu_ids: [0]
  num_nodes: 1
  seed: 1234
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| | string | | | | | | FALSE |
| | string | | /results | | | | FALSE |
| | collection | | | | | | FALSE |
| model | collection | Configurable parameters to construct the model for an RT-DETR experiment. | | | | | FALSE |
| dataset | collection | Configurable parameters to construct the dataset for an RT-DETR experiment. | | | | | FALSE |
| train | collection | Configurable parameters to construct the trainer for an RT-DETR experiment. | | | | | FALSE |
| evaluate | collection | Configurable parameters to construct the evaluator for an RT-DETR experiment. | | | | | FALSE |
| inference | collection | Configurable parameters to construct the inferencer for an RT-DETR experiment. | | | | | FALSE |
| export | collection | Configurable parameters to construct the exporter for an RT-DETR experiment. | | | | | FALSE |
| | collection | Configurable parameters to construct the TensorRT engine builder for an RT-DETR experiment. | | | | | FALSE |
| distill | collection | Configurable parameters to construct the distiller for an RT-DETR experiment. | | | | | FALSE |
model#
The model parameter provides options to change the RT-DETR architecture.
model:
  pretrained_backbone_path: /path/to/pretrained/backbone.pth
  backbone: resnet_50
  train_backbone: true
  num_queries: 300
  num_select: 300
  num_feature_levels: 3
  return_interm_indices:
    - 1
    - 2
    - 3
  feat_strides:
    - 8
    - 16
    - 32
  feat_channels:
    - 256
    - 256
    - 256
  use_encoder_idx:
    - 2
  hidden_dim: 256
  nheads: 8
  dropout_ratio: 0.0
  enc_layers: 1
  dim_feedforward: 1024
  pe_temperature: 10000
  expansion: 1.0
  depth_mult: 1
  enc_act: gelu
  act: silu
  dec_layers: 6
  dn_number: 100
  eval_idx: -1
  vfl_loss_coef: 1.0
  bbox_loss_coef: 5.0
  giou_loss_coef: 2.0
  alpha: 0.75
  gamma: 2.0
  aux_loss: true
  loss_types:
    - vfl
    - boxes
  backbone_names:
    - backbone.0
  linear_proj_names:
    - reference_points
    - sampling_offsets
  distillation_loss_coef: 1.0
  frozen_fm:
    enabled: false
    backbone: radio_v2-l
    checkpoint: /path/to/pretrained/radio_v2-l.pth
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| pretrained_backbone_path | string | [Optional] Path to a pretrained backbone file. | | | | | |
| backbone | string | Backbone name of the model. The TAO implementation of RT-DETR supports ResNet, EfficientViT, FAN, and ConvNext v1/v2. | resnet_50 | | | convnext_tiny, convnext_small, convnext_base, convnext_large, convnext_xlarge, fan_tiny, fan_small, fan_base, fan_large, resnet_18, resnet_34, resnet_50, resnet_101, convnextv2_nano, convnextv2_tiny, convnextv2_base, convnextv2_large, convnextv2_huge | |
| train_backbone | bool | Flag to set backbone weights as trainable or frozen. When set to | | | | | |
| num_queries | int | Number of queries. | 300 | 1 | inf | | |
| num_select | int | Number of top-K predictions selected during post-processing. | 300 | 1 | | | |
| num_feature_levels | int | Number of feature levels to use in the model. | 3 | 1 | 4 | | |
| return_interm_indices | list | Index of feature levels to use in the model. The length must match | [1, 2, 3] | | | | |
| feat_strides | list | Stride used as grid size of positional embedding at each encoder layer. | [8, 16, 32] | | | | |
| feat_channels | list | Feature channel sizes in decoder. | [256, 256, 256] | | | | |
| use_encoder_idx | list | Index of multi-scale backbone features to pass to encoder. | [2] | | | | |
| hidden_dim | int | Dimension of the hidden units. | 256 | | | | |
| nheads | int | Number of heads. | 8 | | | | |
| dropout_ratio | float | Probability to drop hidden units. | 0.0 | 0.0 | 1.0 | | |
| enc_layers | int | Number of encoder layers in the transformer. | 1 | 1 | | | |
| dim_feedforward | int | Dimension of the feedforward network. | 1024 | 1 | | | |
| pe_temperature | int | Temperature applied to the positional sine embedding. | 10000 | 1 | inf | | |
| expansion | int | Expansion ratio for hidden dimension used in CSPRepLayer. | 1.0 | 0.0 | inf | | |
| depth_mult | int | Number of RepVGGBlock used in CSPRepLayer. | 1 | 1 | inf | | |
| enc_act | string | Activation used for the encoder. | | | | | |
| act | string | Activation used for top-down FPN and bottom-up PAN. | | | | | |
| dec_layers | int | Number of decoder layers in the transformer. | 6 | 1 | | | |
| dn_number | int | Number of denoising queries. | 100 | 0 | inf | | |
| eval_idx | int | Index of decoder layer to use for evaluation. By default, use the last decoder layer. | -1 | -1 | inf | | |
| vfl_loss_coef | float | Relative weight of the varifocal error in the matching cost. | 1.0 | 0.0 | inf | | |
| bbox_loss_coef | float | Relative weight of the L1 error of the bounding box coordinates in the matching cost. | 5.0 | 0.0 | inf | | |
| giou_loss_coef | float | Relative weight of the GIoU loss of the bounding box in the matching cost. | 2.0 | 0.0 | inf | | |
| alpha | float | Alpha value in the varifocal loss. | 0.75 | | | | |
| gamma | float | Gamma value in the varifocal loss. | 2.0 | | | | |
| aux_loss | bool | A flag specifying whether to use auxiliary decoding losses (loss at each decoder layer). | | | | | |
| loss_types | list | Losses to be used during training. | | | | | |
| backbone_names | list | Prefix of the tensor names corresponding to the backbone. | | | | | |
| linear_proj_names | list | Linear projection layer names. | | | | | |
| distillation_loss_coef | float | Coefficient for the distillation loss during distillation. | 1.0 | | | | |
| frozen_fm | collection | Configurable parameters used to construct the frozen foundation model. | | | | | |
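The three coefficients vfl_loss_coef, bbox_loss_coef, and giou_loss_coef are described as relative weights in the matching cost. As a rough illustration, the sketch below combines three per-pair cost terms as a plain weighted sum using the default coefficients from the table. The function name matching_cost and the example cost values are hypothetical, and the weighted-sum form is an assumption about how the terms are combined, not a copy of the TAO implementation.

```python
def matching_cost(vfl_cost, l1_cost, giou_cost,
                  vfl_loss_coef=1.0, bbox_loss_coef=5.0, giou_loss_coef=2.0):
    """Combine per-pair cost terms using the spec's relative weights.

    Assumption: the terms are added as a simple weighted sum.
    """
    return (vfl_loss_coef * vfl_cost
            + bbox_loss_coef * l1_cost
            + giou_loss_coef * giou_cost)


# Made-up cost values for one (prediction, ground-truth) pair.
cost = matching_cost(vfl_cost=0.3, l1_cost=0.1, giou_cost=0.4)
print(f"{cost:.2f}")
```

With the defaults, the L1 box term counts five times as much as the classification term, so raising bbox_loss_coef pushes the matcher toward pairs with closer box coordinates.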
frozen_fm#
The frozen_fm parameter provides options to change the Frozen RT-DETR (RT-DETR + a frozen foundation model) architecture.
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| enabled | bool | Flag to set frozen foundation model as enabled or disabled. When set to True, the frozen foundation model will be enabled. | True | | | | FALSE |
| backbone | string | Name of the frozen foundation model. | radio_v2-l | | | radio_v2-b, radio_v2-l, radio_v2-h | FALSE |
| checkpoint | string | The path to the pretrained frozen foundation model checkpoint. | | | | | FALSE |
Note
The pretrained weights of the frozen foundation model can be found in the TAO Model Zoo.
train#
The train parameter defines the hyperparameters of the training process.
train:
  optim:
    lr: 0.0002
    lr_backbone: 0.00002
    lr_linear_proj_mult: 0.1
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_decay: 0.1
    lr_steps: [40]
    optimizer: AdamW
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  clip_grad_norm: 0.1
  precision: fp32
  distributed_strategy: ddp
  activation_checkpoint: True
  num_gpus: 1
  gpu_ids: [0]
  num_nodes: 1
  seed: 1234
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| num_gpus | int | Number of GPUs to run the training job. | 1 | 1 | | | |
| gpu_ids | list | List of GPU IDs to run the training on. The length of this list must equal the number of GPUs in | [0] | | | | |
| num_nodes | int | Number of nodes to run the training on. If >1, | 1 | | | | |
| seed | int | Seed for the initializer in PyTorch. If <0, disable fixed seed. | 1234 | -1 | inf | | |
| cudnn | collection | | | | | | |
| num_epochs | int | Number of epochs to run the training. | 10 | 1 | inf | | |
| checkpoint_interval | int | Interval (in epochs) at which a checkpoint is to be saved. Helps resume training. | 1 | 1 | | | |
| validation_interval | int | Interval (in epochs) at which an evaluation is to be triggered on the validation dataset. | 1 | 1 | | | |
| resume_training_checkpoint_path | string | Path to the checkpoint at which to resume training. | | | | | |
| results_dir | string | Path to the location where all the assets generated from a task are stored. | | | | | |
| freeze | list | List of layer names to freeze. Example: | [] | | | | |
| pretrained_model_path | string | Path to a pretrained RT-DETR model to initialize the current training from. | | | | | |
| clip_grad_norm | float | Amount to clip the gradient by L2 Norm. A value of 0.0 specifies no clipping. | 0.1 | | | | |
| is_dry_run | bool | Whether to run the trainer in Dry Run mode. This is a good way to validate the spec file and run a sanity check on the trainer without actually initializing and running the trainer. | | | | | |
| enable_ema | bool | Whether to enable Exponential Moving Average during training. | | | | | |
| ema | collection | Hyperparameters to configure the Exponential Moving Average. | | | | | |
| optim | collection | Hyperparameters to configure the optimizer. | | | | | |
| precision | string | Precision to run the training on. | fp32 | | | bf16, fp32, fp16 | |
| distributed_strategy | string | The multi-GPU training strategy. DDP (Distributed Data Parallel) and Fully Sharded DDP are supported. | ddp | | | ddp, fsdp | |
| activation_checkpoint | bool | Whether training is to recompute in backward pass to save GPU memory, rather than storing activations. | | | | | |
| verbose | bool | Whether to enable printing of detailed learning rate scaling from the optimizer. | | | | | |
optim#
The optim parameter defines the config for the optimizer in training, including the learning rate, learning scheduler, and weight decay.
optim:
  lr: 0.0002
  lr_backbone: 0.00002
  lr_linear_proj_mult: 0.1
  momentum: 0.9
  weight_decay: 0.0001
  lr_scheduler: MultiStep
  lr_decay: 0.1
  lr_steps: [40]
  optimizer: AdamW
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| optimizer | string | Type of optimizer used to train the network. | AdamW | | | AdamW, SGD | FALSE |
| | string | The metric value to be monitored for the | val_loss | | | val_loss, train_loss | FALSE |
| lr | float | The initial learning rate for training the model, excluding the backbone. | 0.0001 | | | | TRUE |
| lr_backbone | float | The initial learning rate for training the backbone. | 1e-05 | | | | TRUE |
| momentum | float | The momentum for the AdamW optimizer. | 0.9 | | | | TRUE |
| weight_decay | float | The weight decay coefficient. | 0.0001 | | | | TRUE |
| lr_scheduler | string | The learning rate scheduler. MultiStep: decrease the lr by lr_decay at the steps listed in lr_steps. StepLR: decrease the lr by lr_decay at every lr_step_size. | MultiStep | | | MultiStep, StepLR | FALSE |
| lr_steps | list | The steps at which the learning rate must be decreased. This is applicable only with the MultiStep LR. | [1000] | | | | FALSE |
| lr_step_size | int | The number of steps to decrease the learning rate in the StepLR. | 1000 | | | | TRUE |
| lr_decay | float | The decreasing factor for the learning rate scheduler. | 0.1 | | | | TRUE |
| | int | The number of steps to perform linear learning rate warm-up. | 0 | 0 | inf | | FALSE |
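To make the two scheduler options concrete, the sketch below computes the learning rate under a MultiStep and a StepLR schedule in closed form, following the usual definitions of these schedulers (each milestone reached, or each full lr_step_size interval elapsed, multiplies the rate by lr_decay once). The function names are hypothetical, and whether a scheduler "step" corresponds to an epoch or an iteration is not stated here, so treat the unit as an assumption.

```python
from bisect import bisect_right


def multistep_lr(base_lr, lr_decay, lr_steps, step):
    """Learning rate after `step` scheduler steps under a MultiStep schedule."""
    # bisect_right counts how many milestones are <= step.
    return base_lr * lr_decay ** bisect_right(sorted(lr_steps), step)


def step_lr(base_lr, lr_decay, lr_step_size, step):
    """Learning rate after `step` scheduler steps under a StepLR schedule."""
    return base_lr * lr_decay ** (step // lr_step_size)


# Values taken from the example optim spec: lr=0.0002, lr_decay=0.1, lr_steps=[40].
for step in (0, 39, 40, 41):
    print(step, multistep_lr(0.0002, 0.1, [40], step))
```

If scheduler steps are counted in epochs, a run with num_epochs: 10 and lr_steps: [40] would finish before the first milestone, so the learning rate would stay at its initial value for the whole run.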
dataset#
The dataset parameter defines the dataset source, training batch size, and augmentation.
dataset:
  train_data_sources:
    - image_dir: /path/to/coco/images/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.json
  val_data_sources:
    image_dir: /path/to/coco/images/val2017/
    json_file: /path/to/coco/annotations/instances_val2017.json
  test_data_sources:
    image_dir: /path/to/coco/images/val2017/
    json_file: /path/to/coco/annotations/instances_val2017.json
  infer_data_sources:
    image_dir: /path/to/coco/images/val2017/
    classmap: /path/to/coco/annotations/coco_classmap.txt
  num_classes: 80
  batch_size: 4
  workers: 8
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| train_data_sources | list | The list of data sources for training: | [{'image_dir': '', 'json_file': ''}] | | | | |
| val_data_sources | collection | The list of data sources for validation: | {'image_dir': '', 'json_file': ''} | | | | |
| test_data_sources | collection | The data source for testing: | {'image_dir': '', 'json_file': ''} | | | | |
| infer_data_sources | collection | The data source for inference: | {'image_dir': [''], 'classmap': ''} | | | | |
| batch_size | int | Batch size for training and validation. | 4 | 1 | inf | | |
| workers | int | Number of parallel workers processing data. | 8 | 1 | inf | | |
| remap_mscoco_category | bool | Enables mapping of MSCOCO 91 classes to 80. Only required if you are directly training using the original COCO annotation files. For a custom dataset, set this value to | | | | | |
| pin_memory | bool | Enables the dataloader to allocate pagelocked memory for faster data transfer between the CPU and GPU. | | | | | |
| dataset_type | string | If set to default, follow the standard | serialized | | | | |
| num_classes | int | The number of classes in the training data. | 80 | 1 | inf | | |
| eval_class_ids | list | IDs of the classes for evaluation. | [1] | | | | |
| augmentation | collection | Configuration parameters for data augmentation. | | | | | |
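The remap_mscoco_category option exists because the original COCO annotation files use sparse category IDs (91 ID slots for 80 actual classes). As a general illustration of that kind of remapping, the sketch below maps sparse category IDs to contiguous indices in ascending ID order. The helper name build_contiguous_mapping is hypothetical, and the zero-based target indices are an assumption; this is not the mapping table TAO uses internally.

```python
def build_contiguous_mapping(categories):
    """Map sparse category ids to contiguous indices in ascending id order."""
    sorted_ids = sorted(cat["id"] for cat in categories)
    return {cat_id: index for index, cat_id in enumerate(sorted_ids)}


# Three entries from the COCO category list; note the gaps in the ids.
categories = [
    {"id": 1, "name": "person"},
    {"id": 3, "name": "car"},
    {"id": 13, "name": "stop sign"},
]

mapping = build_contiguous_mapping(categories)
print(mapping)  # {1: 0, 3: 1, 13: 2}
```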
augmentation#
The augmentation parameter contains hyperparameters for augmentation.
augmentation:
  multi_scales:
    - 480
    - 512
    - 544
    - 576
    - 608
    - 640
    - 672
    - 704
    - 736
    - 768
    - 800
  train_spatial_size:
    - 640
    - 640
  eval_spatial_size:
    - 640
    - 640
  distortion_prob: 0.8
  iou_crop_prob: 0.8
  preserve_aspect_ratio: false
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| multi_scales | list | A list of sizes to perform random resize. | [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800] | | | | |
| train_spatial_size | list | Input resolution to run evaluation during training. This is in the [h, w] order. | [640, 640] | | | | |
| eval_spatial_size | list | Input resolution to run evaluation during validation and testing. This is in the [h, w] order. | [640, 640] | | | | |
| distortion_prob | float | The probability for RandomPhotometricDistort. | 0.8 | 0.0 | 1.0 | | |
| iou_crop_prob | float | The probability for RandomIoUCrop. | 0.8 | 0.0 | 1.0 | | |
| preserve_aspect_ratio | bool | Flag to enable resize with preserving the aspect ratio. | | | | | |
Training the Model#
Use the following command to run RT-DETR training:
TRAIN_JOB_ID=$(tao-client rtdetr experiment-run-action --action train --id $EXPERIMENT_ID --specs "$SPECS")
tao model rtdetr train [-h] -e <experiment_spec_file>
                       [results_dir=<global_results_dir>]
                       [model.<model_option>=<model_option_value>]
                       [dataset.<dataset_option>=<dataset_option_value>]
                       [train.<train_option>=<train_option_value>]
                       [train.gpu_ids=<gpu indices>]
                       [train.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
Optional Arguments
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options.
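The optional arguments above address nested spec fields with dotted keys, such as train.optim.lr=0.0001. The sketch below shows how such a dotted key maps onto a nested spec dictionary. The function apply_override is hypothetical and only illustrates the addressing scheme; it is not the launcher's actual override parser, and it keeps values as strings where a real parser would also convert types.

```python
def apply_override(spec, override):
    """Apply one `a.b.c=value` override to a nested dict, in place."""
    dotted_key, _, raw_value = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    node = spec
    for name in parents:
        # Create intermediate sections on demand.
        node = node.setdefault(name, {})
    node[leaf] = raw_value  # kept as a string in this sketch
    return spec


spec = {"train": {"num_epochs": 10, "optim": {"lr": 0.0002}}}
apply_override(spec, "train.optim.lr=0.0001")
apply_override(spec, "train.num_gpus=2")
print(spec)
```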
Note
For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 and gpu_ids = [0, 1], then they are modified to follow the setting with more GPUs, for example num_gpus = 1 -> num_gpus = 2.
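The reconciliation rule in this note can be sketched as follows. The function reconcile_gpus is hypothetical and only mirrors the behavior described above. The documented case is a gpu_ids list longer than num_gpus; for the opposite case the sketch assumes the first num_gpus device indices are used, which is an assumption and not documented behavior.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Return a consistent (num_gpus, gpu_ids) pair, favoring more GPUs."""
    gpu_ids = [0] if gpu_ids is None else list(gpu_ids)
    if len(gpu_ids) >= num_gpus:
        # Documented case: the longer gpu_ids list wins, num_gpus follows it.
        return len(gpu_ids), gpu_ids
    # Assumption: when num_gpus is larger, use device indices 0..num_gpus-1.
    return num_gpus, list(range(num_gpus))


print(reconcile_gpus(num_gpus=1, gpu_ids=[0, 1]))  # (2, [0, 1])
```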
In some cases, multi-GPU training may fail with a segmentation fault. You may work around this by setting the OMP_NUM_THREADS environment variable to 1. Depending on your mode of execution, you may use one of the following methods to set this variable:
CLI Launcher
You may set this environment variable by adding the following fields to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 in this section.
{
    "Envs": [
        {
            "variable": "OMP_NUM_THREADS",
            "value": "1"
        }
    ]
}
Docker
You may set environment variables in the Docker container by passing the -e flag on the docker command line.
docker run -it --rm --gpus all \
-e OMP_NUM_THREADS=1 \
-v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e
Checkpointing and Resuming Training
At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth.
These are saved in train.results_dir, like so:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
Note
You may resume a previously aborted training job by setting train.resume_training_checkpoint_path to the path of the intermediate checkpoint file. The checkpoint files must follow the model_epoch_*.pth or model_epoch*-EMA.pth format. You must use the *-EMA.pth file if your training spec has EMA enabled.
You may set this parameter by providing the corresponding flag on the command line.
TRAIN_JOB_ID=$(tao-client rtdetr job-resume --job $TRAIN_JOB_ID --action train --id $EXPERIMENT_ID --specs "$SPECS")
tao model rtdetr train -e <experiment_spec_file> train.resume_training_checkpoint_path=<model_epoch_001.pth>
The major implication of this logic is that, if you wish to trigger fresh training from scratch, either
Specify a new, empty results directory (Recommended), or
Remove the latest checkpoint from the results directory
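When resuming manually, you need the path of the most recent checkpoint to pass to train.resume_training_checkpoint_path. The sketch below picks the highest-epoch file from a results directory, using the model_epoch_*.pth and model_epoch*-EMA.pth naming described above. The helper latest_checkpoint is hypothetical, and the exact placement of the underscore and the -EMA suffix in file names is assumed from those two patterns.

```python
import re
import tempfile
from pathlib import Path


def latest_checkpoint(results_dir, use_ema=False):
    """Return the highest-epoch checkpoint path in results_dir, or None."""
    suffix = "-EMA" if use_ema else ""
    pattern = re.compile(r"model_epoch_?(\d+)" + re.escape(suffix) + r"\.pth")
    best = None
    for path in Path(results_dir).iterdir():
        match = pattern.fullmatch(path.name)
        # Compare epochs numerically, not as strings.
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), path)
    return best[1] if best else None


# Demonstrate on a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("model_epoch_000.pth", "model_epoch_004.pth",
                 "model_epoch_004-EMA.pth"):
        (Path(tmp) / name).touch()
    print(latest_checkpoint(tmp).name)                # model_epoch_004.pth
    print(latest_checkpoint(tmp, use_ema=True).name)  # model_epoch_004-EMA.pth
```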
Evaluating the Model#
evaluate#
The evaluate parameter defines the hyperparameters of the evaluation process.
evaluate:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.0
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| checkpoint | string | | ??? | | | | |
| results_dir | string | | | | | | |
| input_width | int | Width of the input image tensor. | | 1 | | | |
| input_height | int | Height of the input image tensor. | | 1 | | | |
| trt_engine | string | Path to the TensorRT engine to be used for evaluation. This only works with | | | | | |
| conf_threshold | float | The value of the confidence threshold to be used when filtering out the final list of boxes. | 0.0 | | | | |
To run evaluation with an RT-DETR model, use this command:
EVAL_JOB_ID=$(tao-client rtdetr experiment-run-action --action evaluate --id $EXPERIMENT_ID --parent_job_id $TRAIN_JOB_ID --specs "$SPECS")
tao model rtdetr evaluate [-h] -e <experiment_spec>
                          evaluate.checkpoint=<model to be evaluated>
                          [evaluate.<evaluate_option>=<evaluate_option_value>]
                          [evaluate.gpu_ids=<gpu indices>]
                          [evaluate.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required.
-e, --experiment_spec: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model to be evaluated.
Optional Arguments
The following arguments are optional to run the command.
evaluate.<evaluate_option>: The evaluate options.
Running Inference with an RT-DETR Model#
inference#
The inference parameter defines the hyperparameters of the inference process.
inference:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.5
  color_map:
    person: red
    car: blue
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| checkpoint | string | | ??? | | | | |
| results_dir | string | | | | | | |
| trt_engine | string | Path to the TensorRT engine to be used for inference. This only works with | | | | | |
| color_map | collection | Class-wise dictionary with colors to render boxes. | | | | | |
| conf_threshold | float | The value of the confidence threshold to be used when filtering out the final list of boxes. | 0.5 | | | | |
| is_internal | bool | Flag to render with internal directory structure. | | | | | |
| input_width | int | Width of the input image tensor. | 960 | 32 | | | |
| input_height | int | Height of the input image tensor. | 544 | 32 | | | |
| outline_width | int | Width in pixels of the bounding box outline. | 3 | 1 | | | |
The inference tool for RT-DETR models can be used to visualize bounding boxes and generate frame-by-frame KITTI-format labels on a directory of images.
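To show how conf_threshold and KITTI-format labels relate, the sketch below filters a list of detections by score and formats the survivors as KITTI object label lines (type, truncated, occluded, alpha, four box coordinates, three dimensions, three location values, rotation, then an optional score). The function to_kitti_lines and the detection dictionary layout are hypothetical. Filling the unused 3D fields with zeros and keeping detections whose score equals the threshold are assumptions, not a description of the files TAO writes.

```python
def to_kitti_lines(detections, conf_threshold=0.5):
    """Format detections at or above the threshold as KITTI label lines."""
    lines = []
    for det in detections:
        if det["score"] < conf_threshold:
            continue  # dropped by the confidence filter
        x1, y1, x2, y2 = det["box"]
        lines.append(
            f"{det['label']} 0.00 0 0.00 "
            f"{x1:.2f} {y1:.2f} {x2:.2f} {y2:.2f} "
            # Assumption: unused 3D fields (dimensions, location, rotation) are zero.
            f"0.00 0.00 0.00 0.00 0.00 0.00 0.00 {det['score']:.3f}"
        )
    return lines


# Made-up detections for one image; boxes are (x1, y1, x2, y2) in pixels.
detections = [
    {"label": "car", "score": 0.91, "box": (10, 20, 110, 220)},
    {"label": "person", "score": 0.20, "box": (0, 0, 5, 5)},
]
for line in to_kitti_lines(detections):
    print(line)
```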
INFER_JOB_ID=$(tao-client rtdetr experiment-run-action --action inference --id $EXPERIMENT_ID --parent_job_id $TRAIN_JOB_ID --specs "$SPECS")
tao model rtdetr inference [-h] -e <experiment spec file>
                           inference.checkpoint=<model to be inferenced>
                           [inference.<inference_option>=<inference_option_value>]
                           [inference.gpu_ids=<gpu indices>]
                           [inference.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required to run the command.
-e, --experiment_spec: The experiment spec file to set up the inference experiment.
inference.checkpoint: The .pth model to run inference with.
Optional Arguments
The following arguments are optional to run the command.
inference.<inference_option>: The inference options.
Distilling the Model#
distill#
The distill parameter defines the hyperparameters for the distillation process.
distill:
  teacher:
    backbone: convnext_large
    train_backbone: False
    num_queries: 300
    num_select: 300
    num_feature_levels: 3
    return_interm_indices:
      - 1
      - 2
      - 3
    feat_strides:
      - 8
      - 16
      - 32
    hidden_dim: 256
    nheads: 8
    dropout_ratio: 0.0
    enc_layers: 1
    dim_feedforward: 1024
    use_encoder_idx:
      - 2
    pe_temperature: 10000
    expansion: 1.0
    depth_mult: 1
    enc_act: gelu
    act: silu
    dec_layers: 6
    dn_number: 100
    feat_channels:
      - 256
      - 256
      - 256
    eval_idx: -1
    vfl_loss_coef: 1.0
    bbox_loss_coef: 5.0
    giou_loss_coef: 2.0
    alpha: 0.75
    gamma: 2.0
    clip_max_norm: 0.1
    aux_loss: true
    loss_types:
      - vfl
      - boxes
    backbone_names:
      - backbone.0
    linear_proj_names:
      - reference_points
      - sampling_offsets
  pretrained_teacher_model_path: /path/to/teacher/model_epoch_070.pth
  bindings:
    - teacher_module_name: 'srcs'
      student_module_name: 'srcs'
      criterion: IOU
      weight: 20
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| teacher | collection | Configurable parameters to construct the teacher model. (Same as the model config) | | | | | |
| pretrained_teacher_model_path | string | Path to the pre-trained teacher model. | | | | | |
| bindings | list dict | The list of bindings between teacher and student to use for calculating distill loss: | | | | | |
Note
We recommend using "IOU" as the criterion and "srcs" as the teacher_module_name/student_module_name for distillation.
total_loss = distillation_loss_coef * distillation_loss + other RTDETR losses, where distillation_loss = sum(binding_loss)
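The formula above can be written out as a small function. In this sketch, total_loss and the binding dictionary layout are hypothetical, and multiplying each binding's raw criterion value by its weight before summing is an assumption about how the weight field is applied; the source only states that the distillation loss is the sum of the binding losses.

```python
def total_loss(other_losses, bindings, distillation_loss_coef=1.0):
    """total = distillation_loss_coef * sum(binding losses) + other losses."""
    # Assumption: each binding contributes weight * raw criterion value.
    distillation_loss = sum(b["weight"] * b["raw_loss"] for b in bindings)
    return distillation_loss_coef * distillation_loss + other_losses


# Made-up numbers: one 'srcs' binding with weight 20, as in the example spec.
bindings = [{"name": "srcs", "weight": 20, "raw_loss": 0.05}]
print(f"{total_loss(other_losses=3.5, bindings=bindings):.2f}")
```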
DISTILL_JOB_ID=$(tao-client rtdetr experiment-run-action --action distill --id $EXPERIMENT_ID --parent_job_id $TRAIN_JOB_ID --specs "$SPECS")
tao model rtdetr distill [-h] -e <experiment spec file>
Required Arguments
The following arguments are required to run the command.
-e, --experiment_spec: The path to an experiment spec file.
Exporting the Model#
export#
The export parameter defines the hyperparameters for the export process.
export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  on_cpu: False
  opset_version: 12
  input_channel: 3
  input_width: 640
  input_height: 640
  batch_size: -1
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| checkpoint | string | Path to the checkpoint file to run export. | ??? | | | | |
| onnx_file | string | Path to the ONNX model file. | ??? | | | | |
| on_cpu | bool | Flag to export CPU compatible model. | | | | | |
| input_channel | int | Number of channels in the input Tensor. | 3 | 3 | | | |
| input_width | int | Width of the input image tensor. | 960 | 32 | | | |
| input_height | int | Height of the input image tensor. | 544 | 32 | | | |
| opset_version | int | Operator set version of the ONNX model used to generate the TensorRT engine. | 17 | 1 | | | |
| batch_size | int | The batch size of the input Tensor for the engine. A value of | -1 | -1 | | | |
| verbose | bool | Flag to enable verbose TensorRT logging. | | | | | |
Note
When you export an RT-DETR model with frozen_fm enabled, the .onnx file has a static batch size of 1.
EXPORT_JOB_ID=$(tao-client rtdetr experiment-run-action --action export --id $EXPERIMENT_ID --parent_job_id $TRAIN_JOB_ID --specs "$SPECS")
tao model rtdetr export [-h] -e <experiment spec file>
                        export.checkpoint=<model to export>
                        export.onnx_file=<onnx path>
                        [export.<export_option>=<export_option_value>]
Required Arguments
The following arguments are required to run the command.
-e, --experiment_spec: The path to an experiment spec file.
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .etlt or .onnx model is saved.
Optional Arguments
The following arguments are optional to run the command.
export.<export_option>: The export options.
TensorRT engine generation, validation, and int8 calibration#
For deployment, please refer to TAO Deploy documentation.
Deploying to DeepStream#
Refer to the Integrating a RT-DETR Model page for more information about deploying an RT-DETR model to DeepStream.