Deformable DETR
Deformable DETR is an object-detection model that is included in the TAO Toolkit. It supports the following tasks:
convert
train
evaluate
inference
export
These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao model deformable_detr <sub_task> <args_per_subtask>
where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
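As a rough illustration of this convention, the sketch below assembles the argument list for one subtask in Python. The helper name `build_tao_command` and its double-underscore keyword convention are assumptions made for this example only; they are not part of the TAO Toolkit.

```python
import shlex


def build_tao_command(sub_task, spec_path, results_dir=None, **overrides):
    """Assemble the argument list for one Deformable DETR subtask (hypothetical helper)."""
    cmd = ["tao", "model", "deformable_detr", sub_task, "-e", spec_path]
    if results_dir is not None:
        cmd += ["-r", results_dir]
    # Spec overrides are passed as key=value pairs. A double underscore in the
    # keyword name stands in for a dot (a convention of this helper only).
    for key, value in overrides.items():
        cmd.append(f"{key.replace('__', '.')}={value}")
    return cmd


cmd = build_tao_command(
    "evaluate",
    "/path/to/spec.yaml",
    results_dir="/path/to/results/",
    evaluate__checkpoint="/path/to/model.pth",
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run` if the launcher is installed; the sketch only builds the command and does not execute it.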
Deformable DETR expects directories of images for training or validation and annotated JSON files in COCO format.
The category_id from your COCO JSON file should start from 1 because 0 is set as a background class. In addition, dataset.num_classes should be set to max class_id + 1. For instance, even though there are only 80 classes used in COCO, the largest class_id is 90, so dataset.num_classes should be set to 91.
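The rule above can be checked with a few lines of Python. This is a minimal sketch that assumes a standard COCO annotation file with a top-level `categories` list; the function name and the tiny demo file are hypothetical.

```python
import json
import os
import tempfile


def num_classes_from_coco(annotation_path):
    """Return max category_id + 1 for a COCO-format annotation file."""
    with open(annotation_path) as f:
        coco = json.load(f)
    category_ids = [category["id"] for category in coco["categories"]]
    if min(category_ids) < 1:
        # 0 is reserved for the background class.
        raise ValueError("category_id values must start from 1")
    return max(category_ids) + 1


# Tiny stand-in annotation file (hypothetical data, not the real COCO file).
demo = {
    "images": [],
    "annotations": [],
    "categories": [{"id": 1, "name": "person"}, {"id": 90, "name": "toothbrush"}],
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "instances_demo.json")
    with open(path, "w") as f:
        json.dump(demo, f)
    demo_num_classes = num_classes_from_coco(path)

print(demo_num_classes)  # 91
```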
Sharding the Data (Optional)
Sharding is not necessary if the annotation is already in JSON format and your dataset is smaller than the COCO dataset. This subtask also assumes that your dataset is in KITTI format.
For a large dataset, you can optionally use convert to shard the dataset into smaller chunks to reduce the memory burden. In this process, KITTI-based annotations are converted into smaller sharded JSON files, similar to other object detection networks. Here is an example spec file for converting KITTI-based folders into multiple sharded JSON files.
input_source: /workspace/tao-experiments/data/sequence.txt
results_dir: /workspace/tao-experiments/sharded
image_dir_name: images
label_dir_name: labels
num_shards: 32
num_partitions: 1
mapping_path: /path/to/your_category_mapping
The details of each parameter are summarized in the table below:
Parameter | Data Type | Default | Description | Supported Values |
input_source | string | None | The .txt file listing data sources | |
results_dir | string | None | The output directory where sharded JSON files will be stored | |
image_dir_name | string | None | The relative path to the directory containing images from the path listed in the input_source .txt file | |
label_dir_name | string | None | The relative path to the directory containing JSON data from the path listed in the input_source .txt file | |
num_shards | unsigned int | 32 | The number of shards per partition | >0 |
num_partitions | unsigned int | 1 | The number of partitions in the data | >0 |
mapping_path | string | None | The path to a JSON file containing the class mapping | |
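To give a feel for what num_shards controls, the sketch below splits a list of items into a fixed number of near-equal chunks. How convert actually assigns samples to shards is not described here, so the contiguous split and the function name `shard_items` are assumptions for illustration only.

```python
def shard_items(items, num_shards):
    """Split items into num_shards contiguous chunks of near-equal size."""
    if num_shards <= 0:
        raise ValueError("num_shards must be > 0")
    base, extra = divmod(len(items), num_shards)
    shards = []
    start = 0
    for index in range(num_shards):
        # The first `extra` shards each take one additional item.
        size = base + (1 if index < extra else 0)
        shards.append(items[start:start + size])
        start += size
    return shards


shards = shard_items(list(range(10)), 3)
print(shards)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```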
The category mapping should contain the mapping of your dataset and be in alphabetical order. The default mapping is shown below:
DEFAULT_TARGET_CLASS_MAPPING = {
"Person": "person",
"Person Group": "person",
"Rider": "person",
"backpack": "bag",
"face": "face",
"large_bag": "bag",
"person": "person",
"person group": "person",
"person_group": "person",
"personal_bag": "bag",
"rider": "person",
"rolling_bag": "bag",
"rollingbag": "bag",
"largebag": "bag",
"personalbag": "bag"
}
The following example shows how to use the command:
tao model deformable_detr convert -e /path/to/spec.yaml
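The effect of a class mapping can be sketched as follows: several source class names collapse into fewer target classes. The subset of names, the `remap_labels` helper, and the choice to drop unmapped classes are assumptions of this example; the exact behavior and file schema expected by convert should be checked against the tool itself.

```python
import json

# Hypothetical subset of a source-to-target class mapping.
mapping = {
    "Rider": "person",
    "backpack": "bag",
    "face": "face",
    "person": "person",
    "rolling_bag": "bag",
}


def remap_labels(labels, mapping):
    """Map source class names to target names, dropping unmapped classes (assumption)."""
    return [mapping[label] for label in labels if label in mapping]


remapped = remap_labels(["Rider", "backpack", "dog", "face"], mapping)
target_classes = sorted(set(mapping.values()))  # sorted only for stable display
mapping_json = json.dumps(mapping, indent=2)    # text that could be saved as a JSON file

print(remapped)        # ['person', 'bag', 'face']
print(target_classes)  # ['bag', 'face', 'person']
```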
The training experiment spec file for Deformable DETR includes model, train, and dataset parameters.
Here is an example spec file for training a Deformable DETR model with a resnet50 backbone on a COCO dataset.
dataset:
  train_data_sources:
    - image_dir: /path/to/coco/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.json
  val_data_sources:
    - image_dir: /path/to/coco/val2017/
      json_file: /path/to/coco/annotations/instances_val2017.json
  num_classes: 91
  batch_size: 4
  workers: 8
  augmentation:
    scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
    input_mean: [0.485, 0.456, 0.406]
    input_std: [0.229, 0.224, 0.225]
    horizontal_flip_prob: 0.5
    train_random_resize: [400, 500, 600]
    train_random_crop_min: 384
    train_random_crop_max: 600
    random_resize_max_size: 1333
    test_random_resize: 800
model:
  pretrained_model_path: /path/to/your-pretrained-backbone-model
  backbone: resnet_50
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 300
  with_box_refine: True
  dropout_ratio: 0.3
train:
  optim:
    lr_backbone: 2e-5
    lr: 2e-4
    lr_steps: [10, 20, 30, 40]
    momentum: 0.9
  num_epochs: 50
Parameter | Data Type | Default | Description | Supported Values |
model | dict config | – | The configuration of the model architecture | |
train | dict config | – | The configuration of the training task | |
dataset | dict config | – | The configuration of the dataset | |
evaluate | dict config | – | The configuration of the evaluation task | |
inference | dict config | – | The configuration of the inference task | |
export | dict config | – | The configuration of the ONNX export task | |
gen_trt_engine | dict config | – | The configuration of the TensorRT generation task. Only used in tao deploy | |
encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
results_dir | string | None | The directory where experiment results are saved | |
model
The model parameter provides options to change the Deformable DETR architecture.
model:
  pretrained_model_path: /path/to/your-resnet50-pretrained-model
  backbone: resnet_50
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 300
  with_box_refine: True
  dropout_ratio: 0.3
Parameter | Datatype | Default | Description | Supported Values |
pretrained_backbone_path | string | None | The optional path to the pretrained backbone file | string to the path |
| string | resnet_50 | The backbone name of the model. The GCViT and ResNet 50 backbones are supported. | resnet_50, gc_vit_xxtiny, |
train_backbone | bool | True | A flag specifying whether to train the backbone or not | True/False |
num_feature_levels | unsigned int | 4 | The number of feature levels to use in the model | 1,2,3,4 |
| int list | [1, 2, 3, 4] | The index of feature levels to use in the model. The length must match | [0, 1, 2, 3, 4], [1, 2, 3, 4], |
dec_layers | unsigned int | 6 | The number of decoder layers in the transformer | >0 |
enc_layers | unsigned int | 6 | The number of encoder layers in the transformer | >0 |
num_queries | unsigned int | 300 | The number of queries | >0 |
dim_feedforward | unsigned int | 1024 | The dimension of the feedforward network | >0 |
num_select | unsigned int | 100 | The number of top-K predictions selected during the post-process | >0 |
with_box_refine | bool | True | A flag specifying whether to enable the Iterative Bounding Box Refinement | True, False |
dropout_ratio | float | 0.3 | The probability to drop out hidden units | 0.0 ~ 1.0 |
cls_loss_coef | float | 2.0 | The relative weight of the classification error in the matching cost | >0.0 |
bbox_loss_coef | float | 5.0 | The relative weight of the L1 error of the bounding box coordinates in the matching cost | >0.0 |
giou_loss_coef | float | 2.0 | The relative weight of the GIoU loss of the bounding box in the matching cost | >0.0 |
focal_alpha | float | 0.25 | The alpha in the focal loss | >0.0 |
aux_loss | bool | True | A flag specifying whether to use auxiliary decoding losses (loss at each decoder layer) | True, False |
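To illustrate what num_select means, the sketch below picks the top-K (query, class) pairs by sigmoid score across all queries. It follows the general approach used in public Deformable DETR post-processing; the TAO implementation is not shown in this document, so the function name, input layout, and tie-breaking here are assumptions.

```python
import math


def select_top_k(logits, num_select):
    """Pick the num_select highest-scoring (query, class) pairs.

    logits: list of per-query lists, each holding one logit per class.
    Returns a list of (score, query_index, class_index) tuples.
    """
    scored = []
    for query_index, row in enumerate(logits):
        for class_index, logit in enumerate(row):
            score = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
            scored.append((score, query_index, class_index))
    # Highest scores first; a query may appear more than once with different classes.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:num_select]


# Two queries, two classes (made-up logits).
top = select_top_k([[0.0, 2.0], [3.0, -1.0]], num_select=2)
for score, query_index, class_index in top:
    print(f"query={query_index} class={class_index} score={score:.3f}")
```

Because selection runs over the flattened query-by-class scores, num_select can be smaller or larger than num_queries.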
train
The train parameter defines the hyperparameters of the training process.
train:
  optim:
    lr: 0.0001
    lr_backbone: 0.00001
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_steps: [10, 20, 30, 40]
    lr_decay: 0.1
  num_epochs: 50
  checkpoint_interval: 1
  precision: fp32
  distributed_strategy: ddp
  activation_checkpoint: True
  num_gpus: 8
  num_nodes: 1
Parameter | Datatype | Default | Description | Supported Values |
optim | dict config | | The config for the optimizer, including the learning rate, learning rate scheduler, and weight decay | >0 |
num_epochs | unsigned int | 50 | The total number of epochs to run the experiment | >0 |
checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
clip_grad_norm | float | 0.1 | The amount to clip the gradient by the L2 norm. A value of 0.0 specifies no clipping | >=0 |
precision | string | fp32 | Specifying "fp16" enables automatic mixed precision training. Training with fp16 can help save GPU memory. | fp32, fp16 |
distributed_strategy | string | ddp | The multi-GPU training strategy. DDP (Distributed Data Parallel) and Sharded DDP are supported. | ddp, ddp_sharded |
activation_checkpoint | bool | True | A True value instructs train to recompute activations in the backward pass to save GPU memory, rather than storing them. | True, False |
resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
pretrained_model_path | string | | The path to a pretrained model checkpoint to load for fine-tuning | |
num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |
num_nodes | unsigned int | 1 | The number of nodes. If the value is larger than 1, multi-node is enabled | >0 |
freeze | string list | [] | A list of layer names in the model to freeze (e.g. ["backbone", "transformer.encoder", "input_proj"]) | |
verbose | bool | False | A flag specifying whether to print detailed learning-rate scaling from the optimizer | True, False |
optim
The optim parameter defines the config for the optimizer in training, including the learning rate, learning rate scheduler, and weight decay.
optim:
  lr: 0.0002
  lr_backbone: 0.00002
  momentum: 0.9
  weight_decay: 0.0001
  lr_scheduler: MultiStep
  lr_steps: [10, 20, 30, 40]
  lr_decay: 0.1
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
lr | float | 2e-4 | The initial learning rate for training the model, excluding the backbone | >0.0 |
lr_backbone | float | 2e-5 | The initial learning rate for training the backbone | >0.0 |
lr_linear_proj_mult | float | 0.1 | The initial learning rate for training the linear projection layer | >0.0 |
momentum | float | 0.9 | The momentum for the AdamW optimizer | >0.0 |
weight_decay | float | 1e-4 | The weight decay coefficient | >0.0 |
| string | MultiStep | The learning scheduler. Two schedulers are provided: | MultiStep/StepLR |
lr_decay | float | 0.1 | The decreasing factor for the learning rate scheduler | >0.0 |
lr_steps | int list | [40] | The steps to decrease the learning rate for the MultiStep scheduler | int list |
lr_step_size | unsigned int | 40 | The steps to decrease the learning rate for the StepLR scheduler | >0 |
lr_monitor | string | val_loss | The monitor value for the AutoReduce scheduler | val_loss/train_loss |
optimizer | string | AdamW | The optimizer used during training | AdamW/SGD |
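The interaction of lr, lr_steps, and lr_decay can be sketched with a small function. This assumes the MultiStep scheduler behaves like PyTorch's `MultiStepLR`, with lr_steps interpreted as epoch milestones; that equivalence is an assumption, and the function name is hypothetical.

```python
import bisect
import math


def multistep_lr(base_lr, lr_steps, lr_decay, epoch):
    """Learning rate at a given epoch under a MultiStep-style schedule."""
    # Count how many milestones have been reached at this epoch.
    passed = bisect.bisect_right(sorted(lr_steps), epoch)
    return base_lr * (lr_decay ** passed)


for epoch in (0, 9, 10, 25, 49):
    lr = multistep_lr(2e-4, [10, 20, 30, 40], 0.1, epoch)
    print(f"epoch {epoch:2d}: lr = {lr:.1e}")
```

With the values from the example spec, the rate starts at 2e-4 and is multiplied by 0.1 at each of epochs 10, 20, 30, and 40.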
dataset
The dataset parameter defines the dataset source, training batch size, and augmentation.
dataset:
  train_data_sources:
    - image_dir: /path/to/coco/images/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.json
  val_data_sources:
    - image_dir: /path/to/coco/images/val2017/
      json_file: /path/to/coco/annotations/instances_val2017.json
  test_data_sources:
    image_dir: /path/to/coco/images/val2017/
    json_file: /path/to/coco/annotations/instances_val2017.json
  infer_data_sources:
    image_dir: /path/to/coco/images/val2017/
    classmap: /path/to/coco/annotations/coco_classmap.txt
  num_classes: 91
  batch_size: 4
  workers: 8
Parameter | Datatype | Default | Description | Supported Values |
| list dict | | The training data sources: | |
| list dict | | The validation data sources: | |
| dict | | The test data sources for evaluation: | |
| dict | | The infer data sources for inference: | |
augmentation | dict config | | The parameters to define the augmentation method | |
num_classes | unsigned int | 91 | The number of classes in the training data | >0 |
batch_size | unsigned int | 4 | The batch size for training and validation | >0 |
workers | unsigned int | 8 | The number of parallel workers processing data | >0 |
| string | default_sampler | The minibatch sampling method. Non-default sampling methods can be enabled for multi-node | default_sampler, non_uniform_sampler, |
| string | serialized | If set to | serialized, default |
augmentation
The augmentation parameter contains hyperparameters for augmentation.
augmentation:
  scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
  input_mean: [0.485, 0.456, 0.406]
  input_std: [0.229, 0.224, 0.225]
  horizontal_flip_prob: 0.5
  train_random_resize: [400, 500, 600]
  train_random_crop_min: 384
  train_random_crop_max: 600
  random_resize_max_size: 1333
  test_random_resize: 800
Parameter | Datatype | Default | Description | Supported Values |
| int list | [480, 512, 544, 576, | A list of sizes to perform random resize. | |
input_mean | float list | [0.485, 0.456, 0.406] | The input mean for RGB frames: (input - mean) / std | float list / size=1 or 3 |
input_std | float list | [0.229, 0.224, 0.225] | The input standard deviation for RGB frames: (input - mean) / std | float list / size=1 or 3 |
horizontal_flip_prob | float | 0.5 | The probability for horizontal flip during training | >=0 |
train_random_resize | int list | [400, 500, 600] | A list of sizes to perform random resize for training data | int list |
train_random_crop_min | unsigned int | 384 | The minimum random crop size for training data | >0 |
train_random_crop_max | unsigned int | 600 | The maximum random crop size for training data | >0 |
random_resize_max_size | unsigned int | 1333 | The maximum random resize size for training data | >0 |
test_random_resize | unsigned int | 800 | The random resize size for test data | >0 |
| bool | True | A flag specifying whether to resize the image (with no padding) to | True/False |
| unsigned int | | A flag to enable Large Scale Jittering, which is used for ViT backbones. | Divisible by 32 |
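The normalization formula (input - mean) / std used by input_mean and input_std can be sketched per pixel as below. The sketch assumes pixel values have already been scaled to the range 0 to 1 and that a single-element list is applied to all three channels; both points are assumptions, and the function name is hypothetical.

```python
def normalize_pixel(rgb, input_mean, input_std):
    """Apply (input - mean) / std to one RGB pixel with values in [0, 1]."""
    # A single-element list is broadcast to all three channels (assumption).
    mean = input_mean * 3 if len(input_mean) == 1 else input_mean
    std = input_std * 3 if len(input_std) == 1 else input_std
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]


mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

# A pixel equal to the mean maps to zero in every channel.
print(normalize_pixel([0.485, 0.456, 0.406], mean, std))  # [0.0, 0.0, 0.0]
```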
To train a Deformable DETR model, use this command:
tao model deformable_detr train [-h] -e <experiment_spec>
[-r <results_dir>]
[-k <key>]
Required Arguments
-e, --experiment_spec: The experiment specification file to set up the training experiment
Optional Arguments
-r, --results_dir: The path to the folder where the experiment outputs should be written. If not specified, the results_dir from the spec file will be used.
-k, --key: A user-specific encoding key to save or load a .tlt model. If not specified, the model checkpoint will not be encrypted.
--gpus: The number of GPUs to run training
--num_nodes: The number of nodes to run training. If this value is larger than 1, distributed multi-node training is enabled.
-h, --help: Show this help message and exit.
Sample Usage
Here's an example of the train command:
tao model deformable_detr train -e /path/to/spec.yaml
Optimizing Resources for Training Deformable DETR
Training Deformable DETR on a standard dataset like COCO requires powerful GPUs (e.g. V100/A100) with at least 15 GB of VRAM and a large amount of CPU memory. This section outlines some strategies you can use to launch training with limited resources.
Optimize GPU Memory
There are various ways to optimize GPU memory usage. One obvious approach is to reduce dataset.batch_size. However, this can cause your training to take longer than usual. Hence, we recommend the following configurations to optimize GPU consumption:
Set train.precision to fp16 to enable automatic mixed precision training. This can reduce your GPU memory usage by 50%.
Set train.activation_checkpoint to True to enable activation checkpointing. By recomputing the activations instead of caching them in memory, the memory usage can be improved.
Set train.distributed_strategy to ddp_sharded to enable Sharded DDP training. This will share gradient calculation across different processes to help reduce GPU memory.
Try using more lightweight backbones like gc_vit_xxtiny, or freeze the backbone by setting model.train_backbone to False.
Try changing the augmentation resolution in dataset.augmentation depending on your dataset.
Optimize CPU Memory
To speed up data loading, it is common practice to set a high number of workers to spawn multiple processes. However, this can cause the CPU to run out of memory if your annotation file is very large. Hence, we recommend the following configurations to optimize CPU consumption:
Set dataset.dataset_type to serialized so that the COCO-based annotation data can be shared across different subprocesses.
Set dataset.augmentation.fixed_padding to True so that images are padded before the batch is formed. Due to random resize and random crop augmentation during training, the resulting image resolution after the transform can vary across images. Such variable image resolutions can cause a memory leak, with CPU memory slowly accumulating until it runs out in the middle of training. This is a limitation of PyTorch, so we advise setting fixed_padding to True to help stabilize the CPU memory usage.
evaluate
The evaluate parameter defines the hyperparameters of the evaluate process.
evaluate:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.0
  num_gpus: 1
Parameter | Datatype | Default | Description | Supported Values |
checkpoint | string | | Path to PyTorch model to evaluate | |
trt_engine | string | | Path to TensorRT model to evaluate. Should be only used with tao deploy | |
num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |
conf_threshold | float | 0.0 | Confidence threshold to filter predictions | >=0 |
To run evaluation with a Deformable DETR model, use this command:
tao model deformable_detr evaluate [-h] -e <experiment_spec>
[-r <results_dir>]
[-k <key>]
evaluate.checkpoint=<model to be evaluated>
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the evaluation experiment.
Optional Arguments
-k, --key: A user-specific encoding key to save or load a .tlt model. If not specified, a .pth model must be used.
-r, --results_dir: The directory where the evaluation result will be stored
evaluate.checkpoint: The .tlt or .pth model to be evaluated
Sample Usage
Here's an example of using the evaluate command:
tao model deformable_detr evaluate -e /path/to/spec.yaml -r /path/to/results/ evaluate.checkpoint=/path/to/model.pth
inference
The inference parameter defines the hyperparameters of the inference process.
inference:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.5
  num_gpus: 1
  color_map:
    person: red
    car: blue
Parameter | Datatype | Default | Description | Supported Values |
checkpoint | string | | Path to the PyTorch model to run inference with | |
trt_engine | string | | Path to the TensorRT model to run inference with. Should be only used with tao deploy | |
num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |
conf_threshold | float | 0.5 | Confidence threshold to filter predictions | >=0 |
color_map | dict | | Color map of the bounding boxes for each class | string dict |
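The role of conf_threshold can be illustrated with a small filter over predictions. The prediction layout (dicts with `label`, `score`, and `box` keys) and the use of a greater-than-or-equal comparison are assumptions of this sketch; the actual output structure of the inference tool is not described here.

```python
def filter_predictions(predictions, conf_threshold):
    """Keep predictions whose confidence score meets the threshold."""
    return [p for p in predictions if p["score"] >= conf_threshold]


# Made-up predictions for illustration; boxes are [x1, y1, x2, y2].
predictions = [
    {"label": "person", "score": 0.92, "box": [10, 20, 110, 220]},
    {"label": "car", "score": 0.41, "box": [200, 80, 380, 190]},
    {"label": "person", "score": 0.50, "box": [300, 40, 350, 160]},
]

kept = filter_predictions(predictions, conf_threshold=0.5)
print([(p["label"], p["score"]) for p in kept])  # [('person', 0.92), ('person', 0.5)]
```

A threshold of 0.0, as in the evaluate example, keeps every prediction, which is the usual choice when computing detection metrics.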
The inference tool for Deformable DETR models can be used to visualize bboxes and generate frame-by-frame KITTI format labels on a directory of images.
tao model deformable_detr inference [-h] -e <experiment spec file>
[-r <results_dir>]
[-k <key>]
inference.checkpoint=<model to be inferenced>
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the inference experiment
Optional Arguments
-k, --key: A user-specific encoding key to save or load a .tlt model. If not specified, a .pth model must be used.
-r, --results_dir: The directory where the inference result will be stored
inference.checkpoint: The .tlt or .pth model for inference
Sample Usage
Here's an example of using the inference command:
tao model deformable_detr inference -e /path/to/spec.yaml -r /path/to/results/ inference.checkpoint=/path/to/model.pth
export
The export parameter defines the hyperparameters for the export process.
export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  on_cpu: False
  opset_version: 12
  input_channel: 3
  input_width: 960
  input_height: 544
  batch_size: -1
Parameter | Datatype | Default | Description | Supported Values |
checkpoint | string | | The path to PyTorch model to export | |
onnx_file | string | | The path to the .onnx file | |
on_cpu | bool | True | If this value is True, the DMHA module will be exported as standard pytorch. If this value is False, the module will be exported using the TRT Plugin. | True, False |
opset_version | unsigned int | 12 | The opset version of the exported ONNX | >0 |
input_channel | unsigned int | 3 | The input channel size. Only the value 3 is supported. | 3 |
input_width | unsigned int | 960 | The input width | >0 |
input_height | unsigned int | 544 | The input height | >0 |
batch_size | unsigned int | -1 | The batch size of the ONNX model. If this value is set to -1, the export uses dynamic batch size. | >=-1 |
tao model deformable_detr export [-h] -e <experiment spec file>
[-r <results_dir>]
[-k <key>]
export.checkpoint=<model to export>
export.onnx_file=<onnx path>
Required Arguments
-e, --experiment_spec: The path to an experiment spec file
Optional Arguments
-k, --key: A user-specific encoding key to save or load a .tlt model. If not specified, a .pth model must be used.
-r, --results_dir: The directory where the export result is stored
export.checkpoint: The .tlt or .pth model to export
export.onnx_file: The path where the .etlt or .onnx model will be saved
Sample Usage
Here's an example of using the export command:
tao model deformable_detr export -e /path/to/spec.yaml export.checkpoint=/path/to/model.pth export.onnx_file=/path/to/model.onnx
For deployment, please refer to TAO Deploy documentation.
Refer to the Integrating a Deformable DETR Model page for more information about deploying a Deformable DETR model to DeepStream.