NVIDIA TAO Toolkit v4.0.1

DeformableDETR

DeformableDETR is an object-detection model that is included in the TAO Toolkit. It supports the following tasks:

  • convert

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following command-line convention:

tao deformable_detr <sub_task> <args_per_subtask>

where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

DeformableDETR expects directories of images for training or validation and annotated JSON files in COCO format.
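
As a rough illustration of the COCO annotation layout, the sketch below builds a minimal annotation dictionary and runs a few sanity checks on it. The file name, category name, and the `check_coco` helper are hypothetical and are not part of the TAO Toolkit; only the top-level keys and the `[x, y, width, height]` box layout follow the COCO detection format.

```python
# Minimal COCO-style detection annotation (illustrative values only).
coco = {
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        # bbox is [x_min, y_min, width, height] in pixels.
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100.0, 120.0, 50.0, 80.0], "area": 4000.0, "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "car"},
    ],
}


def check_coco(data):
    """Return a list of problems found in a COCO-style dict (empty if none)."""
    problems = []
    for key in ("images", "annotations", "categories"):
        if key not in data:
            problems.append(f"missing top-level key: {key}")
    image_ids = {img["id"] for img in data.get("images", [])}
    category_ids = {cat["id"] for cat in data.get("categories", [])}
    for ann in data.get("annotations", []):
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']} references an unknown image")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']} references an unknown category")
        if len(ann["bbox"]) != 4:
            problems.append(f"annotation {ann['id']} bbox must have 4 values")
    return problems


print(check_coco(coco))
```

A check like this can be run on an annotation file (loaded with `json.load`) before pointing a spec file at it.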

Sharding the Data

Note

Sharding is not necessary if the annotation is already in JSON format and your dataset is smaller than the COCO dataset.

For a large dataset, you can optionally use convert to shard the dataset into smaller chunks, which reduces the memory burden. In this process, KITTI-based annotations are converted into smaller sharded JSON files, as is done for other object detection networks. Here is an example spec file for converting KITTI-based folders into multiple sharded JSON files:

input_source: /workspace/tao-experiments/data/sequence.txt
output_dir: /workspace/tao-experiments/sharded
image_dir_name: images
label_dir_name: labels
num_shards: 32
num_partitions: 1

The details of each parameter are summarized in the table below:

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| input_source | string | None | The .txt file listing data sources | |
| output_dir | string | None | The output directory where sharded JSON files will be stored | |
| image_dir_name | string | None | The relative path to the directory containing images, starting from the path listed in the input_source .txt file | |
| label_dir_name | string | None | The relative path to the directory containing JSON data, starting from the path listed in the input_source .txt file | |
| num_shards | unsigned int | 32 | The number of shards per partition | >0 |
| num_partitions | unsigned int | 1 | The number of partitions in the data | >0 |
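
To make the relationship between num_shards and num_partitions concrete, here is a minimal sketch of splitting a list of annotation records into shards. Because num_shards is counted per partition, the total number of output shards is taken to be num_shards × num_partitions. The round-robin assignment and the `shard_round_robin` helper are assumptions made for illustration; this page does not describe how the convert task actually distributes records.

```python
def shard_round_robin(items, num_shards, num_partitions=1):
    """Split items into num_partitions * num_shards lists, round-robin."""
    if num_shards <= 0 or num_partitions <= 0:
        raise ValueError("num_shards and num_partitions must be > 0")
    total = num_shards * num_partitions
    shards = [[] for _ in range(total)]
    for index, item in enumerate(items):
        shards[index % total].append(item)
    return shards


# Hypothetical record names standing in for per-frame annotations.
records = [f"frame_{i:04d}" for i in range(100)]
shards = shard_round_robin(records, num_shards=32, num_partitions=1)
print(len(shards), len(shards[0]), len(shards[-1]))
```

With the example spec values (32 shards, 1 partition), 100 records end up in 32 lists of three or four records each.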

The following example shows how to use the command:

tao deformable_detr convert -e /path/to/spec.yaml


The training experiment spec file for DeformableDETR includes model_config, train_config, and dataset_config parameters. Here is an example spec file for training a DeformableDETR model with a resnet50 backbone on the COCO dataset:

dataset_config:
  train_data_sources:
    - image_dir: /path/to/coco/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.json
  val_data_sources:
    - image_dir: /path/to/coco/val2017/
      json_file: /path/to/coco/annotations/instances_val2017.json
  num_classes: 91
  batch_size: 2
  workers: 8
  augmentation_config:
    scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
    input_mean: [0.485, 0.456, 0.406]
    input_std: [0.229, 0.224, 0.225]
    horizontal_flip_prob: 0.5
    train_random_resize: [400, 500, 600]
    train_random_crop_min: 384
    train_random_crop_max: 600
    random_resize_max_size: 1333
    test_random_resize: 800
model_config:
  pretrained_backbone_path: /path/to/your-pretrained-backbone-model
  backbone: resnet50
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 300
  with_box_refine: True
  dropout_ratio: 0.3
train_config:
  optim:
    lr_backbone: 2e-5
    lr: 2e-4
    lr_steps: [10, 20, 30, 40]
    momentum: 0.9
  epochs: 50

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| model_config | dict config | – | The configuration of the model architecture | |
| train_config | dict config | – | The configuration of the training process | |
| dataset_config | dict config | – | The configuration of the dataset | |
| num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |
| num_nodes | unsigned int | 1 | The number of nodes. If the value is larger than 1, multi-node training is enabled. | >0 |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| output_dir | string | None | The directory where experiment results are saved | |
| resume_training_checkpoint_path | string | None | The intermediate checkpoint to resume training from | |
| val_interval | unsigned int | 1 | The number of training epochs between validation runs | >0 |
| clip_grad_norm | float | 0.1 | The maximum L2 norm to which gradients are clipped. A value of 0.0 specifies no clipping. | >=0 |
| conf_threshold | float | 0.5 | A threshold for confidence scores | >=0 |
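
The clip_grad_norm behavior can be sketched in a few lines. The example below assumes the common global L2 norm clipping rule, where all gradients are rescaled by the same factor when their combined norm exceeds the limit. The `clip_by_l2_norm` helper is hypothetical and works on a flat list of floats rather than on model tensors.

```python
import math


def clip_by_l2_norm(grads, max_norm):
    """Scale grads so their global L2 norm does not exceed max_norm.

    A max_norm of 0.0 disables clipping, mirroring the clip_grad_norm
    description above.
    """
    if max_norm == 0.0:
        return list(grads)
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


# The gradient [3.0, 4.0] has L2 norm 5.0, so it is scaled down to norm 0.1.
clipped = clip_by_l2_norm([3.0, 4.0], max_norm=0.1)
print(clipped)
```

Clipping preserves the gradient direction and only shrinks its magnitude, which is why a small default such as 0.1 limits the step size without changing which way the weights move.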

model_config

The model_config parameter provides options to change the DeformableDETR architecture.

model_config:
  pretrained_backbone_path: /path/to/your-resnet50-pretrained-model
  backbone: resnet50
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 300
  with_box_refine: True
  dropout_ratio: 0.3

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| pretrained_backbone_path | string | None | The optional path to the pretrained backbone file | string to the path |
| backbone | string | resnet50 | The backbone name of the model. Currently, the only supported backbone is resnet50. | resnet50 |
| train_backbone | bool | True | A flag specifying whether to train the backbone | True/False |
| num_feature_levels | unsigned int | 4 | The number of feature levels to use in the model | 1,2,3,4 |
| dec_layers | unsigned int | 6 | The number of decoder layers in the transformer | >0 |
| enc_layers | unsigned int | 6 | The number of encoder layers in the transformer | >0 |
| num_queries | unsigned int | 300 | The number of queries | >0 |
| with_box_refine | bool | True | A flag specifying whether to enable Iterative Bounding Box Refinement | True/False |
| dropout_ratio | float | 0.3 | The probability to drop out hidden units | 0.0 ~ 1.0 |
| cls_loss_coef | float | 2.0 | The relative weight of the classification error in the matching cost | >0.0 |
| bbox_loss_coef | float | 5.0 | The relative weight of the L1 error of the bounding box coordinates in the matching cost | >0.0 |
| giou_loss_coef | float | 2.0 | The relative weight of the GIoU loss of the bounding box in the matching cost | >0.0 |
| focal_alpha | float | 0.25 | The alpha in the focal loss | >0.0 |
| aux_loss | bool | True | A flag specifying whether to use auxiliary decoding losses (loss at each decoder layer) | True/False |
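
To show how cls_loss_coef, bbox_loss_coef, and giou_loss_coef could interact, the sketch below combines the three terms into one matching cost for a single prediction and ground-truth pair. It assumes a DETR-style weighted sum in which the GIoU term enters with a negative sign, and it uses corner-format boxes `(x1, y1, x2, y2)` with positive area for simplicity. The `giou` and `matching_cost` helpers are hypothetical and are not taken from the TAO source.

```python
def giou(a, b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Area of the smallest box that encloses both a and b.
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (hull - union) / hull


def matching_cost(cls_cost, pred_box, gt_box,
                  cls_loss_coef=2.0, bbox_loss_coef=5.0, giou_loss_coef=2.0):
    """Weighted sum of classification, L1 box, and negative GIoU costs."""
    l1 = sum(abs(p - g) for p, g in zip(pred_box, gt_box))
    return (cls_loss_coef * cls_cost
            + bbox_loss_coef * l1
            - giou_loss_coef * giou(pred_box, gt_box))


# A perfect box match with zero classification cost.
print(matching_cost(0.0, (0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)))
```

Raising bbox_loss_coef relative to the other two weights makes the matcher favor predictions whose coordinates are close to the ground truth, even when their class score is weaker.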

train_config

The train_config parameter defines the hyperparameters of the training process.

train_config:
  optim:
    lr: 0.0001
    lr_backbone: 0.00001
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_steps: [10, 20, 30, 40]
    lr_decay: 0.1
  epochs: 50
  checkpoint_interval: 1

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| optim | dict config | | The config for the optimizer, including the learning rate, learning rate scheduler, and weight decay | >0 |
| epochs | unsigned int | 50 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 51 | The interval at which the checkpoints are saved | >0 |

optim

The optim parameter defines the config for the optimizer in training, including the learning rate, learning rate scheduler, and weight decay.

optim:
  lr: 0.0001
  lr_backbone: 0.00001
  momentum: 0.9
  weight_decay: 0.0001
  lr_scheduler: MultiStep
  lr_steps: [10, 20, 30, 40]
  lr_decay: 0.1

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| lr | float | 1e-4 | The initial learning rate for training the model, excluding the backbone | >0.0 |
| lr_backbone | float | 1e-5 | The initial learning rate for training the backbone | >0.0 |
| lr_linear_proj_mult | float | 0.1 | The learning rate multiplier for the linear projection layers | >0.0 |
| momentum | float | 0.9 | The momentum for the AdamW optimizer | >0.0 |
| weight_decay | float | 1e-4 | The weight decay coefficient | >0.0 |
| lr_scheduler | string | MultiStep | The learning rate scheduler. Two schedulers are provided: MultiStep decreases the lr by lr_decay at each step listed in lr_steps; StepLR decreases the lr by lr_decay every lr_step_size steps. | MultiStep/StepLR |
| lr_decay | float | 0.1 | The decreasing factor for the learning rate scheduler | >0.0 |
| lr_steps | int list | [10] | The steps at which to decrease the learning rate for the MultiStep scheduler | int list |
| lr_step_size | unsigned int | 10 | The step interval at which to decrease the learning rate for the StepLR scheduler | >0 |
| lr_monitor | string | val_loss | The monitor value for the AutoReduce scheduler | val_loss/train_loss |
| optimizer | string | AdamW | The optimizer used during training | AdamW/SGD |
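
The difference between the two schedulers is easiest to see by computing the learning rate at a given epoch. The sketch below assumes that steps are counted in epochs and that both schedulers follow the usual milestone and fixed-interval decay rules (as in PyTorch's MultiStepLR and StepLR). The function names are hypothetical, and the exact TAO behavior may differ.

```python
def multistep_lr(base_lr, lr_steps, lr_decay, epoch):
    """Multiply base_lr by lr_decay once for every milestone already reached."""
    passed = sum(1 for step in lr_steps if epoch >= step)
    return base_lr * lr_decay ** passed


def step_lr(base_lr, lr_step_size, lr_decay, epoch):
    """Multiply base_lr by lr_decay once per completed lr_step_size interval."""
    return base_lr * lr_decay ** (epoch // lr_step_size)


# Values from the example spec: lr=1e-4, lr_steps=[10, 20, 30, 40], lr_decay=0.1.
for epoch in (0, 10, 25, 45):
    print(epoch, multistep_lr(1e-4, [10, 20, 30, 40], 0.1, epoch))
```

With evenly spaced milestones the two schedulers produce the same curve; MultiStep is the more flexible choice when the drops should happen at irregular points.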

dataset_config

The dataset_config parameter defines the dataset source, training batch size, and augmentation.

dataset_config:
  train_data_sources:
    - image_dir: /path/to/coco/images/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.json
  val_data_sources:
    - image_dir: /path/to/coco/images/val2017/
      json_file: /path/to/coco/annotations/instances_val2017.json
  num_classes: 91
  batch_size: 2
  workers: 8

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| train_data_sources | list dict | | The training data sources. image_dir: the directory that contains the training images. json_file: the path to the training annotation JSON file in COCO format. | |
| val_data_sources | list dict | | The validation data sources. image_dir: the directory that contains the validation images. json_file: the path to the validation annotation JSON file in COCO format. | |
| num_classes | unsigned int | 4 | The number of classes in the training data | >0 |
| batch_size | unsigned int | 32 | The batch size for training and validation | >0 |
| workers | unsigned int | 8 | The number of parallel workers processing data | >0 |
| train_sampler | string | default_sampler | The minibatch sampling method. Non-default sampling methods can be enabled for multi-node jobs. | default_sampler/non_uniform_sampler/uniform_sampler |
| augmentation_config | dict config | | The parameters to define the augmentation method | |

augmentation_config

The augmentation_config parameter contains hyperparameters for augmentation.

augmentation_config:
  scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
  input_mean: [0.485, 0.456, 0.406]
  input_std: [0.229, 0.224, 0.225]
  horizontal_flip_prob: 0.5
  train_random_resize: [400, 500, 600]
  train_random_crop_min: 384
  train_random_crop_max: 600
  random_resize_max_size: 1333
  test_random_resize: 800

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| scales | int list | [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800] | A list of sizes to perform random resize | |
| input_mean | float list | [0.485, 0.456, 0.406] | The input mean for RGB frames: (input - mean) / std | float list / size=1 or 3 |
| input_std | float list | [0.229, 0.224, 0.225] | The input std for RGB frames: (input - mean) / std | float list / size=1 or 3 |
| horizontal_flip_prob | float | 0.5 | The probability of horizontally flipping an image during augmentation | |
| train_random_resize | int list | [400, 500, 600] | A list of sizes to perform random resize for training data | |
| train_random_crop_min | unsigned int | 384 | The minimum random crop size for training data | |
| train_random_crop_max | unsigned int | 600 | The maximum random crop size for training data | |
| random_resize_max_size | unsigned int | 1333 | The maximum random resize size for training data | |
| test_random_resize | unsigned int | 800 | The random resize size for test data | |
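
The normalization formula (input - mean) / std used by input_mean and input_std can be checked on a single pixel. The sketch below assumes that channel values have already been scaled to the range [0, 1], which is the scale the default mean and std values correspond to. The `normalize_pixel` helper is hypothetical.

```python
def normalize_pixel(rgb, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Apply (input - mean) / std per channel to one RGB pixel in [0, 1]."""
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))


# A pixel equal to the mean maps to zero in every channel.
print(normalize_pixel((0.485, 0.456, 0.406)))
# A pure white pixel maps to a positive value in every channel.
print(normalize_pixel((1.0, 1.0, 1.0)))
```

If a custom dataset has very different color statistics, the same formula applies with that dataset's own per-channel mean and std.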

To train a DeformableDETR model, use this command:

tao deformable_detr train [-h] -e <experiment_spec> [-r <results_dir>] [-k <key>]

Required Arguments

  • -e, --experiment_spec: The experiment specification file to set up the training experiment

Optional Arguments

  • -r, --results_dir: The path to the folder where the experiment outputs should be written. If not specified, the output_dir from the spec file will be used.

  • -k, --key: A user-specific encoding key to save or load a .tlt model. If not specified, the encryption_key from the spec file will be used.

  • --gpus: The number of GPUs to use for training

  • --num_nodes: The number of nodes to use for training. If this value is larger than 1, distributed multi-node training is enabled.

  • -h, --help: Show this help message and exit.

Sample Usage

Here’s an example of the train command:

tao deformable_detr train -e /path/to/spec.yaml
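
When training runs are launched from a script, the command can be assembled programmatically. The following sketch builds the argument list from the flags documented above. The `build_train_command` helper and the example paths are hypothetical, and passing each flag value as a separate space-delimited argument is an assumption.

```python
import shlex


def build_train_command(spec, results_dir=None, key=None, gpus=None, num_nodes=None):
    """Assemble a `tao deformable_detr train` argument list."""
    cmd = ["tao", "deformable_detr", "train", "-e", spec]
    if results_dir is not None:
        cmd += ["-r", results_dir]
    if key is not None:
        cmd += ["-k", key]
    if gpus is not None:
        cmd += ["--gpus", str(gpus)]
    if num_nodes is not None:
        cmd += ["--num_nodes", str(num_nodes)]
    return cmd


cmd = build_train_command("/path/to/spec.yaml", results_dir="/path/to/results", gpus=2)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the TAO Toolkit Launcher is installed.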


To run evaluation with a DeformableDETR model, use this command:

tao deformable_detr evaluate [-h] -e <experiment_spec> -k <key> model_path=<model to be evaluated> output_dir=<results directory>

Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up the evaluation experiment. This should be the same as the training specification file.

  • -k, --key: A user-specific encoding key to save or load a .tlt model.

  • model_path: The .tlt model to be evaluated.

  • output_dir: The directory where the evaluation result is stored.

Sample Usage

Here’s an example of using the evaluate command:

tao deformable_detr evaluate -e /path/to/spec.yaml -k $YOUR_KEY model_path=/path/to/model.tlt output_dir=/path/to/results/
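
Unlike the dashed flags, model_path and output_dir are passed as key=value pairs. The sketch below shows one way to assemble such a command from keyword arguments. The `build_task_command` helper and the placeholder key `my_key` are hypothetical; only the argument names come from this page.

```python
def build_task_command(task, spec, key, **overrides):
    """Assemble a `tao deformable_detr <task>` call with key=value arguments."""
    cmd = ["tao", "deformable_detr", task, "-e", spec, "-k", key]
    # Keyword arguments keep their order, so the pairs appear as written.
    cmd += [f"{name}={value}" for name, value in overrides.items()]
    return cmd


cmd = build_task_command(
    "evaluate",
    "/path/to/spec.yaml",
    "my_key",  # placeholder encryption key
    model_path="/path/to/model.tlt",
    output_dir="/path/to/results/",
)
print(" ".join(cmd))
```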


The inference tool for DeformableDETR models can be used to visualize bounding boxes and generate frame-by-frame KITTI-format labels for a directory of images.

tao deformable_detr inference [-h] -e <experiment spec file> -k <key> model_path=<model to be used> output_dir=<results directory>

Required Arguments

  • -e, --experiment_spec: The path to an experiment spec file

  • -k, --key: A user-specific encoding key to save or load a .tlt model

  • model_path: The .tlt model to be used

  • output_dir: The directory where the inference result is stored

Sample Usage

Here’s an example of using the inference command:

tao deformable_detr inference -e /path/to/spec.yaml -k $YOUR_KEY model_path=/path/to/model.tlt output_dir=/path/to/results/


The export task exports a trained .tlt model to an .etlt file:

tao deformable_detr export [-h] -e <experiment spec file> -k <key> model_path=<trained tlt model to be exported> output_file=<etlt path>

Required Arguments

  • -e, --experiment_spec: The path to an experiment spec file

  • -k, --key: A user-specific encoding key to save or load a .tlt model

  • model_path: The .tlt model to be exported

  • output_file: The path where the exported .etlt file is stored

Sample Usage

Here’s an example of using the export command:

tao deformable_detr export -e /path/to/spec.yaml -k $YOUR_KEY model_path=/path/to/model.tlt output_file=/path/to/model.etlt


Refer to the Integrating a Deformable DETR Model page for more information about deploying a Deformable DETR model to DeepStream.

© Copyright 2023, NVIDIA. Last updated on Aug 2, 2023.