DeformableDETR
DeformableDETR is an object-detection model that is included in the TAO Toolkit. It supports the following tasks:
convert
train
evaluate
inference
export
These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao deformable_detr <sub_task> <args_per_subtask>
where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
DeformableDETR expects directories of images for training or validation and annotated JSON files in COCO format.
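For reference, each data source pairs a directory of images with a single COCO-format annotation file. The sketch below shows the standard COCO fields such an annotation file contains; the file names and values are illustrative only, and the actual annotation file is JSON with the same structure (shown here in YAML form so comments can be included).
# Illustrative COCO-format annotation structure (actual files are JSON;
# names and values are hypothetical)
images:
  - id: 1
    file_name: 000001.jpg
    width: 1280
    height: 720
annotations:
  - id: 1
    image_id: 1                        # references images[].id
    category_id: 3                     # references categories[].id
    bbox: [100.0, 200.0, 50.0, 80.0]   # [x_min, y_min, width, height] in pixels
    area: 4000.0
    iscrowd: 0
categories:
  - id: 3
    name: car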
Sharding the Data
Sharding is not necessary if the annotations are already in JSON format and your dataset is smaller than the COCO dataset. For a large dataset, you can optionally use convert to shard the dataset into smaller chunks and reduce the memory burden. In this process, KITTI-based annotations are converted into smaller sharded JSON files, similar to other object detection networks. Here is an example spec file for converting KITTI-based folders into multiple sharded JSON files:
input_source: /workspace/tao-experiments/data/sequence.txt
output_dir: /workspace/tao-experiments/sharded
image_dir_name: images
label_dir_name: labels
num_shards: 32
num_partitions: 1
The details of each parameter are summarized in the table below:
| Parameter | Data Type | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| input_source | string | None | The .txt file listing the data source directories | |
| output_dir | string | None | The output directory where sharded JSON files will be stored | |
| image_dir_name | string | None | The relative path, from each path listed in input_source, to the directory containing images | |
| label_dir_name | string | None | The relative path, from each path listed in input_source, to the directory containing label data | |
| num_shards | unsigned int | 32 | The number of shards per partition | >0 |
| num_partitions | unsigned int | 1 | The number of partitions in the data | >0 |
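For illustration, the input_source file is assumed to be a plain-text list of dataset directories; within each listed directory, the converter looks for the subfolders named by image_dir_name and label_dir_name. The paths below are hypothetical.
# Assumed contents of /workspace/tao-experiments/data/sequence.txt
# (one dataset directory per line; each directory is expected to contain the
#  image_dir_name and label_dir_name subfolders from the spec above)
/workspace/tao-experiments/data/sequence_01
/workspace/tao-experiments/data/sequence_02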
The following example shows how to use the command:
tao deformable_detr convert -e /path/to/spec.yaml
The training experiment spec file for DeformableDETR includes model_config, train_config, and dataset_config parameters.
Here is an example spec file for training a DeformableDETR model with a resnet50 backbone on a COCO dataset.
dataset_config:
train_data_sources:
- image_dir: /path/to/coco/train2017/
json_file: /path/to/coco/annotations/instances_train2017.json
val_data_sources:
- image_dir: /path/to/coco/val2017/
json_file: /path/to/coco/annotations/instances_val2017.json
num_classes: 91
batch_size: 2
workers: 8
augmentation_config:
scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
input_mean: [0.485, 0.456, 0.406]
input_std: [0.229, 0.224, 0.225]
horizontal_flip_prob: 0.5
train_random_resize: [400, 500, 600]
train_random_crop_min: 384
train_random_crop_max: 600
random_resize_max_size: 1333
test_random_resize: 800
model_config:
pretrained_backbone_path: /path/to/your-pretrained-backbone-model
backbone: resnet50
train_backbone: True
num_feature_levels: 4
dec_layers: 6
enc_layers: 6
num_queries: 300
with_box_refine: True
dropout_ratio: 0.3
train_config:
optim:
lr_backbone: 2e-5
lr: 2e-4
lr_steps: [10, 20, 30, 40]
momentum: 0.9
epochs: 50
| Parameter | Data Type | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| model_config | dict config | – | The configuration of the model architecture | |
| train_config | dict config | – | The configuration of the training process | |
| dataset_config | dict config | – | The configuration of the dataset | |
| num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |
| num_nodes | unsigned int | 1 | The number of nodes. If the value is larger than 1, multi-node training is enabled. | >0 |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| output_dir | string | None | The directory where experiment results are saved | |
| resume_training_checkpoint_path | string | None | The intermediate checkpoint to resume training from | |
| validation_interval | unsigned int | 1 | The number of training epochs that should run per validation | >0 |
| clip_grad_norm | float | 0.1 | The amount to clip the gradient by the L2 norm. A value of 0.0 specifies no clipping. | >=0 |
| conf_threshold | float | 0.5 | The threshold applied to confidence scores | >=0 |
model_config
The model_config parameter provides options to change the DeformableDETR architecture.
model_config:
pretrained_backbone_path: /path/to/your-resnet50-pretrained-model
backbone: resnet50
train_backbone: True
num_feature_levels: 4
dec_layers: 6
enc_layers: 6
num_queries: 300
with_box_refine: True
dropout_ratio: 0.3
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| pretrained_backbone_path | string | None | The optional path to the pretrained backbone file | string to the path |
| backbone | string | resnet50 | The backbone name of the model. Currently, the only supported backbone is resnet50. | resnet50 |
| train_backbone | bool | True | A flag specifying whether to train the backbone | True/False |
| num_feature_levels | unsigned int | 4 | The number of feature levels to use in the model | 1,2,3,4 |
| dec_layers | unsigned int | 6 | The number of decoder layers in the transformer | >0 |
| enc_layers | unsigned int | 6 | The number of encoder layers in the transformer | >0 |
| num_queries | unsigned int | 300 | The number of queries | >0 |
| with_box_refine | bool | True | A flag specifying whether to enable iterative bounding-box refinement | True/False |
| dropout_ratio | float | 0.3 | The probability of dropping hidden units | 0.0 ~ 1.0 |
| cls_loss_coef | float | 2.0 | The relative weight of the classification error in the matching cost | >0.0 |
| bbox_loss_coef | float | 5.0 | The relative weight of the L1 error of the bounding-box coordinates in the matching cost | >0.0 |
| giou_loss_coef | float | 2.0 | The relative weight of the GIoU loss of the bounding box in the matching cost | >0.0 |
| focal_alpha | float | 0.25 | The alpha parameter in the focal loss | >0.0 |
| aux_loss | bool | True | A flag specifying whether to use auxiliary decoding losses (a loss at each decoder layer) | True/False |
train_config
The train_config parameter defines the hyperparameters of the training process.
train_config:
optim:
lr: 0.0001
lr_backbone: 0.00001
momentum: 0.9
weight_decay: 0.0001
lr_scheduler: MultiStep
lr_steps: [10, 20, 30, 40]
lr_decay: 0.1
epochs: 50
checkpoint_interval: 1
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| optim | dict config | – | The config for the optimizer, including the learning rate, learning-rate scheduler, and weight decay | |
| epochs | unsigned int | 50 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The interval (in epochs) at which checkpoints are saved | >0 |
optim
The optim parameter defines the config for the optimizer in training, including the learning rate, learning-rate scheduler, and weight decay.
optim:
lr: 0.0001
lr_backbone: 0.00001
momentum: 0.9
weight_decay: 0.0001
lr_scheduler: MultiStep
lr_steps: [10, 20, 30, 40]
lr_decay: 0.1
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| lr | float | 1e-4 | The initial learning rate for training the model, excluding the backbone | >0.0 |
| lr_backbone | float | 1e-5 | The initial learning rate for training the backbone | >0.0 |
| lr_linear_proj_mult | float | 0.1 | The initial learning rate for training the linear projection layer | >0.0 |
| momentum | float | 0.9 | The momentum for the AdamW optimizer | >0.0 |
| weight_decay | float | 1e-4 | The weight decay coefficient | >0.0 |
| lr_scheduler | string | MultiStep | The learning-rate scheduler. Two schedulers are provided: MultiStep and StepLR. | MultiStep/StepLR |
| lr_decay | float | 0.1 | The decreasing factor for the learning-rate scheduler | >0.0 |
| lr_steps | int list | [10] | The epochs at which to decrease the learning rate for the MultiStep scheduler | int list |
| lr_step_size | unsigned int | 10 | The number of epochs between learning-rate decreases for the StepLR scheduler | >0 |
| monitor_name | string | val_loss | The metric monitored by the learning-rate scheduler | val_loss/train_loss |
| optimizer | string | AdamW | The optimizer used during training | AdamW/SGD |
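As a concrete example, with the MultiStep scheduler the learning rate is assumed to be multiplied by lr_decay at each epoch listed in lr_steps (mirroring PyTorch's MultiStepLR behavior). For the optim values shown above, this works out to the following schedule:
# Assumed MultiStep schedule for lr: 1e-4, lr_decay: 0.1, lr_steps: [10, 20, 30, 40]
# epochs  0-9  : lr = 1e-4
# epochs 10-19 : lr = 1e-5   (decayed at epoch 10)
# epochs 20-29 : lr = 1e-6   (decayed at epoch 20)
# epochs 30-39 : lr = 1e-7   (decayed at epoch 30)
# epochs 40+   : lr = 1e-8   (decayed at epoch 40)
# lr_backbone is assumed to follow the same schedule, starting from 1e-5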
dataset_config
The dataset_config parameter defines the dataset source, training batch size, and augmentation.
dataset_config:
train_data_sources:
- image_dir: /path/to/coco/images/train2017/
json_file: /path/to/coco/annotations/instances_train2017.json
val_data_sources:
- image_dir: /path/to/coco/images/val2017/
json_file: /path/to/coco/annotations/instances_val2017.json
num_classes: 91
batch_size: 2
workers: 8
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| train_data_sources | list dict | – | The training data sources | |
| val_data_sources | list dict | – | The validation data sources | |
| num_classes | unsigned int | 4 | The number of classes in the training data | >0 |
| batch_size | unsigned int | 32 | The batch size for training and validation | >0 |
| workers | unsigned int | 8 | The number of parallel workers processing data | >0 |
| train_sampler | string | default_sampler | The minibatch sampling method. Non-default sampling methods can be enabled for multi-node jobs. | default_sampler/non_uniform_sampler/uniform_sampler |
| augmentation_config | dict config | – | The parameters to define the augmentation method | |
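Because train_data_sources and val_data_sources are lists, several image-directory/annotation-file pairs can be combined into a single run. A hypothetical example with two training sources:
train_data_sources:
  - image_dir: /path/to/coco/images/train2017/
    json_file: /path/to/coco/annotations/instances_train2017.json
  - image_dir: /path/to/custom_data/images/               # hypothetical additional source
    json_file: /path/to/custom_data/instances_train.json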
augmentation_config
The augmentation_config parameter contains hyperparameters for augmentation.
augmentation_config:
scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
input_mean: [0.485, 0.456, 0.406]
input_std: [0.229, 0.224, 0.225]
horizontal_flip_prob: 0.5
train_random_resize: [400, 500, 600]
train_random_crop_min: 384
train_random_crop_max: 600
random_resize_max_size: 1333
test_random_resize: 800
| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| scales | int list | [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800] | A list of sizes to perform random resize | |
| input_mean | float list | [0.485, 0.456, 0.406] | The input mean for RGB frames | float list / size=1 or 3 |
| input_std | float list | [0.229, 0.224, 0.225] | The input standard deviation for RGB frames | float list / size=1 or 3 |
| horizontal_flip_prob | float | 0.5 | The probability of applying a horizontal flip during training | |
| train_random_resize | int list | [400, 500, 600] | A list of sizes to perform random resize for training data | |
| train_random_crop_min | unsigned int | 384 | The minimum random crop size for training data | |
| train_random_crop_max | unsigned int | 600 | The maximum random crop size for training data | |
| random_resize_max_size | unsigned int | 1333 | The maximum random resize size for training data | |
| test_random_resize | unsigned int | 800 | The random resize size for test data | |
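The default input_mean and input_std values are the standard ImageNet normalization statistics; preprocessing is assumed to scale pixel values to the [0, 1] range before normalizing each channel, as sketched below.
# Assumed per-channel normalization applied to each RGB frame
#   normalized = (pixel / 255.0 - input_mean[c]) / input_std[c]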
To train a DeformableDETR model, use this command:
tao deformable_detr train [-h] -e <experiment_spec>
[-r <results_dir>]
[-k <key>]
Required Arguments
-e, --experiment_spec: The experiment specification file to set up the training experiment
Optional Arguments
-r, --results_dir: The path to the folder where the experiment outputs should be written. If not specified, the output_dir from the spec file will be used.
-k, --key: A user-specific encoding key to save or load a .tlt model. If not specified, the encryption_key from the spec file will be used.
--gpus: The number of GPUs to run training
--num_nodes: The number of nodes to run training. If this value is larger than 1, distributed multi-node training is enabled.
-h, --help: Show this help message and exit.
Sample Usage
Here’s an example of the train command:
tao deformable_detr train -e /path/to/spec.yaml
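When training on multiple GPUs, the --gpus option described above can be added to the same command; the GPU count below is illustrative:
tao deformable_detr train -e /path/to/spec.yaml --gpus 4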
To run evaluation with a DeformableDETR model, use this command:
tao deformable_detr evaluate [-h] -e <experiment_spec>
-k <key>
model_path=<model to be evaluated>
output_dir=<results directory>
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the evaluation experiment. This should be the same as the training specification file.
-k, --key: A user-specific encoding key to save or load a .tlt model.
model_path: The .tlt model to be evaluated.
output_dir: The directory where the evaluation result is stored.
Sample Usage
Here’s an example of using the evaluate command:
tao deformable_detr evaluate -e /path/to/spec.yaml -k $YOUR_KEY model_path=/path/to/model.tlt output_dir=/path/to/results/
The inference tool for DeformableDETR models can be used to visualize bounding boxes and generate frame-by-frame KITTI-format labels on a directory of images.
tao deformable_detr inference [-h] -e <experiment spec file>
-k <key>
model_path=<model to be evaluated>
output_dir=<results directory>
Required Arguments
-e, --experiment_spec: The path to an experiment spec file
-k, --key: A user-specific encoding key to save or load a .tlt model
model_path: The .tlt model to be used
output_dir: The directory where the inference result is stored
Sample Usage
Here’s an example of using the inference command:
tao deformable_detr inference -e /path/to/spec.yaml -k $YOUR_KEY model_path=/path/to/model.tlt output_dir=/path/to/results/
To export a trained DeformableDETR model to a .etlt file, use the export command:
tao deformable_detr export [-h] -e <experiment spec file>
-k <key>
model_path=<trained tlt model to be exported>
output_file=<etlt path>
Required Arguments
-e, --experiment_spec: The path to an experiment spec file
-k, --key: A user-specific encoding key to save or load a .tlt model
model_path: The .tlt model to be exported
output_file: The path where the exported .etlt file is stored
Sample Usage
Here’s an example of using the export command:
tao deformable_detr export -e /path/to/spec.yaml -k $YOUR_KEY model_path=/path/to/model.tlt output_file=/path/to/model.etlt
For deployment, refer to the TAO Deploy documentation.
Refer to the Integrating a Deformable DETR Model page for more information about deploying a Deformable DETR model to DeepStream.