NVIDIA TAO v5.5.0

Deformable DETR

Deformable DETR is an object-detection model that is included in TAO. It supports the following tasks:

  • convert

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Launcher using the following convention on the command line:


tao model deformable_detr <sub_task> <args_per_subtask>

where args_per_subtask represents the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

Deformable DETR expects directories of images for training or validation and annotated JSON files in COCO format.

Note

The category_id from your COCO JSON file should start from 1 because 0 is set as a background class. In addition, dataset.num_classes should be set to max class_id + 1. For instance, even though there are only 80 classes used in COCO, the largest class_id is 90, so dataset.num_classes should be set to 91.
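
For example, a minimal categories section of a COCO annotation file that follows this rule might look like the following (the class names here are illustrative):

"categories": [
    {"id": 1, "name": "person"},
    {"id": 2, "name": "bag"},
    {"id": 3, "name": "face"}
]

With this annotation, dataset.num_classes would be set to 4 (the largest class_id of 3, plus 1 for the background class at 0).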

Sharding the Data (Optional)

Note

Sharding is not necessary if the annotation is already in JSON format and your dataset is smaller than the COCO dataset. This subtask also assumes that your dataset is in KITTI format.

For a large dataset, you can optionally use convert to shard the dataset into smaller chunks to reduce the memory burden. In this process, KITTI-based annotations are converted into smaller sharded JSON files, similar to other object detection networks. Here is an example spec file for converting KITTI-based folders into multiple sharded JSON files.


input_source: /workspace/tao-experiments/data/sequence.txt
results_dir: /workspace/tao-experiments/sharded
image_dir_name: images
label_dir_name: labels
num_shards: 32
num_partitions: 1
mapping_path: /path/to/your_category_mapping

The details of each parameter are summarized in the table below:

Parameter Data Type Default Description Supported Values
input_source string None The .txt file listing data sources
results_dir string None The output directory where sharded JSON files will be stored
image_dir_name string None The relative path to the directory containing images from the path listed in the input_source .txt file
label_dir_name string None The relative path to the directory containing JSON data from the path listed in the input_source .txt file
num_shards unsigned int 32 The number of shards per partition >0
num_partitions unsigned int 1 The number of partitions in the data >0
mapping_path string None The path to a JSON file containing the class mapping
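
The input_source .txt file lists the root directories of your KITTI-format data, one per line; image_dir_name and label_dir_name are resolved relative to each listed root. A hypothetical sequence.txt might look like this:

/workspace/tao-experiments/data/sequence_01
/workspace/tao-experiments/data/sequence_02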

The category mapping should contain the class mapping for your dataset and be listed in reverse alphabetical order. The default mapping is shown below:


DEFAULT_TARGET_CLASS_MAPPING = {
    "Person": "person",
    "Person Group": "person",
    "Rider": "person",
    "backpack": "bag",
    "face": "face",
    "large_bag": "bag",
    "person": "person",
    "person group": "person",
    "person_group": "person",
    "personal_bag": "bag",
    "rider": "person",
    "rolling_bag": "bag",
    "rollingbag": "bag",
    "largebag": "bag",
    "personalbag": "bag"
}

The following example shows how to use the command:


tao model deformable_detr convert -e /path/to/spec.yaml

The training experiment spec file for Deformable DETR includes model, train, and dataset parameters. Here is an example spec file for training a Deformable DETR model with a resnet_50 backbone on a COCO dataset.


dataset:
  train_data_sources:
    - image_dir: /path/to/coco/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.json
  val_data_sources:
    - image_dir: /path/to/coco/val2017/
      json_file: /path/to/coco/annotations/instances_val2017.json
  num_classes: 91
  batch_size: 4
  workers: 8
  augmentation:
    scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
    input_mean: [0.485, 0.456, 0.406]
    input_std: [0.229, 0.224, 0.225]
    horizontal_flip_prob: 0.5
    train_random_resize: [400, 500, 600]
    train_random_crop_min: 384
    train_random_crop_max: 600
    random_resize_max_size: 1333
    test_random_resize: 800
model:
  pretrained_model_path: /path/to/your-pretrained-backbone-model
  backbone: resnet_50
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 300
  with_box_refine: True
  dropout_ratio: 0.3
train:
  optim:
    lr: 0.0002
    lr_backbone: 0.00002
    lr_linear_proj_mult: 0.1
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_decay: 0.1
    lr_steps: [40]
    optimizer: AdamW
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  clip_grad_norm: 0.1
  precision: fp32
  distributed_strategy: ddp
  activation_checkpoint: True
  num_gpus: 1
  gpu_ids: [0]
  num_nodes: 1
  seed: 1234

Parameter Data Type Default Description Supported Values
model dict config The configuration of the model architecture
dataset dict config The configuration of the dataset
train dict config The configuration of the training task
evaluate dict config The configuration of the evaluation task
inference dict config The configuration of the inference task
encryption_key string None The encryption key to encrypt and decrypt model files
results_dir string /results The directory where experiment results are saved
export dict config The configuration of the ONNX export task
gen_trt_engine dict config The configuration of the TensorRT generation task. Only used in tao deploy

model

The model parameter provides options to change the Deformable DETR architecture.


model:
  pretrained_model_path: /path/to/your-resnet50-pretrained-model
  backbone: resnet_50
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 300
  with_box_refine: True
  dropout_ratio: 0.3

Parameter Datatype Default Description Supported Values
pretrained_backbone_path string None The optional path to the pretrained backbone file string to the path
backbone string resnet_50 The backbone name of the model. The GCViT and ResNet 50 backbones are supported. resnet_50, gc_vit_xxtiny, gc_vit_xtiny, gc_vit_tiny, gc_vit_small, gc_vit_base, gc_vit_large
train_backbone bool True A flag specifying whether to train the backbone or not True/False
num_feature_levels unsigned int 4 The number of feature levels to use in the model 1,2,3,4
return_interm_indices int list [1, 2, 3, 4] The indices of the feature levels to use in the model. The length must match num_feature_levels. [0, 1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3], [1, 2], [1]
dec_layers unsigned int 6 The number of decoder layers in the transformer >0
enc_layers unsigned int 6 The number of encoder layers in the transformer >0
num_queries unsigned int 300 The number of queries >0
dim_feedforward unsigned int 1024 The dimension of the feedforward network >0
num_select unsigned int 100 The number of top-K predictions selected during the post-process >0
with_box_refine bool True A flag specifying whether to enable Iterative Bounding Box Refinement True, False
dropout_ratio float 0.3 The probability to drop out hidden units 0.0 ~ 1.0
cls_loss_coef float 2.0 The relative weight of the classification error in the matching cost >0.0
bbox_loss_coef float 5.0 The relative weight of the L1 error of the bounding box coordinates in the matching cost >0.0
giou_loss_coef float 2.0 The relative weight of the GIoU loss of the bounding box in the matching cost >0.0
focal_alpha float 0.25 The alpha in the focal loss >0.0
aux_loss bool True A flag specifying whether to use auxiliary decoding losses (loss at each decoder layer) True, False
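
As an example of keeping these options consistent, a sketch of a model section that uses two feature levels must shorten return_interm_indices to match (values are chosen from the supported values listed above):

model:
  backbone: resnet_50
  num_feature_levels: 2
  return_interm_indices: [1, 2]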

train

The train parameter defines the hyperparameters of the training process.


train:
  optim:
    lr: 0.0002
    lr_backbone: 0.00002
    lr_linear_proj_mult: 0.1
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_decay: 0.1
    lr_steps: [40]
    optimizer: AdamW
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  clip_grad_norm: 0.1
  precision: fp32
  distributed_strategy: ddp
  activation_checkpoint: True
  num_gpus: 1
  gpu_ids: [0]
  num_nodes: 1
  seed: 1234

Parameter Datatype Default Description Supported Values
num_gpus unsigned int 1 The number of GPUs to use for distributed training >0
gpu_ids List[int] [0] The indices of the GPUs to use for distributed training
seed unsigned int 1234 The random seed for random, numpy, and torch >0
num_epochs unsigned int 10 The total number of epochs to run the experiment >0
checkpoint_interval unsigned int 1 The epoch interval at which the checkpoints are saved >0
validation_interval unsigned int 1 The epoch interval at which the validation is run >0
resume_training_checkpoint_path string The intermediate PyTorch Lightning checkpoint to resume training from
results_dir string /results/train The directory to save training results
optim dict config The config for the optimizer, including the learning rate, learning scheduler, and weight decay >0
clip_grad_norm float 0.1 The amount to clip the gradient by the L2 norm. A value of 0.0 specifies no clipping. >=0
precision string fp32 Specifying "fp16" enables mixed-precision training, which can help save GPU memory. fp32, fp16
distributed_strategy string ddp The multi-GPU training strategy. DDP (Distributed Data Parallel) and FSDP (Fully Sharded Data Parallel) are supported. ddp, fsdp
activation_checkpoint bool True A True value instructs the trainer to recompute activations in the backward pass to save GPU memory, rather than storing them. True, False
pretrained_model_path string The path to a pretrained model checkpoint to load for fine-tuning
num_nodes unsigned int 1 The number of nodes. If the value is larger than 1, multi-node is enabled >0
freeze string list [] A list of layer names in the model to freeze (e.g. ["backbone", "transformer.encoder", "input_proj"])
verbose bool False A flag specifying whether to print detailed learning-rate scaling from the optimizer True, False
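
For instance, a sketch of a fine-tuning setup that loads a pretrained checkpoint and freezes the backbone and encoder (layer names taken from the freeze example above):

train:
  pretrained_model_path: /path/to/pretrained/model.pth
  freeze: ["backbone", "transformer.encoder"]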

optim

The optim parameter defines the config for the optimizer in training, including the learning rate, learning scheduler, and weight decay.


optim:
  lr: 0.0002
  lr_backbone: 0.00002
  lr_linear_proj_mult: 0.1
  momentum: 0.9
  weight_decay: 0.0001
  lr_scheduler: MultiStep
  lr_decay: 0.1
  lr_steps: [40]
  optimizer: AdamW

Parameter Datatype Default Description Supported Values
lr float 2e-4 The initial learning rate for training the model, excluding the backbone >0.0
lr_backbone float 2e-5 The initial learning rate for training the backbone >0.0
lr_linear_proj_mult float 0.1 The initial learning rate for training the linear projection layer >0.0
momentum float 0.9 The momentum for the AdamW optimizer >0.0
weight_decay float 1e-4 The weight decay coefficient >0.0
lr_scheduler string MultiStep The learning-rate scheduler. Two schedulers are provided: MultiStep, which decreases the lr by lr_decay at each step in lr_steps, and StepLR, which decreases the lr by lr_decay every lr_step_size. MultiStep, StepLR
lr_decay float 0.1 The decreasing factor for the learning rate scheduler >0.0
lr_steps int list [40] The steps at which to decrease the learning rate for the MultiStep scheduler int list
lr_step_size unsigned int 40 The step size for decreasing the learning rate with the StepLR scheduler >0
lr_monitor string val_loss The monitored value for the AutoReduce scheduler val_loss, train_loss
optimizer string AdamW The optimizer used during training AdamW, SGD
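
For example, to switch from the default MultiStep scheduler to StepLR, only the scheduler-related fields need to change (a sketch using the defaults from the table above):

optim:
  lr: 0.0002
  optimizer: AdamW
  lr_scheduler: StepLR
  lr_step_size: 40
  lr_decay: 0.1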

dataset

The dataset parameter defines the dataset source, training batch size, and augmentation.


dataset:
  train_data_sources:
    - image_dir: /path/to/coco/images/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.json
  val_data_sources:
    - image_dir: /path/to/coco/images/val2017/
      json_file: /path/to/coco/annotations/instances_val2017.json
  test_data_sources:
    image_dir: /path/to/coco/images/val2017/
    json_file: /path/to/coco/annotations/instances_val2017.json
  infer_data_sources:
    image_dir: /path/to/coco/images/val2017/
    classmap: /path/to/coco/annotations/coco_classmap.txt
  num_classes: 91
  batch_size: 4
  workers: 8

Parameter Datatype Default Description Supported Values
train_data_sources list dict The training data sources: image_dir (the directory containing the training images) and json_file (the path to the COCO-format training annotation JSON file)
val_data_sources list dict The validation data sources: image_dir (the directory containing the validation images) and json_file (the path to the COCO-format validation annotation JSON file)
test_data_sources dict The test data sources for evaluation: image_dir (the directory containing the test images) and json_file (the path to the COCO-format test annotation JSON file)
infer_data_sources dict The data sources for inference: image_dir (the directory containing the inference images) and classmap (the path to the .txt file containing the class names)
augmentation dict config The parameters that define the augmentation method
num_classes unsigned int 91 The number of classes in the training data >0
batch_size unsigned int 4 The batch size for training and validation >0
workers unsigned int 8 The number of parallel workers processing data >0
train_sampler string default_sampler The minibatch sampling method. Non-default sampling methods can be enabled for multi-node jobs. This config has no effect unless dataset_type is set to default. default_sampler, non_uniform_sampler, uniform_sampler
dataset_type string serialized If set to default, the standard CocoDetection dataset structure from torchvision is used, which loads the COCO annotation in every subprocess; this leads to redundant copies of data and can cause RAM to explode if workers is high. If set to serialized, the data is serialized through pickle and torch.Tensor, allowing it to be shared across subprocesses, which can greatly reduce RAM usage. serialized, default
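
The classmap file referenced by infer_data_sources is a plain .txt file of class names. A minimal hypothetical example, assuming one class name per line:

person
bag
face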

augmentation

The augmentation parameter contains hyperparameters for augmentation.


augmentation:
  scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
  input_mean: [0.485, 0.456, 0.406]
  input_std: [0.229, 0.224, 0.225]
  horizontal_flip_prob: 0.5
  train_random_resize: [400, 500, 600]
  train_random_crop_min: 384
  train_random_crop_max: 600
  random_resize_max_size: 1333
  test_random_resize: 800

Parameter Datatype Default Description Supported Values
scales int list [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800] A list of sizes to perform random resize.
input_mean float list [0.485, 0.456, 0.406] The input mean for RGB frames: (input - mean) / std float list / size=1 or 3
input_std float list [0.229, 0.224, 0.225] The input standard deviation for RGB frames: (input - mean) / std float list / size=1 or 3
horizontal_flip_prob float 0.5 The probability of a horizontal flip during training >=0
train_random_resize int list [400, 500, 600] A list of sizes to perform random resize for training data int list
train_random_crop_min unsigned int 384 The minimum random crop size for training data >0
train_random_crop_max unsigned int 600 The maximum random crop size for training data >0
random_resize_max_size unsigned int 1333 The maximum random resize size for training data >0
test_random_resize unsigned int 800 The random resize size for test data >0
fixed_padding bool True A flag specifying whether to resize the image (with no padding) to (sorted(scales[-1]), random_resize_max_size) to prevent a CPU memory leak. True/False
fixed_random_crop unsigned int The crop size for Large Scale Jittering, which is used for ViT backbones. The resulting image resolution is fixed to fixed_random_crop. Divisible by 32
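
For example, a sketch that enables Large Scale Jittering for a ViT backbone with a fixed 1024 output resolution (1024 is divisible by 32, per the table above):

augmentation:
  fixed_random_crop: 1024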

Use the following command to run Deformable DETR training:


tao model deformable_detr train [-h] -e <experiment_spec_file>
                                [results_dir=<global_results_dir>]
                                [model.<model_option>=<model_option_value>]
                                [dataset.<dataset_option>=<dataset_option_value>]
                                [train.<train_option>=<train_option_value>]
                                [train.gpu_ids=<gpu indices>]
                                [train.num_gpus=<number of gpus>]

Required Arguments

The only required argument is the path to the experiment spec:

  • -e, --experiment_spec: The experiment specification file to set up the training experiment

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Note

For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 with gpu_ids = [0, 1], then they are modified to follow the setting with more GPUs, i.e. num_gpus = 2.
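
For example, the following override trains on the first two GPUs with consistent settings:

tao model deformable_detr train -e /path/to/spec.yaml train.num_gpus=2 train.gpu_ids=[0,1]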

Checkpointing and Resuming Training

At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth. These are saved in train.results_dir, like so:


$ ls /results/train
'model_epoch_000.pth'  'model_epoch_001.pth'  'model_epoch_002.pth'  'model_epoch_003.pth'  'model_epoch_004.pth'

The latest checkpoint will also be saved as dd_model_latest.pth. Training will automatically resume from dd_model_latest.pth if it exists in train.results_dir. This will be superseded by train.resume_training_checkpoint_path if it is provided.

The major implication of this logic is that, if you wish to trigger fresh training from scratch, either

  • Specify a new, empty results directory (Recommended), or

  • Remove the latest checkpoint from the results directory
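
Conversely, to resume from a specific checkpoint rather than the latest one, point train.resume_training_checkpoint_path at it, for example:

tao model deformable_detr train -e /path/to/spec.yaml train.resume_training_checkpoint_path=/results/train/model_epoch_004.pth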

Optimizing Resources for Training Deformable DETR

Training Deformable DETR on a standard dataset like COCO requires strong GPUs (e.g. V100/A100) with at least 15 GB of VRAM, as well as a large amount of CPU memory. This section outlines some of the strategies you can use to launch training with limited resources.

Optimize GPU Memory

There are various ways to optimize GPU memory usage. One obvious trick is to reduce dataset.batch_size; however, this can make training take longer than usual. Instead, we recommend the following settings to reduce GPU memory consumption (a combined example follows the list).

  • Set train.precision to fp16 to enable automatic mixed precision training. This can reduce your GPU memory usage by 50%.

  • Set train.activation_checkpoint to True to enable activation checkpointing. Recomputing activations in the backward pass instead of caching them reduces memory usage.

  • Set train.distributed_strategy to fsdp to enable Fully Sharded Data Parallel training, which shards gradient computation across processes to help reduce GPU memory usage.

  • Try a more lightweight backbone like gc_vit_xxtiny, or freeze the backbone by setting model.train_backbone to False.

  • Try changing the augmentation resolution in dataset.augmentation depending on your dataset.
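
Combined, a memory-constrained configuration might look like the following sketch (the values are illustrative; gc_vit_xxtiny is one of the supported backbones listed in the model table):

model:
  backbone: gc_vit_xxtiny
  train_backbone: False
train:
  precision: fp16
  activation_checkpoint: True
  distributed_strategy: fsdp
dataset:
  batch_size: 2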

Optimize CPU Memory

To speed up data loading, it is common practice to set a high number of workers, which spawns multiple worker processes. However, this can exhaust CPU memory if your annotation file is very large. Hence, we recommend the following settings to reduce CPU memory consumption (a combined example follows the list).

  • Set dataset.dataset_type to serialized so that the COCO-based annotation data can be shared across different subprocesses.

  • Set dataset.augmentation.fixed_padding to True so that images are padded before batch formulation. Because of the random resize and random crop augmentations during training, the image resolution after transformation can vary across images. Such variable resolutions can cause a memory leak, with CPU memory slowly accumulating until the training job runs out of memory. This is a limitation of PyTorch, so we advise setting fixed_padding to True to help stabilize CPU memory usage.
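
Together, these two settings look like this in the spec file:

dataset:
  dataset_type: serialized
  augmentation:
    fixed_padding: True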

evaluate

The evaluate parameter defines the hyperparameters of the evaluation process.


evaluate:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.0

Parameter Datatype Default Description Supported Values
checkpoint string The path to the PyTorch model to evaluate
results_dir string /results/evaluate The directory to save evaluation results
num_gpus unsigned int 1 The number of GPUs to use for distributed evaluation >0
gpu_ids List[int] [0] The indices of the GPUs to use for distributed evaluation
trt_engine string The path to the TensorRT engine to evaluate. This should only be used with tao deploy.
conf_threshold float 0.0 The confidence threshold used to filter predictions >=0

To run evaluation with a Deformable DETR model, use this command:


tao model deformable_detr evaluate [-h] -e <experiment_spec>
                                   evaluate.checkpoint=<model to be evaluated>
                                   [evaluate.<evaluate_option>=<evaluate_option_value>]
                                   [evaluate.gpu_ids=<gpu indices>]
                                   [evaluate.num_gpus=<number of gpus>]

Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up the evaluation experiment

  • evaluate.checkpoint: The .pth model to be evaluated.
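
A hypothetical invocation with concrete paths:

tao model deformable_detr evaluate -e /path/to/spec.yaml evaluate.checkpoint=/results/train/model_epoch_004.pth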

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

inference

The inference parameter defines the hyperparameters of the inference process.


inference:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.5
  color_map:
    person: red
    car: blue

Parameter Datatype Default Description Supported Values
checkpoint string The path to the PyTorch model used for inference
results_dir string /results/inference The directory to save inference results
num_gpus unsigned int 1 The number of GPUs to use for distributed inference >0
gpu_ids List[int] [0] The indices of the GPUs to use for distributed inference
trt_engine string The path to the TensorRT engine used for inference. This should only be used with tao deploy.
conf_threshold float 0.5 The confidence threshold used to filter predictions >=0
color_map dict The color map of the bounding boxes for each class string dict

The inference tool for Deformable DETR models can be used to visualize bounding boxes and generate frame-by-frame KITTI-format labels on a directory of images.


tao model deformable_detr inference [-h] -e <experiment spec file>
                                    inference.checkpoint=<model to be inferenced>
                                    [inference.<inference_option>=<inference_option_value>]
                                    [inference.gpu_ids=<gpu indices>]
                                    [inference.num_gpus=<number of gpus>]

Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up the inference experiment

  • inference.checkpoint: The .pth model to inference.
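
A hypothetical invocation with concrete paths:

tao model deformable_detr inference -e /path/to/spec.yaml inference.checkpoint=/results/train/model_epoch_004.pth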

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

export

The export parameter defines the hyperparameters for the export process.


export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  on_cpu: False
  opset_version: 12
  input_channel: 3
  input_width: 960
  input_height: 544
  batch_size: -1

Parameter Datatype Default Description Supported Values
checkpoint string The path to the PyTorch model to export
onnx_file string The path to the exported .onnx file
on_cpu bool True If True, the DMHA module is exported as standard PyTorch ops. If False, the module is exported using the TensorRT plugin. True, False
opset_version unsigned int 12 The opset version of the exported ONNX >0
input_channel unsigned int 3 The input channel size. Only the value 3 is supported. 3
input_width unsigned int 960 The input width >0
input_height unsigned int 544 The input height >0
batch_size unsigned int -1 The batch size of the ONNX model. If this value is set to -1, the export uses dynamic batch size. >=-1

Use the following command to export a Deformable DETR model to ONNX:

tao model deformable_detr export [-h] -e <experiment spec file> export.checkpoint=<model to export> export.onnx_file=<onnx path> [export.<export_option>=<export_option_value>]

Required Arguments

  • -e, --experiment_spec: The path to an experiment spec file

  • export.checkpoint: The .pth model to export.

  • export.onnx_file: The path where the .etlt or .onnx model is saved.
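
A hypothetical invocation with concrete paths:

tao model deformable_detr export -e /path/to/spec.yaml export.checkpoint=/results/train/model_epoch_004.pth export.onnx_file=/results/export/model.onnx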

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Refer to the Integrating a Deformable DETR Model page for more information about deploying a Deformable DETR model to DeepStream.
