RT-DETR#

RT-DETR is an object-detection model included in TAO. It supports the following tasks:

train
evaluate
inference
export
distill

Each task is explained in detail in the following sections.

Note

The FTMS Client sections of this documentation refer to $EXPERIMENT_ID and $DATASET_ID.
- For instructions on creating a dataset using the remote client, refer to the Creating a dataset section in the Remote Client documentation.
- For instructions on creating an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.
The spec format is YAML for TAO Launcher, and JSON for FTMS Client.
File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher, not for FTMS Client.
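
As a rough illustration of the two spec formats, the sketch below builds a small fragment of the train section as a Python dictionary and serializes it to JSON. It assumes the JSON spec uses the same field names and nesting as the YAML spec, which this page does not confirm; the variable names are hypothetical.

```python
import json

# Fragment of the training spec, mirroring the YAML `train` section.
# Assumption: the FTMS Client JSON spec uses the same keys and nesting.
train_fragment = {
    "train": {
        "num_epochs": 10,
        "checkpoint_interval": 5,
        "optim": {"lr": 0.0002, "lr_scheduler": "MultiStep", "lr_steps": [40]},
    }
}

# Serialize the fragment to a JSON string.
specs_json = json.dumps(train_fragment, indent=2)
print(specs_json)
```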

Data Input for RT-DETR#

RT-DETR expects directories of images for training or validation and annotated JSON files in COCO format.
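
For orientation, here is a minimal sketch of a COCO-style detection annotation file built in Python. The file name, IDs, and category are hypothetical placeholders, and check_coco is an illustrative helper, not part of TAO.

```python
import json

# Minimal COCO-style annotation with one image and one bounding box.
# All IDs, names, and sizes below are hypothetical placeholders.
coco = {
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100.0, 120.0, 50.0, 80.0],  # [x_min, y_min, width, height] in pixels
            "area": 50.0 * 80.0,
            "iscrowd": 0,
        },
    ],
    "categories": [{"id": 1, "name": "person"}],
}


def check_coco(data):
    """Return True if every annotation references a known image and category."""
    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    return all(
        ann["image_id"] in image_ids
        and ann["category_id"] in category_ids
        and len(ann["bbox"]) == 4
        for ann in data["annotations"]
    )


# Text that would be written to the json_file referenced by the spec.
annotation_text = json.dumps(coco, indent=2)
print(check_coco(coco))
```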

Creating an Experiment Spec File#

The training experiment spec file for RT-DETR includes model, train, and dataset parameters. Here is an example spec file for training an RT-DETR model with a ResNet-50 backbone on a COCO dataset.

TAO Client (v2 API)

Use the following command to get an experiment spec file for RT-DETR:

BASE_EXPERIMENT_ID=$(tao rtdetr list-base-experiments | jq -r '.[0].id')
SPECS=$(tao rtdetr get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

TAO Launcher

dataset:
  train_data_sources:
    - image_dir: /path/to/dataset/images/images
      json_file: /path/to/dataset/images/annotations.json
  val_data_sources:
    image_dir: /path/to/dataset/images_val/images
    json_file: /path/to/dataset/images_val/annotations.json
  test_data_sources:
    image_dir: /path/to/dataset/images_val/images
    json_file: /path/to/dataset/images_val/annotations.json
  infer_data_sources:
    image_dir:
    - /path/to/dataset/images_val/images
    classmap: /path/to/labels.txt
  batch_size: 16
  workers: 8
  remap_mscoco_category: false
  pin_memory: true
  dataset_type: serialized
  num_classes: 80
  eval_class_ids: null
  augmentation:
    multi_scales:
    - 480
    - 512
    - 544
    - 576
    - 608
    - 640
    - 672
    - 704
    - 736
    - 768
    - 800
    train_spatial_size:
    - 640
    - 640
    eval_spatial_size:
    - 640
    - 640
    distortion_prob: 0.8
    iou_crop_prob: 0.8
    preserve_aspect_ratio: false
model:
  backbone: resnet_50
  train_backbone: True
  pretrained_backbone_path: /path/to/pretrained/backbone.pth
  return_interm_indices: [1, 2, 3]
  dec_layers: 6
  enc_layers: 1
  num_queries: 300
train:
  optim:
    lr: 0.0002
    lr_backbone: 0.00002
    lr_linear_proj_mult: 0.1
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_decay: 0.1
    lr_steps: [40]
    optimizer: AdamW
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  clip_grad_norm: 0.1
  precision: fp32
  distributed_strategy: ddp
  activation_checkpoint: True
  num_gpus: 1
  gpu_ids: [0]
  num_nodes: 1
  seed: 1234

Field	value_type	description	default_value	automl_enabled
`encryption_key`	string			FALSE
`results_dir`	string		/results	FALSE
`wandb`	collection			FALSE
`model`	collection	Configurable parameters to construct the model for an RT-DETR experiment.		FALSE
`dataset`	collection	Configurable parameters to construct the dataset for an RT-DETR experiment.		FALSE
`train`	collection	Configurable parameters to construct the trainer for an RT-DETR experiment.		FALSE
`evaluate`	collection	Configurable parameters to construct the evaluator for an RT-DETR experiment.		FALSE
`inference`	collection	Configurable parameters to construct the inferencer for an RT-DETR experiment.		FALSE
`export`	collection	Configurable parameters to construct the exporter for an RT-DETR experiment.		FALSE
`gen_trt_engine`	collection	Configurable parameters to construct the TensorRT engine builder for an RT-DETR experiment.		FALSE
`distill`	collection	Configurable parameters to construct the distiller for an RT-DETR experiment.		FALSE

model#

The model parameter provides options to change the RT-DETR architecture.

model:
  pretrained_backbone_path: /path/to/pretrained/backbone.pth
  backbone: resnet_50
  train_backbone: true
  num_queries: 300
  num_select: 300
  num_feature_levels: 3
  return_interm_indices:
  - 1
  - 2
  - 3
  feat_strides:
  - 8
  - 16
  - 32
  feat_channels:
  - 256
  - 256
  - 256
  use_encoder_idx:
  - 2
  hidden_dim: 256
  nheads: 8
  dropout_ratio: 0.0
  enc_layers: 1
  dim_feedforward: 1024
  pe_temperature: 10000
  expansion: 1.0
  depth_mult: 1
  enc_act: gelu
  act: silu
  dec_layers: 6
  dn_number: 100
  eval_idx: -1
  vfl_loss_coef: 1.0
  bbox_loss_coef: 5.0
  giou_loss_coef: 2.0
  alpha: 0.75
  gamma: 2.0
  aux_loss: true
  loss_types:
  - vfl
  - boxes
  backbone_names:
  - backbone.0
  linear_proj_names:
  - reference_points
  - sampling_offsets
  distillation_loss_coef: 1.0
  frozen_fm:
    enabled: false
    backbone: radio_v2-l
    checkpoint: /path/to/pretrained/radio_v2-l.pth

Field	value_type	description	default_value	valid_min	valid_max	valid_options	automl_enabled
pretrained_backbone_path	string	[Optional] Path to a pretrained backbone file.					`false`
backbone	string	Backbone name of the model. The TAO implementation of RT-DETR supports ResNet, EfficientViT, FAN, and ConvNeXt v1/v2.	resnet_50			resnet_18, resnet_34, resnet_50, resnet_101, convnext_tiny, convnext_small, convnext_base, convnext_large, convnext_xlarge, convnextv2_nano, convnextv2_tiny, convnextv2_base, convnextv2_large, convnextv2_huge, fan_tiny, fan_small, fan_base, fan_large	`false`
train_backbone	bool	Flag to set backbone weights as trainable or frozen. When set to `False`, the backbone weights are frozen.	`True`				`false`
num_queries	int	Number of queries.	300	1	inf		`true`
num_select	int	Number of top-K predictions selected during post-processing.	300	1			`true`
num_feature_levels	int	Number of feature levels to use in the model.	3	1	4		`false`
return_interm_indices	list	Index of feature levels to use in the model. The length must match `num_feature_levels`.	[1, 2, 3]				`false`
feat_strides	list	Stride used as grid size of positional embedding at each encoder layer.	[8, 16, 32]				`false`
feat_channels	list	Feature channel sizes in decoder.	[256, 256, 256]				`false`
use_encoder_idx	list	Index of multi-scale backbone features to pass to encoder.	[2]				`false`
hidden_dim	int	Dimension of the hidden units.	256				`false`
nheads	int	Number of heads.	8				`false`
dropout_ratio	float	Probability to drop hidden units.	0.0	0.0	1.0		`false`
enc_layers	int	Number of encoder layers in the transformer.	1	1			`true`
dim_feedforward	int	Dimension of the feedforward network.	1024	1			`false`
pe_temperature	int	Temperature applied to the positional sine embedding.	10000	1	inf		`false`
expansion	float	Expansion ratio for the hidden dimension used in CSPRepLayer.	1.0	0.0	inf		`false`
depth_mult	int	Number of RepVGGBlock modules used in CSPRepLayer.	1	1	inf		`false`
enc_act	string	Activation used for the encoder.	`gelu`				`false`
act	string	Activation used for top-down FPN and bottom-up PAN.	`silu`				`false`
dec_layers	int	Number of decoder layers in the transformer.	6	1			`true`
dn_number	int	Number of denoising queries.	100	0	inf		`false`
eval_idx	int	Index of decoder layer to use for evaluation. By default, use the last decoder layer.	-1	-1	inf		`false`
vfl_loss_coef	float	Relative weight of the varifocal error in the matching cost.	1.0	0.0	inf		`false`
bbox_loss_coef	float	Relative weight of the L1 error of the bounding box coordinates in the matching cost.	5.0	0.0	inf		`false`
giou_loss_coef	float	The relative weight of the GIoU loss of the bounding box in the matching cost.	2.0	0.0	inf		`false`
alpha	float	Alpha value in the varifocal loss.	0.75				`false`
gamma	float	Gamma value in the varifocal loss.	2.0				`false`
aux_loss	bool	A flag specifying whether to use auxiliary decoding losses (loss at each decoder layer).	`True`				`false`
loss_types	list	Losses to be used during training.	[`'vfl'`, `'boxes'`]				`false`
backbone_names	list	Prefix of the tensor names corresponding to the backbone.	[`'backbone.0'`]				`false`
linear_proj_names	list	Linear projection layer names.	[`'reference_points'`, `'sampling_offsets'`]				`false`
distillation_loss_coef	float	Coefficient for the distillation loss during distillation.	1.0				`false`
frozen_fm	collection	Configurable parameters used to construct the frozen foundation model.					`false`
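
The table states two constraints that are easy to violate when editing a spec by hand: num_feature_levels must lie between 1 and 4, and return_interm_indices must have exactly that many entries. The sketch below checks both on a plain dictionary. check_feature_levels is a hypothetical helper for illustration, not a TAO function.

```python
def check_feature_levels(model_cfg):
    """Return a list of problems found in a model config dictionary."""
    problems = []
    # Fall back to the documented defaults when a field is absent.
    num_levels = model_cfg.get("num_feature_levels", 3)
    indices = model_cfg.get("return_interm_indices", [1, 2, 3])

    if not 1 <= num_levels <= 4:
        problems.append(f"num_feature_levels={num_levels} is outside [1, 4]")
    if len(indices) != num_levels:
        problems.append(
            f"return_interm_indices has {len(indices)} entries, "
            f"expected {num_levels}"
        )
    return problems


# Raising num_feature_levels without extending the index list is flagged.
issues = check_feature_levels({"num_feature_levels": 4})
print(issues)
```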

frozen_fm#

The frozen_fm parameter provides options to change the Frozen RT-DETR (RT-DETR + a frozen foundation model) architecture.

Field	value_type	description	default_value	valid_options	automl_enabled
`enabled`	bool	Flag to set frozen foundation model as enabled or disabled. When set to True, the frozen foundation model will be enabled.	True		FALSE
`backbone`	string	Name of the frozen foundation model.	radio_v2-l	radio_v2-b,radio_v2-l,radio_v2-h	FALSE
`checkpoint`	string	The path to the pretrained frozen foundation model checkpoint.			FALSE

Note

The pretrained weights of the frozen foundation model can be found in the TAO Model Zoo.

train#

The train parameter defines the hyperparameters of the training process.

train:
  optim:
    lr: 0.0002
    lr_backbone: 0.00002
    lr_linear_proj_mult: 0.1
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_decay: 0.1
    lr_steps: [40]
    optimizer: AdamW
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  clip_grad_norm: 0.1
  precision: fp32
  distributed_strategy: ddp
  activation_checkpoint: True
  num_gpus: 1
  gpu_ids: [0]
  num_nodes: 1
  seed: 1234

Field	value_type	description	default_value	valid_min	valid_max	valid_options	automl_enabled
num_gpus	int	Number of GPUs to run the training job.	1	1			`false`
gpu_ids	list	List of GPU IDs to run the training on. The length of this list must equal the number of GPUs in `train.num_gpus`.	[0]				`false`
num_nodes	int	Number of nodes to run the training on. If >1, `multi-node` is enabled.	1				`false`
seed	int	Seed for the initializer in PyTorch. If <0, disable fixed seed.	1234	-1	inf		`false`
cudnn	collection						`false`
num_epochs	int	Number of epochs to run the training.	10	1	inf		`true`
checkpoint_interval	int	Interval (in epochs) at which a checkpoint is to be saved. Helps resume training.	1	1			`false`
validation_interval	int	Interval (in epochs) at which an evaluation is triggered on the validation dataset.	1	1			`false`
resume_training_checkpoint_path	string	Path to the checkpoint at which to resume training.					`false`
results_dir	string	Path to the location where all the assets generated from a task are stored.					`false`
freeze	list	List of layer names to freeze. Example: `["backbone", "encoder", "decoder"]`.	[]				`false`
pretrained_model_path	string	Path to a pretrained RT-DETR model to initialize the current training from.					`false`
clip_grad_norm	float	Amount to clip the gradient by L2 Norm. A value of 0.0 specifies no clipping.	0.1				`false`
is_dry_run	bool	Whether to run the trainer in Dry Run mode. This is a good way to validate the spec file and run a sanity check on the trainer without actually initializing and running the trainer.	`false`				`false`
enable_ema	bool	Whether to enable Exponential Moving Average during training.	`false`				`false`
ema	collection	Hyper parameters to configure the Exponential Moving Average.					`false`
optim	collection	Hyper parameters to configure the optimizer.					`false`
precision	string	Precision to run the training on.	fp32			bf16,fp32,fp16	`false`
distributed_strategy	string	The multi-GPU training strategy. DDP (Distributed Data Parallel) and Fully Sharded DDP are supported.	ddp			ddp,fsdp	`false`
activation_checkpoint	bool	Whether to recompute activations in the backward pass to save GPU memory, rather than storing them.	`true`				`false`
verbose	bool	Whether to enable printing of detailed learning rate scaling from the optimizer.	`false`				`false`

optim#

The optim parameter defines the config for the optimizer in training, including the learning rate, learning rate scheduler, and weight decay.

optim:
  lr: 0.0002
  lr_backbone: 0.00002
  lr_linear_proj_mult: 0.1
  momentum: 0.9
  weight_decay: 0.0001
  lr_scheduler: MultiStep
  lr_decay: 0.1
  lr_steps: [40]
  optimizer: AdamW

Field	value_type	description	default_value	valid_min	valid_max	valid_options	automl_enabled
`optimizer`	string	Type of optimizer used to train the network.	AdamW			AdamW,SGD	FALSE
`monitor_name`	string	The metric value to be monitored for the `AutoReduce` Scheduler	val_loss			val_loss,train_loss	FALSE
`lr`	float	The initial learning rate for training the model, excluding the backbone	0.0001				TRUE
`lr_backbone`	float	The initial learning rate for training the backbone	1e-05				TRUE
`momentum`	float	The momentum for the AdamW optimizer	0.9				TRUE
`weight_decay`	float	The weight decay coefficient	0.0001				TRUE
`lr_scheduler`	string	The learning rate scheduler. MultiStep: decreases the learning rate by lr_decay at each step listed in lr_steps. StepLR: decreases the learning rate by lr_decay every lr_step_size steps.	MultiStep			MultiStep,StepLR	FALSE
`lr_steps`	list	The steps at which the learning rate must be decreased. This is applicable only with the MultiStep LR.	[1000]				FALSE
`lr_step_size`	int	The number of steps between learning rate decreases for the StepLR scheduler	1000				TRUE
`lr_decay`	float	The decreasing factor for the learning rate scheduler	0.1				TRUE
`warmup_steps`	int	The number of steps to perform linear learning rate warm-up	0	0	inf		FALSE
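
To make the two scheduler options concrete, the sketch below computes the learning rate each would produce at a given step. It follows the usual PyTorch MultiStepLR and StepLR decay rules and ignores warm-up. Whether a "step" is an epoch or an iteration in TAO is not stated here, so treat the unit as an assumption. The function names are hypothetical.

```python
import math


def multistep_lr(base_lr, lr_decay, lr_steps, step):
    """MultiStep: multiply by lr_decay once for every milestone already reached."""
    reached = sum(1 for milestone in lr_steps if step >= milestone)
    return base_lr * (lr_decay ** reached)


def step_lr(base_lr, lr_decay, lr_step_size, step):
    """StepLR: multiply by lr_decay once every lr_step_size steps."""
    return base_lr * (lr_decay ** (step // lr_step_size))


# Values taken from the example spec: lr=0.0002, lr_decay=0.1, lr_steps=[40].
before = multistep_lr(0.0002, 0.1, [40], step=39)
after = multistep_lr(0.0002, 0.1, [40], step=40)
print(before, after)
```

With the example spec, the rate stays at its initial value through step 39 and drops by a factor of ten from step 40 onward.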

dataset#

The dataset parameter defines the dataset source, training batch size, and augmentation.

dataset:
  train_data_sources:
    - image_dir: /path/to/coco/images/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.json
  val_data_sources:
    image_dir: /path/to/coco/images/val2017/
    json_file: /path/to/coco/annotations/instances_val2017.json
  test_data_sources:
    image_dir: /path/to/coco/images/val2017/
    json_file: /path/to/coco/annotations/instances_val2017.json
  infer_data_sources:
    image_dir: /path/to/coco/images/val2017/
    classmap: /path/to/coco/annotations/coco_classmap.txt
  num_classes: 80
  batch_size: 4
  workers: 8

Field	value_type	description	default_value	valid_min	valid_max	valid_options	automl_enabled
train_data_sources	list	The list of data sources for training. image_dir: directory that contains the training images. json_file: path to the COCO-format JSON annotation file for training.	[{'image_dir': '', 'json_file': ''}]				`false`
val_data_sources	collection	The data source for validation. image_dir: directory that contains the validation images. json_file: path to the COCO-format JSON annotation file for validation.	{'image_dir': '', 'json_file': ''}				`false`
test_data_sources	collection	The data source for testing. image_dir: directory that contains the test images. json_file: path to the COCO-format JSON annotation file for testing.	{'image_dir': '', 'json_file': ''}				`false`
infer_data_sources	collection	The data source for inference. image_dir: list of directories that contain the inference images. classmap: path to the .txt file that contains the class names.	{'image_dir': [''], 'classmap': ''}				`false`
batch_size	int	Batch size for training and validation.	4	1	inf		`true`
workers	int	Number of parallel workers processing data.	8	1	inf		`true`
remap_mscoco_category	bool	Enables mapping of MSCOCO 91 classes to 80. Only required if you are directly training using the original COCO annotation files. For a custom dataset, set this value to `False`.	`False`				`false`
pin_memory	bool	Enables the dataloader to allocate pagelocked memory for faster data transfer between the CPU and GPU.	`True`				`false`
dataset_type	string	If set to `default`, follows the standard `CocoDetection` dataset structure from torchvision, which loads the COCO annotations in every subprocess. This creates redundant copies of the data and can cause RAM usage to explode if `workers` is high. If set to `serialized`, the data is serialized through `pickle` and `torch.Tensor`, which allows the data to be shared across subprocesses. As a result, RAM usage can be greatly reduced.	`serialized`			`serialized`, `default`	`false`
num_classes	int	The number of classes in the training data	80	1	inf		`false`
eval_class_ids	list	IDs of the classes for evaluation.	[1]				`false`
augmentation	collection	Configuration parameters for data augmentation					`false`

augmentation#

The augmentation parameter contains hyperparameters for augmentation.

augmentation:
  multi_scales:
  - 480
  - 512
  - 544
  - 576
  - 608
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
  train_spatial_size:
  - 640
  - 640
  eval_spatial_size:
  - 640
  - 640
  distortion_prob: 0.8
  iou_crop_prob: 0.8
  preserve_aspect_ratio: false

Field	value_type	description	default_value	valid_min	valid_max	automl_enabled
multi_scales	list	A list of sizes to perform random resize.	[480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]			`false`
train_spatial_size	list	Input resolution to run evaluation during training. This is in the [h, w] order.	[640, 640]			`false`
eval_spatial_size	list	Input resolution to run evaluation during validation and testing. This is in the [h, w] order.	[640, 640]			`false`
distortion_prob	float	The probability for RandomPhotometricDistort	0.8	0.0	1.0	`true`
iou_crop_prob	float	The probability for RandomIoUCrop	0.8	0.0	1.0	`true`
preserve_aspect_ratio	bool	Flag to enable resizing while preserving the aspect ratio.	`false`			`false`

Training the Model#

Use the following command to run RT-DETR training:

TAO Client (v2 API)

TRAIN_JOB_ID=$(tao rtdetr create-job \
  --kind experiment \
  --name "rtdetr_train" \
  --action train \
  --workspace-id $WORKSPACE_ID \
  --specs "$TRAIN_SPECS" \
  --train-datasets '["'$DATASET_ID'"]' \
  --eval-dataset "$DATASET_ID" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')


TAO Launcher

tao model rtdetr train [-h] -e <experiment_spec_file>
                          [results_dir=<global_results_dir>]
                          [model.<model_option>=<model_option_value>]
                          [dataset.<dataset_option>=<dataset_option_value>]
                          [train.<train_option>=<train_option_value>]
                          [train.gpu_ids=<gpu indices>]
                          [train.num_gpus=<number of gpus>]

Required Arguments

-e, --experiment_spec_file: The path to the experiment spec file.

Optional Arguments

-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options.
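
The dotted option names above map onto the nesting of the spec file. The sketch below shows that mapping on a plain dictionary. apply_override is a hypothetical helper written for illustration and is not how TAO itself parses overrides. Values are kept as strings here, with no type conversion.

```python
def apply_override(spec, override):
    """Apply one 'a.b.c=value' override to a nested spec dictionary in place."""
    key, _, raw_value = override.partition("=")
    parts = key.split(".")
    node = spec
    # Walk (or create) the intermediate sections, e.g. train -> optim.
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = raw_value  # kept as a string in this sketch
    return spec


spec = {"train": {"num_epochs": 10}}
apply_override(spec, "train.optim.lr=0.0001")
print(spec)
```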

Note

For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed, but are inconsistent, for example num_gpus = 1, gpu_ids = [0, 1], then they are modified to follow the setting that implies more GPUs; in the same example num_gpus is modified from 1 to 2.
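
The reconciliation rule can be sketched as follows. The case where gpu_ids is longer follows the example in the note. The opposite case, where num_gpus is larger, is filled in here with IDs 0..num_gpus-1 as an assumption, since the note does not spell it out. reconcile_gpus is a hypothetical name.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Return (num_gpus, gpu_ids) adjusted to the setting that implies more GPUs."""
    gpu_ids = [0] if gpu_ids is None else list(gpu_ids)
    if len(gpu_ids) > num_gpus:
        # Documented case: num_gpus=1, gpu_ids=[0, 1] -> num_gpus becomes 2.
        num_gpus = len(gpu_ids)
    elif num_gpus > len(gpu_ids):
        # Assumption: fill gpu_ids with the first num_gpus device indices.
        gpu_ids = list(range(num_gpus))
    return num_gpus, gpu_ids


print(reconcile_gpus(num_gpus=1, gpu_ids=[0, 1]))
```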

In some cases, multi-GPU training may result in a segmentation fault. You can circumvent this by setting the environment variable OMP_NUM_THREADS to 1. Depending on your mode of execution, you may use the following methods to set this variable:

CLI Launcher:

You may set the environment variable by adding the following fields to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 in the section Running the launcher.
```
{
    "Envs": [
        {
            "variable": "OMP_NUM_THREADS",
            "value": "1"
        }
    ]
}
```

Docker:

You may set environment variables in Docker by setting the -e flag in the Docker command line.

docker run -it --rm --gpus all \
    -e OMP_NUM_THREADS=1 \
    -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e

Checkpointing and Resuming Training

At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth. Checkpoints are saved in train.results_dir, like this:

$ ls /results/train

'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
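
When resuming, you usually want the checkpoint with the highest epoch number. The sketch below picks it from a list of file names that follow the model_epoch_<epoch_num>.pth pattern. latest_checkpoint is a hypothetical helper, and EMA checkpoints are deliberately ignored in this sketch.

```python
import re

# Matches names such as model_epoch_004.pth and captures the epoch number.
CHECKPOINT_PATTERN = re.compile(r"^model_epoch_(\d+)\.pth$")


def latest_checkpoint(filenames):
    """Return the checkpoint name with the highest epoch, or None if none match."""
    best_name = None
    best_epoch = -1
    for name in filenames:
        match = CHECKPOINT_PATTERN.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
            best_name = name
    return best_name


names = ["model_epoch_000.pth", "model_epoch_004.pth", "model_epoch_002.pth", "status.json"]
print(latest_checkpoint(names))
```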

Note

You may resume a previously aborted training job by setting the train.resume_training_checkpoint_path to the path of the intermediate checkpoint file. The checkpoint files must follow the model_epoch_*.pth or model_epoch*-EMA.pth format. You must use the *-EMA.pth file if your training spec has EMA enabled.

You may set this parameter by providing the corresponding flag on the command line.

TAO Client (v2 API)

TRAIN_JOB_ID=$(tao-client rtdetr job-resume --job $TRAIN_JOB_ID --action train --id $EXPERIMENT_ID --specs "$SPECS")

TAO Launcher

tao model rtdetr train -e <experiment_spec_file> train.resume_training_checkpoint_path=<model_epoch_001.pth>

The major implication of this logic is that, to trigger fresh training from scratch, you must do one of the following:

Specify a new, empty results directory (recommended).
Remove the latest checkpoint from the results directory.

Evaluating the Model#

evaluate#

The evaluate parameter defines the hyperparameters of the evaluation process.

evaluate:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.0

Field	value_type	description	default_value	valid_min	automl_enabled
checkpoint	string		???		`false`
results_dir	string				`false`
input_width	int	Width of the input image tensor.		1	`false`
input_height	int	Height of the input image tensor.		1	`false`
trt_engine	string	Path to the TensorRT engine to be used for evaluation. This only works with `tao-deploy`.			`false`
conf_threshold	float	The value of the confidence threshold to be used when filtering out the final list of boxes.	0.0		`false`

To run evaluation with an RT-DETR model, use this command:

TAO Client (v2 API)

EVAL_JOB_ID=$(tao rtdetr create-job \
  --kind experiment \
  --name "rtdetr_evaluate" \
  --action evaluate \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $TRAIN_JOB_ID \
  --eval-dataset "$DATASET_ID" \
  --specs "$EVALUATE_SPECS" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

tao model rtdetr evaluate [-h] -e <experiment_spec>
                          evaluate.checkpoint=<model to be evaluated>
                          [evaluate.<evaluate_option>=<evaluate_option_value>]
                          [evaluate.gpu_ids=<gpu indices>]
                          [evaluate.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

-e, --experiment_spec: The experiment spec file to set up the evaluation experiment
evaluate.checkpoint: The .pth model to be evaluated.

Optional Arguments

The following arguments are optional to run the command.

evaluate.<evaluate_option>: The evaluate options.

Running Inference with an RT-DETR Model#

inference#

The inference parameter defines the hyperparameters of the inference process.

inference:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.5
  color_map:
    person: red
    car: blue

Field	value_type	description	default_value	valid_min	automl_enabled
checkpoint	string		???		`false`
results_dir	string				`false`
trt_engine	string	Path to the TensorRT engine to be used for inference. This only works with `tao-deploy`.			`false`
color_map	collection	Class-wise dictionary with colors to render boxes.			`false`
conf_threshold	float	The value of the confidence threshold to be used when filtering out the final list of boxes.	0.5		`false`
is_internal	bool	Flag to render with internal directory structure.	`false`		`false`
input_width	int	Width of the input image tensor.	960	32	`false`
input_height	int	Height of the input image tensor.	544	32	`false`
outline_width	int	Width in pixels of the bounding box outline.	3	1	`false`
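
The effect of conf_threshold can be pictured as a simple filter over the predicted boxes. The sketch below keeps detections whose score is at or above the threshold. The detection dictionary layout is hypothetical, and whether TAO uses an inclusive or strict comparison is an assumption.

```python
def filter_by_confidence(detections, conf_threshold=0.5):
    """Keep only detections whose score reaches the confidence threshold."""
    # Assumption: the comparison is inclusive (score >= threshold).
    return [det for det in detections if det["score"] >= conf_threshold]


# Hypothetical predictions; boxes are [x_min, y_min, x_max, y_max] in pixels.
detections = [
    {"label": "person", "score": 0.91, "box": [10, 20, 110, 220]},
    {"label": "car", "score": 0.32, "box": [200, 50, 400, 180]},
]
kept = filter_by_confidence(detections, conf_threshold=0.5)
print([det["label"] for det in kept])
```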

The inference tool for RT-DETR models can be used to visualize bounding boxes and generate frame-by-frame KITTI format labels on a directory of images.

TAO Client (v2 API)

INFER_JOB_ID=$(tao rtdetr create-job \
  --kind experiment \
  --name "rtdetr_inference" \
  --action inference \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $TRAIN_JOB_ID \
  --inference-dataset "$DATASET_ID" \
  --specs "$INFERENCE_SPECS" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

tao model rtdetr inference [-h] -e <experiment spec file>
                          inference.checkpoint=<model to be inferenced>
                          [inference.<inference_option>=<inference_option_value>]
                          [inference.gpu_ids=<gpu indices>]
                          [inference.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec: The experiment spec file to set up the inference experiment
inference.checkpoint: The .pth model to run inference on.

Optional Arguments

The following arguments are optional to run the command.

inference.<inference_option>: The inference options.

Distilling the Model#

distill#

The distill parameter defines the hyperparameters for the distillation process.

distill:
  teacher:
    backbone: convnext_large
    train_backbone: False
    num_queries: 300
    num_select: 300
    num_feature_levels: 3
    return_interm_indices:
    - 1
    - 2
    - 3
    feat_strides:
    - 8
    - 16
    - 32
    hidden_dim: 256
    nheads: 8
    dropout_ratio: 0.0
    enc_layers: 1
    dim_feedforward: 1024
    use_encoder_idx:
    - 2
    pe_temperature: 10000
    expansion: 1.0
    depth_mult: 1
    enc_act: gelu
    act: silu
    dec_layers: 6
    dn_number: 100
    feat_channels:
    - 256
    - 256
    - 256
    eval_idx: -1
    vfl_loss_coef: 1.0
    bbox_loss_coef: 5.0
    giou_loss_coef: 2.0
    alpha: 0.75
    gamma: 2.0
    clip_max_norm: 0.1
    aux_loss: true
    loss_types:
    - vfl
    - boxes
    backbone_names:
    - backbone.0
    linear_proj_names:
    - reference_points
    - sampling_offsets
  pretrained_teacher_model_path: /path/to/teacher/model_epoch_070.pth
  bindings:
  - teacher_module_name: 'srcs'
    student_module_name: 'srcs'
    criterion: IOU
    weight: 20

Field	value_type	description	default_value	valid_min	valid_max	valid_options	automl_enabled
teacher	collection	Configurable parameters to construct the teacher model. (Same as the model config)					`false`
pretrained_teacher_model_path	string	Path to the pre-trained teacher model.					`false`
bindings	list dict	The list of bindings between teacher and student to use for calculating distill loss. teacher_module_name: The name of the teacher module. student_module_name: The name of the student module. criterion: The criterion to use for calculating binding loss (L1, L2, KL, IOU). weight: The value of the weight to use for the binding; default is 1.0.					`false`

Note

We recommend using “IOU” as the criterion and “srcs” as both teacher_module_name and student_module_name for distillation. The total loss is computed as total_loss = distillation_loss_coef * distillation_loss + other RT-DETR losses, where distillation_loss = sum(binding_loss).
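
The loss combination in the note can be written out as a short sketch. It assumes each binding loss is the binding's weight multiplied by the raw criterion value, which the note does not state explicitly. The function name and the numeric values are hypothetical.

```python
import math


def total_loss(bindings, other_losses, distillation_loss_coef=1.0):
    """Combine binding losses with the remaining RT-DETR losses.

    bindings: list of (weight, criterion_value) pairs, one per binding.
    other_losses: list of the remaining loss terms (for example vfl, boxes).
    """
    # Assumption: binding_loss = weight * criterion_value for each binding.
    distillation_loss = sum(weight * value for weight, value in bindings)
    return distillation_loss_coef * distillation_loss + sum(other_losses)


# One binding with weight 20, as in the example spec, plus two other loss terms.
loss = total_loss(bindings=[(20, 0.05)], other_losses=[1.0, 0.5])
print(loss)
```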

TAO Client (v2 API)

DISTILL_JOB_ID=$(tao rtdetr create-job \
  --kind experiment \
  --name "rtdetr_distill" \
  --action distill \
  --workspace-id $WORKSPACE_ID \
  --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

tao model rtdetr distill [-h] -e <experiment spec file>

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec: The path to an experiment spec file

Exporting the Model#

export#

The export parameter defines the hyperparameters for the export process.

export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  on_cpu: False
  opset_version: 12
  input_channel: 3
  input_width: 640
  input_height: 640
  batch_size: -1

Field	value_type	description	default_value	valid_min	automl_enabled
checkpoint	string	Path to the checkpoint file to run export.	???		`false`
onnx_file	string	Path to the onnx model file.	???		`false`
on_cpu	bool	Flag to export CPU compatible model.	`false`		`false`
input_channel	int	Number of channels in the input Tensor.	3	3	`false`
input_width	int	Width of the input image tensor.	960	32	`false`
input_height	int	Height of the input image tensor.	544	32	`false`
opset_version	int	Operator set version of the ONNX model used to generate the TensorRT engine.	17	1	`false`
batch_size	int	The batch size of the input Tensor for the engine. A value of `-1` implies dynamic tensor shapes.	-1	-1	`false`
verbose	bool	Flag to enable verbose TensorRT logging.	`false`		`false`

Note

When you export an RT-DETR model with frozen_fm enabled, the .onnx file has a static batch size of 1.

TAO Client (v2 API)

EXPORT_JOB_ID=$(tao rtdetr create-job \
  --kind experiment \
  --name "rtdetr_export" \
  --action export \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $TRAIN_JOB_ID \
  --specs "$EXPORT_SPECS" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

tao model rtdetr export [-h] -e <experiment spec file>
                          export.checkpoint=<model to export>
                          export.onnx_file=<onnx path>
                          [export.<export_option>=<export_option_value>]

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec: The path to an experiment spec file
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .etlt or .onnx model is saved.

Optional Arguments

The following arguments are optional to run the command.

export.<export_option>: The export options.

Quantization#

RT-DETR supports post-training quantization (PTQ) via TAO Quant using either the torchao (weight-only) or modelopt (static PTQ) backends.

Add a quantize section to your experiment specification (see TAO Quant documentation for schema and backend options).

Run:

tao model rtdetr quantize -e <experiment_spec_file>

Use the quantized checkpoint by setting evaluate.is_quantized: true or inference.is_quantized: true and pointing to the artifact saved under results_dir (for example, quantized_model_torchao.pth or quantized_model_modelopt.pth). For ModelOpt artifacts, the model weights are stored under model_state_dict.

Notes#

For modelopt static PTQ, ensure your evaluate configuration provides a representative validation loader; it is reused for calibration.
For torchao, activation settings in the config are ignored.

Calibration dataset (ModelOpt)#

When using the modelopt backend (static PTQ), provide a calibration dataset via dataset.quant_calibration_data_sources. This mirrors your validation configuration but may point to a smaller, representative subset.

Minimal example:

quantize:
  backend: "modelopt"
  mode: "static_ptq"
  algorithm: "minmax"
dataset:
  quant_calibration_data_sources:
    image_dir: "/path/to/calib/images"
    # json_file: "/path/to/calib/annotations.json"   # optional; COCO format if used

See also: TAO Quant overview and its Configuration and backend pages.

TensorRT engine generation, validation, and int8 calibration#

For deployment, refer to the TAO Deploy documentation.

Deploying to DeepStream#

Refer to the Integrating a RT-DETR Model page for more information about deploying an RT-DETR model to DeepStream.