DINO#

DINO is an object-detection model included in the TAO Toolkit. It supports the following tasks:

  • convert

  • train

  • evaluate

  • inference

  • export

  • distill

These tasks can be invoked from the TAO Launcher using the following convention on the command-line:

tao model dino <sub_task> <args_per_subtask>

where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

Data Input for DINO#

DINO expects directories of images for training or validation and annotated JSON files in COCO format.

Note

The category_id values in your COCO JSON file should start from 1 because 0 is reserved for the background class. In addition, dataset.num_classes should be set to the maximum category_id + 1. For instance, even though only 80 classes are used in COCO, the largest category_id is 90, so dataset.num_classes should be set to 91.
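As a quick check of this rule, you can read the category IDs out of your annotation file and compute the value to use for dataset.num_classes. The snippet below is a minimal sketch; the annotation path is a placeholder.

import json

# Illustrative check of the rule above; the annotation path is a placeholder.
with open("/path/to/annotations/instances_train2017.json") as f:
    coco = json.load(f)

category_ids = [cat["id"] for cat in coco["categories"]]
assert min(category_ids) >= 1, "category_id 0 is reserved for the background class"
print("Set dataset.num_classes to:", max(category_ids) + 1)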

Sharding the Data (Optional)#

Note

Sharding is not necessary if your annotations are already in JSON format and your dataset is smaller than the COCO dataset. The convert subtask assumes that your dataset is in KITTI format.

For a large dataset, you can optionally use convert to shard the dataset into smaller chunks to reduce the memory burden. In this process, KITTI-based annotations are converted into smaller sharded JSON files, similar to other object detection networks. Here is an example spec file for converting KITTI-based folders into multiple sharded JSON files.

input_source: /workspace/tao-experiments/data/sequence.txt
results_dir: /workspace/tao-experiments/sharded
image_dir_name: images
label_dir_name: labels
num_shards: 32
num_partitions: 1
mapping_path: /path/to/your_category_mapping

The details of each parameter are summarized below:

  • input_source (string, default None): The .txt file listing the data sources.
  • results_dir (string, default None): The output directory where the sharded JSON files are stored.
  • image_dir_name (string, default None): The relative path to the directory containing images, from each path listed in the input_source .txt file.
  • label_dir_name (string, default None): The relative path to the directory containing the KITTI label data, from each path listed in the input_source .txt file.
  • num_shards (unsigned int, default 32): The number of shards per partition. Valid values: >0.
  • num_partitions (unsigned int, default 1): The number of partitions in the data. Valid values: >0.
  • mapping_path (string, default None): The path to a JSON file containing the class mapping.

The category mapping should cover the classes in your dataset and be listed in reverse alphabetical order. The default mapping is shown below:

DEFAULT_TARGET_CLASS_MAPPING = {
  "Person": "person",
  "Person Group": "person",
  "Rider": "person",
  "backpack": "bag",
  "face": "face",
  "large_bag": "bag",
  "person": "person",
  "person group": "person",
  "person_group": "person",
  "personal_bag": "bag",
  "rider": "person",
  "rolling_bag": "bag",
  "rollingbag": "bag",
  "largebag": "bag",
  "personalbag": "bag"
}
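If you need a custom mapping, you can generate the JSON file referenced by mapping_path yourself. The snippet below is a minimal sketch; the class names and output path are placeholders.

import json

# Hypothetical mapping in the same form as the default shown above.
category_mapping = {
    "Person": "person",
    "Rider": "person",
    "backpack": "bag",
}

# mapping_path in the convert spec should point at this file; the path is a placeholder.
with open("/path/to/your_category_mapping.json", "w") as f:
    json.dump(category_mapping, f, indent=2)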

The following example shows how to use the convert command:

tao model dino convert -e /path/to/spec.yaml

Creating an Experiment Spec File#

The training experiment spec file for DINO includes model, train, and dataset parameters. Here is an example spec file for training a DINO model with a fan_small backbone on a COCO dataset.

dataset:
  train_data_sources:
    - image_dir: /path/to/coco/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.json
  val_data_sources:
    - image_dir: /path/to/coco/val2017/
      json_file: /path/to/coco/annotations/instances_val2017.json
  num_classes: 91
  batch_size: 4
  workers: 8
  augmentation:
    scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
    input_mean: [0.485, 0.456, 0.406]
    input_std: [0.229, 0.224, 0.225]
    horizontal_flip_prob: 0.5
    train_random_resize: [400, 500, 600]
    train_random_crop_min: 384
    train_random_crop_max: 600
    random_resize_max_size: 1333
    test_random_resize: 800
model:
  pretrained_model_path: /path/to/your-fan-small-pretrained-model
  backbone: fan_small
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  dropout_ratio: 0.0
  dim_feedforward: 2048
train:
  optim:
    lr: 0.0001
    lr_backbone: 0.00001
    lr_linear_proj_mult: 0.1
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_decay: 0.1
    lr_steps: [11]
    optimizer: AdamW
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  clip_grad_norm: 0.1
  precision: fp32
  distributed_strategy: ddp
  activation_checkpoint: True
  num_gpus: 1
  gpu_ids: [0]
  num_nodes: 1
  seed: 1234

The top-level parameters of the experiment spec are summarized below:

  • model (dict config): The configuration of the model architecture.
  • dataset (dict config): The configuration of the dataset.
  • train (dict config): The configuration of the training task.
  • evaluate (dict config): The configuration of the evaluation task.
  • inference (dict config): The configuration of the inference task.
  • encryption_key (string, default None): The encryption key used to encrypt and decrypt model files.
  • results_dir (string, default /results): The directory where experiment results are saved.
  • export (dict config): The configuration of the ONNX export task.
  • gen_trt_engine (dict config): The configuration of the TensorRT engine generation task. Only used by TAO Deploy.

model#

The model parameter provides options to change the DINO architecture.

model:
  pretrained_model_path: /path/to/your-fan-small-pretrained-model
  backbone: fan_small
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  dropout_ratio: 0.0
  dim_feedforward: 2048

  • pretrained_backbone_path (string, default None): The optional path to the pretrained backbone file.
  • backbone (string, default resnet_50): The backbone name of the model. GCViT, FAN, ResNet 50, and NVDINOv2 are supported. Valid values: resnet_50, gc_vit_xxtiny, gc_vit_xtiny, gc_vit_tiny, gc_vit_small, gc_vit_base, gc_vit_large, fan_tiny, fan_small, fan_base, fan_large, vit_large_nvdinov2.
  • train_backbone (bool, default True): A flag specifying whether to train the backbone.
  • num_feature_levels (unsigned int, default 4): The number of feature levels to use in the model. Valid values: 1, 2, 3, 4, 5.
  • return_interm_indices (int list, default [1, 2, 3, 4]): The indices of the feature levels to use in the model. The length must match num_feature_levels. Valid values: [0, 1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3], [1, 2], [1].
  • dec_layers (unsigned int, default 6): The number of decoder layers in the transformer. Valid values: >0.
  • enc_layers (unsigned int, default 6): The number of encoder layers in the transformer. Valid values: >0.
  • num_queries (unsigned int, default 900): The number of queries. Valid values: >0.
  • dim_feedforward (unsigned int, default 2048): The dimension of the feedforward network. Valid values: >0.
  • num_select (unsigned int, default 300): The number of top-K predictions selected during post-processing. Valid values: >0.
  • use_dn (bool, default True): A flag specifying whether to enable contrastive de-noising training in DINO.
  • dn_number (unsigned int, default 100): The number of de-noising queries in DINO. Valid values: >0.
  • dn_box_noise_scale (float, default 1.0): The scale of the noise applied to boxes during contrastive de-noising. If this value is 0, no noise is applied. Valid values: >=0.
  • dn_label_noise_ratio (float, default 0.5): The scale of the noise applied to labels during contrastive de-noising. If this value is 0, no noise is applied. Valid values: >=0.
  • pe_temperatureH (unsigned int, default 20): The temperature applied to the height dimension of the positional sine embedding. Valid values: >0.
  • pe_temperatureW (unsigned int, default 20): The temperature applied to the width dimension of the positional sine embedding. Valid values: >0.
  • fix_refpoints_hw (signed int, default -1): If this value is -1, width and height are learned separately for each box. If this value is -2, a shared width and height are learned. A value greater than 0 specifies learning with a fixed number. Valid values: >0, -1, -2.
  • dropout_ratio (float, default 0.0): The probability of dropping hidden units. Valid values: 0.0 to 1.0.
  • cls_loss_coef (float, default 2.0): The relative weight of the classification error in the matching cost. Valid values: >0.0.
  • bbox_loss_coef (float, default 5.0): The relative weight of the L1 error of the bounding-box coordinates in the matching cost. Valid values: >0.0.
  • giou_loss_coef (float, default 2.0): The relative weight of the GIoU loss of the bounding box in the matching cost. Valid values: >0.0.
  • focal_alpha (float, default 0.25): The alpha value in the focal loss. Valid values: >0.0.
  • aux_loss (bool, default True): A flag specifying whether to use auxiliary decoding losses (a loss at each decoder layer).

train#

The train parameter defines the hyperparameters of the training process.

train:
  optim:
    lr: 0.0001
    lr_backbone: 0.00001
    lr_linear_proj_mult: 0.1
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_decay: 0.1
    lr_steps: [11]
    optimizer: AdamW
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  clip_grad_norm: 0.1
  precision: fp32
  distributed_strategy: ddp
  activation_checkpoint: True
  num_gpus: 1
  gpu_ids: [0]
  num_nodes: 1
  seed: 1234

  • num_gpus (unsigned int, default 1): The number of GPUs to use for distributed training. Valid values: >0.
  • gpu_ids (List[int], default [0]): The indices of the GPUs to use for distributed training.
  • seed (unsigned int, default 1234): The random seed for random, NumPy, and torch. Valid values: >0.
  • num_epochs (unsigned int, default 10): The total number of epochs to run the experiment. Valid values: >0.
  • checkpoint_interval (unsigned int, default 1): The epoch interval at which checkpoints are saved. Valid values: >0.
  • validation_interval (unsigned int, default 1): The epoch interval at which validation is run. Valid values: >0.
  • resume_training_checkpoint_path (string): The intermediate PyTorch Lightning checkpoint to resume training from.
  • results_dir (string, default /results/train): The directory to save training results.
  • optim (dict config): The configuration of the optimizer, including the learning rate, learning-rate scheduler, and weight decay.
  • clip_grad_norm (float, default 0.1): The amount to clip the gradient by the L2 norm. A value of 0.0 specifies no clipping. Valid values: >=0.
  • precision (string, default fp32): Specifying fp16 enables automatic mixed-precision training, which can help save GPU memory. Valid values: fp32, fp16.
  • distributed_strategy (string, default ddp): The multi-GPU training strategy. DDP (Distributed Data Parallel) and FSDP (Fully Sharded Data Parallel) are supported. Valid values: ddp, fsdp.
  • activation_checkpoint (bool, default True): If True, activations are recomputed during the backward pass instead of being stored, which saves GPU memory.
  • pretrained_model_path (string): The path to a pretrained model checkpoint to load for fine-tuning.
  • num_nodes (unsigned int, default 1): The number of nodes. If this value is larger than 1, multi-node training is enabled. Valid values: >0.
  • freeze (string list, default []): The list of layer names in the model to freeze, for example ["backbone", "transformer.encoder", "input_proj"].
  • verbose (bool, default False): Whether to print detailed learning-rate scaling from the optimizer.

distill#

The distill parameter defines the hyperparameters for distillation.

distill:
  teacher:
    backbone: fan_small
    train_backbone: False
    num_feature_levels: 4
    dec_layers: 6
    enc_layers: 6
    num_queries: 900
    dropout_ratio: 0.0
    dim_feedforward: 2048
  pretrained_teacher_model_path: /path/to/your-fan-small-pretrained-teacher-model
  bindings:
  - teacher_module_name: teacher_module_name
    student_module_name: student_module_name
    criterion: L2
    weight: 1.0

  • teacher (dict config): The configuration of the teacher model.
  • pretrained_teacher_model_path (string): The path to the pretrained teacher model checkpoint to load for distillation.
  • bindings (list dict): The list of bindings between teacher and student modules used to calculate the distillation loss. Each binding specifies:
    * teacher_module_name: The name of the teacher module
    * student_module_name: The name of the student module
    * criterion: The criterion used to calculate the binding loss (L1, L2, or KL)
    * weight: The weight applied to the binding loss (default 1.0)
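Conceptually, each binding computes a weighted loss between the outputs of the named teacher and student modules. The following PyTorch snippet is only an illustrative sketch of that idea, not TAO's implementation:

import torch.nn.functional as F

def binding_loss(teacher_feat, student_feat, criterion="L2", weight=1.0):
    """Illustrative sketch of a single distillation binding (not TAO's code)."""
    if criterion == "L1":
        loss = F.l1_loss(student_feat, teacher_feat)
    elif criterion == "L2":
        loss = F.mse_loss(student_feat, teacher_feat)
    elif criterion == "KL":
        # KL divergence between the student and teacher distributions over the last dim.
        loss = F.kl_div(F.log_softmax(student_feat, dim=-1),
                        F.softmax(teacher_feat, dim=-1),
                        reduction="batchmean")
    else:
        raise ValueError(f"Unsupported criterion: {criterion}")
    return weight * loss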


optim#

The optim parameter defines the config for the optimizer in training, including the learning rate, learning scheduler, and weight decay.

optim:
  lr: 0.0001
  lr_backbone: 0.00001
  lr_linear_proj_mult: 0.1
  momentum: 0.9
  weight_decay: 0.0001
  lr_scheduler: MultiStep
  lr_decay: 0.1
  lr_steps: [11]
  optimizer: AdamW

  • lr (float, default 1e-4): The initial learning rate for training the model, excluding the backbone. Valid values: >0.0.
  • lr_backbone (float, default 1e-5): The initial learning rate for training the backbone. Valid values: >0.0.
  • lr_linear_proj_mult (float, default 0.1): The initial learning rate for training the linear projection layer. Valid values: >0.0.
  • momentum (float, default 0.9): The momentum for the AdamW optimizer. Valid values: >0.0.
  • weight_decay (float, default 1e-4): The weight decay coefficient. Valid values: >0.0.
  • lr_scheduler (string, default MultiStep): The learning-rate scheduler. MultiStep decreases the lr by lr_decay at each step in lr_steps; StepLR decreases the lr by lr_decay every lr_step_size. Valid values: MultiStep, StepLR.
  • lr_decay (float, default 0.1): The decay factor for the learning-rate scheduler. Valid values: >0.0.
  • lr_decay_rate (float, default 0.65): The layer-wise learning-rate decay rate, used for ViT backbones only. Valid values: >0.0.
  • lr_steps (int list, default [11]): The steps at which to decrease the learning rate for the MultiStep scheduler.
  • lr_step_size (unsigned int, default 11): The step size at which to decrease the learning rate for the StepLR scheduler. Valid values: >0.
  • lr_monitor (string, default val_loss): The monitored value for the AutoReduce scheduler. Valid values: val_loss, train_loss.
  • optimizer (string, default AdamW): The optimizer to use during training. Valid values: AdamW, SGD.

dataset#

The dataset parameter defines the dataset source, training batch size, and augmentation.

dataset:
  train_data_sources:
    - image_dir: /path/to/coco/images/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.json
  val_data_sources:
    - image_dir: /path/to/coco/images/val2017/
      json_file: /path/to/coco/annotations/instances_val2017.json
  test_data_sources:
    image_dir: /path/to/coco/images/val2017/
    json_file: /path/to/coco/annotations/instances_val2017.json
  infer_data_sources:
    image_dir: /path/to/coco/images/val2017/
    classmap: /path/to/coco/annotations/coco_classmap.txt
  num_classes: 91
  batch_size: 4
  workers: 8

  • train_data_sources (list dict): The training data sources:
    * image_dir: The directory that contains the training images
    * json_file: The path to the COCO-format training annotation JSON file
  • val_data_sources (list dict): The validation data sources:
    * image_dir: The directory that contains the validation images
    * json_file: The path to the COCO-format validation annotation JSON file
  • test_data_sources (dict): The test data sources for evaluation:
    * image_dir: The directory that contains the test images
    * json_file: The path to the COCO-format test annotation JSON file
  • infer_data_sources (dict): The data sources for inference:
    * image_dir: The directory that contains the inference images
    * classmap: The path to the .txt file that contains the class names
  • augmentation (dict config): The parameters that define the augmentation methods.
  • num_classes (unsigned int, default 91): The number of classes in the training data. Valid values: >0.
  • batch_size (unsigned int, default 4): The batch size for training and validation. Valid values: >0.
  • workers (unsigned int, default 8): The number of parallel workers processing data. Valid values: >0.
  • train_sampler (string, default default_sampler): The minibatch sampling method. Non-default sampling methods can be enabled for multi-node jobs. This option has no effect unless dataset_type is set to default. Valid values: default_sampler, non_uniform_sampler, uniform_sampler.
  • dataset_type (string, default serialized): If set to default, the standard CocoDetection dataset structure from torchvision is used, which loads the COCO annotation in every subprocess; this leads to redundant copies of the data and can exhaust RAM if workers is high. If set to serialized, the data is serialized through pickle and torch.Tensor, allowing it to be shared across subprocesses and greatly reducing RAM usage. Valid values: serialized, default.

augmentation#

The augmentation parameter contains hyperparameters for augmentation.

augmentation:
  scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
  input_mean: [0.485, 0.456, 0.406]
  input_std: [0.229, 0.224, 0.225]
  horizontal_flip_prob: 0.5
  train_random_resize: [400, 500, 600]
  train_random_crop_min: 384
  train_random_crop_max: 600
  random_resize_max_size: 1333
  test_random_resize: 800

  • scales (int list, default [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]): A list of sizes used for random resize.
  • input_mean (float list, default [0.485, 0.456, 0.406]): The per-channel input mean for RGB frames: (input - mean) / std. Size 1 or 3.
  • input_std (float list, default [0.229, 0.224, 0.225]): The per-channel input standard deviation for RGB frames: (input - mean) / std. Size 1 or 3.
  • horizontal_flip_prob (float, default 0.5): The probability of a horizontal flip during training. Valid values: >=0.
  • train_random_resize (int list, default [400, 500, 600]): A list of sizes used for random resize of training data.
  • train_random_crop_min (unsigned int, default 384): The minimum random crop size for training data. Valid values: >0.
  • train_random_crop_max (unsigned int, default 600): The maximum random crop size for training data. Valid values: >0.
  • random_resize_max_size (unsigned int, default 1333): The maximum random resize size for training data. Valid values: >0.
  • test_random_resize (unsigned int, default 800): The resize size for test data. Valid values: >0.
  • fixed_padding (bool, default True): A flag specifying whether to resize the image (with no padding) to (sorted(scales)[-1], random_resize_max_size) to prevent a CPU memory leak.
  • fixed_random_crop (unsigned int): Enables large-scale jittering, which is used for ViT backbones. The resulting image resolution is fixed to fixed_random_crop. The value must be divisible by 32.
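For reference, the default input_mean and input_std values are the standard ImageNet statistics, and normalization is applied per channel as (input - mean) / std, typically on RGB values scaled to [0, 1]. The snippet below is a minimal NumPy sketch of that operation; the image is a placeholder.

import numpy as np

input_mean = np.array([0.485, 0.456, 0.406])
input_std = np.array([0.229, 0.224, 0.225])

image = np.random.rand(544, 960, 3)            # placeholder HWC RGB image scaled to [0, 1]
normalized = (image - input_mean) / input_std  # broadcasts over the channel dimension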

Example Spec File for ViT Backbones#

Note

The following spec file is only relevant for TAO versions 5.2 and later. Vision Transformer (ViT) backbones require different augmentation and learning-rate decay settings to work as the backbone of a detector.

dataset:
  train_data_sources:
    - image_dir: /path/to/coco/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.json
  val_data_sources:
    - image_dir: /path/to/coco/val2017/
      json_file: /path/to/coco/annotations/instances_val2017.json
  num_classes: 91
  batch_size: 4
  workers: 8
  augmentation:
    input_mean: [0.485, 0.456, 0.406]
    input_std: [0.229, 0.224, 0.225]
    horizontal_flip_prob: 0.5
    fixed_random_crop: 1024
    random_resize_max_size: 1024
    test_random_resize: 1024
    fixed_padding: True
model:
  pretrained_model_path: /path/to/nvdinov2_patch16_model
  backbone: vit_large_nvdinov2
  train_backbone: False
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  dropout_ratio: 0.0
  dim_feedforward: 2048
train:
  optim:
    lr_backbone: 2e-5
    lr: 2e-4
    lr_steps: [11]
    layer_decay_rate: 0.65
  num_epochs: 12

Training the Model#

Use the following command to run DINO training:

tao model dino train [-h] -e <experiment_spec>
               [results_dir=<global_results_dir>]
               [model.<model_option>=<model_option_value>]
               [dataset.<dataset_option>=<dataset_option_value>]
               [train.<train_option>=<train_option_value>]
               [train.gpu_ids=<gpu indices>]
               [train.num_gpus=<number of gpus>]

Required Arguments#

The only required argument is the path to the experiment spec:

  • -e, --experiment_spec: The experiment specification file to set up the training experiment

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.

Note

For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 with gpu_ids = [0, 1]), they are modified to follow the setting that specifies more GPUs (in this example, num_gpus is updated to 2).
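The sketch below, which is not TAO's code, illustrates that reconciliation rule; the behavior when num_gpus is larger than len(gpu_ids) is an assumption for illustration.

def reconcile_gpu_settings(num_gpus, gpu_ids):
    """Illustrative sketch: follow whichever setting implies more GPUs."""
    if len(gpu_ids) > num_gpus:
        num_gpus = len(gpu_ids)          # e.g. num_gpus=1, gpu_ids=[0, 1] -> num_gpus=2
    elif num_gpus > len(gpu_ids):
        gpu_ids = list(range(num_gpus))  # assumed: expand gpu_ids to match num_gpus
    return num_gpus, gpu_ids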

Checkpointing and Resuming Training#

At every train.checkpoint_interval, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved in train.results_dir, like so:

$ ls /results/train

'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'

The latest checkpoint is also saved as dino_model_latest.pth. Training automatically resumes from dino_model_latest.pth, if it exists in train.results_dir. This is superseded by train.resume_training_checkpoint_path, if it is provided.

The major implication of this logic is that, if you wish to trigger fresh training from scratch, either:

  • Specify a new, empty results directory (Recommended)

  • Remove the latest checkpoint from the results directory
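The following sketch, which is not TAO's code, summarizes the checkpoint selection behavior described above:

import os

def select_resume_checkpoint(results_dir, resume_training_checkpoint_path=None):
    """Illustrative sketch of the resume logic described above (not TAO's code)."""
    # An explicitly provided checkpoint path supersedes auto-resume.
    if resume_training_checkpoint_path:
        return resume_training_checkpoint_path
    # Otherwise, auto-resume from the latest checkpoint if one exists.
    latest = os.path.join(results_dir, "dino_model_latest.pth")
    if os.path.exists(latest):
        return latest
    # No checkpoint found: training starts from scratch.
    return None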

Optimizing Resource for Training DINO#

Training DINO on a standard dataset like COCO requires powerful GPUs (for example, V100/A100) with at least 15 GB of VRAM, as well as a large amount of CPU memory. This section outlines some strategies you can use to launch training with limited resources.

Optimize GPU Memory#

There are various ways to optimize GPU memory usage. One trick is to reduce dataset.batch_size. However, this can cause your training to take longer than usual. We recommend setting the following configurations to optimize GPU consumption:

  • Set train.precision to fp16 to enable automatic mixed-precision training. This can reduce your GPU memory usage by 50%.

  • Set train.activation_checkpoint to True to enable activation checkpointing. Recomputing activations during the backward pass instead of caching them in memory reduces GPU memory usage.

  • Set train.distributed_strategy to fsdp to enable Fully Sharded Data Parallel training. This shards the gradient calculation across different processes, which helps reduce GPU memory.

  • Try a more lightweight backbone like fan_tiny, or freeze the backbone by setting model.train_backbone to False.

  • Try changing the augmentation resolution in dataset.augmentation depending on your dataset.

Optimize CPU Memory#

To speed up data loading, it is a common practice to set a high number of workers to spawn multiple processes. However, this can cause your job to run out of CPU memory if your annotation file is very large. We recommend setting the following configurations to optimize CPU consumption:

  • Set dataset.dataset_type to serialized so that the COCO-based annotation data can be shared across different subprocesses.

  • Set dataset.augmentation.fixed_padding to True so that images are padded before batch formation. Due to random resize and random crop augmentations during training, the resulting image resolution after the transform can vary across images. Such variable image resolutions can cause a memory leak, with CPU memory slowly stacking up until the job runs out of memory in the middle of training. This is a limitation of PyTorch, so we advise setting fixed_padding to True to help stabilize CPU memory usage.

Distilling the Model#

To distill a DINO model, use this command:

tao model dino distill [-h] -e <experiment_spec>
                       [-r <results_dir>]
                       [-k <key>]

Required Arguments#

  • -e, --experiment_spec: The experiment spec file to set up the distillation experiment

Optional Arguments#

  • -r, --results_dir: The path to the folder where the experiment outputs are written. If this argument is not specified, the results_dir from the spec file is used.

  • -k, --key: A user-specific encoding key to save or load a .tlt model. If this argument is not specified, the model checkpoint is not encrypted.

  • --gpus: The number of GPUs used to run training.

  • --num_nodes: The number of nodes used to run training. If this value is larger than 1, distributed multi-node training is enabled.

  • -h, --help: Show this help message and exit.

Sample Usage#

The following is an example of the distill command:

tao model dino distill -e /path/to/spec.yaml

Evaluating the Model#

evaluate#

The evaluate parameter defines the hyperparameters of the evaluation process.

evaluate:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.0

  • checkpoint (string): The path to the PyTorch model to evaluate.
  • results_dir (string, default /results/evaluate): The directory to save evaluation results.
  • num_gpus (unsigned int, default 1): The number of GPUs to use for distributed evaluation. Valid values: >0.
  • gpu_ids (List[int], default [0]): The indices of the GPUs to use for distributed evaluation.
  • trt_engine (string): The path to the TensorRT engine to evaluate. Only used with TAO Deploy.
  • conf_threshold (float, default 0.0): The confidence threshold used to filter predictions. Valid values: >=0.

To run evaluation with a DINO model, use this command:

tao model dino evaluate [-h] -e <experiment_spec>
               evaluate.checkpoint=<model to be evaluated>
               [evaluate.<evaluate_option>=<evaluate_option_value>]
               [evaluate.gpu_ids=<gpu indices>]
               [evaluate.num_gpus=<number of gpus>]

Required Arguments#

  • -e, --experiment_spec: The experiment spec file to set up the evaluation experiment.

  • evaluate.checkpoint: The .pth model to be evaluated.

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.

Running Inference with a DINO Model#

inference#

The inference parameter defines the hyperparameters of the inference process.

inference:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.5
  color_map:
    person: red
    car: blue

  • checkpoint (string): The path to the PyTorch model to run inference with.
  • results_dir (string, default /results/inference): The directory to save inference results.
  • num_gpus (unsigned int, default 1): The number of GPUs to use for distributed inference. Valid values: >0.
  • gpu_ids (List[int], default [0]): The indices of the GPUs to use for distributed inference.
  • trt_engine (string): The path to the TensorRT engine to run inference with. Only used with TAO Deploy.
  • conf_threshold (float, default 0.5): The confidence threshold used to filter predictions. Valid values: >=0.
  • color_map (dict): The bounding-box color for each class.

The inference tool for DINO models can be used to visualize bounding boxes and generate frame-by-frame KITTI-format labels on a directory of images.

tao model dino inference [-h] -e <experiment spec file>
               inference.checkpoint=<model to be inferenced>
               [inference.<inference_option>=<inference_option_value>]
               [inference.gpu_ids=<gpu indices>]
               [inference.num_gpus=<number of gpus>]

Required Arguments#

  • -e, --experiment_spec: The experiment spec file to set up the inference experiment.

  • inference.checkpoint: The .pth model to inference.

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.

Exporting the Model#

export#

The export parameter defines the hyperparameters of the export process.

export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  on_cpu: False
  opset_version: 12
  input_channel: 3
  input_width: 960
  input_height: 544
  batch_size: -1

  • checkpoint (string): The path to the PyTorch model to export.
  • onnx_file (string): The path to the exported .onnx file.
  • on_cpu (bool, default True): If this value is True, the DMHA module is exported as standard PyTorch. If this value is False, the module is exported using the TensorRT plugin.
  • opset_version (unsigned int, default 12): The opset version of the exported ONNX model. Valid values: >0.
  • input_channel (unsigned int, default 3): The input channel size. Only the value 3 is supported.
  • input_width (unsigned int, default 960): The input width. Valid values: >0.
  • input_height (unsigned int, default 544): The input height. Valid values: >0.
  • batch_size (int, default -1): The batch size of the ONNX model. If this value is set to -1, the export uses a dynamic batch size. Valid values: >=-1.
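After you run the export command shown below, you can sanity-check the exported graph, for example to confirm that batch_size: -1 produced a dynamic batch dimension. This is an illustrative snippet using the onnx Python package; the model path is a placeholder.

import onnx

model = onnx.load("/path/to/model.onnx")  # placeholder path to the exported model
for graph_input in model.graph.input:
    # A dynamic dimension appears as a symbolic dim_param instead of a fixed dim_value.
    dims = [d.dim_param or d.dim_value for d in graph_input.type.tensor_type.shape.dim]
    print(graph_input.name, dims)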

tao model dino export [-h] -e <experiment spec file>
               export.checkpoint=<model to export>
               export.onnx_file=<onnx path>
               [export.<export_option>=<export_option_value>]

Required Arguments#

  • -e, --experiment_spec: The path to an experiment spec file.

  • export.checkpoint: The .pth model to export.

  • export.onnx_file: The path where the .etlt or .onnx model is saved.

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.

TensorRT Engine Generation, Validation, and int8 Calibration#

For deployment, refer to the TAO Deploy documentation for DINO.

Deploying to DeepStream#

Refer to the Integrating a Deformable DETR Model page for more information about deploying a DINO model to DeepStream.