DINO is an object-detection model included in TAO. It supports the following tasks:
convert
train
evaluate
inference
export
distill
Each task is explained in detail in the following sections.
Throughout this documentation are references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections.
For instructions on creating a dataset using the remote client, refer to the Creating a dataset section in the Remote Client documentation.
For instructions on creating an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.
The spec format is YAML for TAO Launcher, and JSON for FTMS Client.
File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher, not for FTMS Client.
Data Input for DINO#
DINO expects directories of images for training or validation and annotated JSON files in COCO format.
The category_id from your COCO JSON file should start from 1 because 0 is set as a background class. In addition, dataset.num_classes should be set to max class_id + 1. For instance, even though there are only 80 classes used in COCO, the largest class_id is 90, so dataset.num_classes should be set to 91.
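As a quick sanity check before training, dataset.num_classes can be derived from the annotation file itself. The following Python sketch is illustrative only and is not part of TAO: the helper name infer_num_classes and the sample data are hypothetical. It applies the max class_id + 1 rule described above to the categories list of a COCO-format annotation.

```python
import json


def infer_num_classes(coco: dict) -> int:
    """Return max category_id + 1 for a COCO-format annotation dict."""
    category_ids = [category["id"] for category in coco["categories"]]
    if not category_ids:
        raise ValueError("COCO annotation has no categories")
    if min(category_ids) < 1:
        # 0 is reserved for the background class, so real classes start at 1.
        raise ValueError("category_id must start from 1")
    return max(category_ids) + 1


# Tiny in-memory stand-in for an annotation file (hypothetical data).
sample = json.loads(
    '{"categories": [{"id": 1, "name": "person"},'
    ' {"id": 3, "name": "car"},'
    ' {"id": 90, "name": "toothbrush"}]}'
)
num_classes = infer_num_classes(sample)
print(num_classes)  # 91
```

For a real dataset, load the annotation file with json.load and pass the resulting dict to the helper.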
Creating an Experiment Spec File#
The training experiment spec file for DINO includes model, train, and dataset parameters. Here is an example spec file for training a DINO model with a fan_small backbone on a COCO dataset.
Use the following command to get an experiment spec file for DINO:
BASE_EXPERIMENT_ID=$(tao dino list-base-experiments | jq -r '.[0].id')
SPECS=$(tao dino get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')
dataset:
  train_data_sources:
    - image_dir: /path/to/coco/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.json
  val_data_sources:
    - image_dir: /path/to/coco/val2017/
      json_file: /path/to/coco/annotations/instances_val2017.json
  num_classes: 91
  batch_size: 4
  workers: 8
  augmentation:
    scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
    input_mean: [0.485, 0.456, 0.406]
    input_std: [0.229, 0.224, 0.225]
    horizontal_flip_prob: 0.5
    train_random_resize: [400, 500, 600]
    train_random_crop_min: 384
    train_random_crop_max: 600
    random_resize_max_size: 1333
    test_random_resize: 800
model:
  pretrained_model_path: /path/to/your-fan-small-pretrained-model
  backbone: fan_small
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  dropout_ratio: 0.0
  dim_feedforward: 2048
train:
  optim:
    lr: 0.0001
    lr_backbone: 0.00001
    lr_linear_proj_mult: 0.1
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_decay: 0.1
    lr_steps: [11]
    optimizer: AdamW
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  clip_grad_norm: 0.1
  precision: fp32
  distributed_strategy: ddp
  activation_checkpoint: True
  num_gpus: 1
  gpu_ids: [0]
  num_nodes: 1
  seed: 1234
| Parameter | Data Type | Default | Description | Supported Values |
| | dict config | – | The configuration of the model architecture | |
| | dict config | – | The configuration of the dataset | |
| | dict config | – | The configuration of the training task | |
| | dict config | – | The configuration of the evaluation task | |
| | dict config | – | The configuration of the inference task | |
| | string | None | The encryption key to encrypt and decrypt model files | |
| | string | /results | The directory where experiment results are saved | |
| | dict config | – | The configuration of the ONNX export task | |
| | dict config | – | The configuration of the TensorRT generation task. Only used in TAO deploy | |
model#
The model parameter provides options to change the DINO architecture.
model:
  pretrained_model_path: /path/to/your-fan-small-pretrained-model
  backbone: fan_small
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  dropout_ratio: 0.0
  dim_feedforward: 2048
| Parameter | Datatype | Default | Description | Supported Values |
| | string | None | The optional path to the pretrained backbone file | string to the path |
| backbone | string | resnet_50 | The backbone name of the model. FAN, ResNet 34/50, EfficientViT, Swin, and NVDINOv2 are supported. | resnet_34, resnet_50, efficientvit_b0/b1/b2/b3, swin_tiny_patch4_window7_224, swin_base_patch4_window7_224, swin_base_patch4_window12_384, swin_large_patch4_window7_224, swin_large_patch4_window12_384, vit_large_dinov2, fan_tiny, fan_small, fan_base, fan_large, vit_large_nvdinov2 |
| | bool | True | A flag specifying whether to train the backbone or not | True, False |
| | unsigned int | 4 | The number of feature levels to use in the model | 1,2,3,4,5 |
| return_interm_indices | int list | [1, 2, 3, 4] | The index of feature levels to use in the model. The length must match num_feature_levels. | [0, 1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3], [1, 2], [1] |
| | unsigned int | 6 | The number of decoder layers in the transformer | >0 |
| | unsigned int | 6 | The number of encoder layers in the transformer | >0 |
| | unsigned int | 900 | The number of queries | >0 |
| | unsigned int | 2048 | The dimension of the feedforward network | >0 |
| | unsigned int | 300 | The number of top-K predictions selected during post-processing | >0 |
| | bool | True | A flag specifying whether to enable contrastive de-noising training in DINO | True, False |
| | unsigned_int | 100 | The number of de-noising queries in DINO | >0 |
| | float | 1.0 | The scale of noise applied to boxes during contrastive de-noising. If this value is 0, noise is not applied. | >=0 |
| | float | 0.5 | The scale of noise applied to labels during contrastive de-noising. If this value is 0, noise is not applied. | >=0 |
| | unsigned_int | 20 | The temperature applied to the height dimension of Positional Sine Embedding | >0 |
| | unsigned_int | 20 | The temperature applied to the width dimension of Positional Sine Embedding | >0 |
| fix_refpoints_hw | signed_int | -1 | If this value is -1, width and height are learned separately for each box. If this value is -2, a shared width and height are learned. A value greater than 0 specifies learning with a fixed number. | >0, -1, -2 |
| | float | 0.0 | The probability to drop hidden units | 0.0 ~ 1.0 |
| | float | 2.0 | The relative weight of the classification error in the matching cost | >0.0 |
| | float | 5.0 | The relative weight of the L1 error of the bounding box coordinates in the matching cost | >0.0 |
| | float | 2.0 | The relative weight of the GIoU loss of the bounding box in the matching cost | >0.0 |
| | float | 0.25 | The alpha in the focal loss | >0.0 |
| | bool | True | A flag specifying whether to use auxiliary decoding losses (loss at each decoder layer) | True, False |
train#
The train parameter defines the hyperparameters of the training process.
train:
  optim:
    lr: 0.0001
    lr_backbone: 0.00001
    lr_linear_proj_mult: 0.1
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_decay: 0.1
    lr_steps: [11]
    optimizer: AdamW
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  clip_grad_norm: 0.1
  precision: fp32
  distributed_strategy: ddp
  activation_checkpoint: True
  num_gpus: 1
  gpu_ids: [0]
  num_nodes: 1
  seed: 1234
| Parameter | Datatype | Default | Description | Supported Values |
| | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| | unsigned int | 1234 | The random seed for random, NumPy, and torch | >0 |
| | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
| | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
| | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| | string | /results/train | The directory to save training results | |
| | dict config | | The config for the optimizer, including the learning rate, learning rate scheduler, and weight decay | >0 |
| | float | 0.1 | The amount to clip the gradient by the L2 norm. A value of 0.0 specifies no clipping | >=0 |
| | string | fp32 | Specifying "fp16" enables automatic mixed precision training. Training with fp16 can help save GPU memory. | fp32, fp16 |
| distributed_strategy | string | ddp | The multi-GPU training strategy. DDP (Distributed Data Parallel) and FSDP (Fully Sharded Data Parallel) are supported. | ddp, fsdp |
| | bool | True | A True value instructs train to recompute activations in the backward pass to save GPU memory, rather than storing them. | True, False |
| | string | | The path to the pretrained model checkpoint to load for fine-tuning | |
| | unsigned int | 1 | The number of nodes. If the value is larger than 1, multi-node is enabled | >0 |
| | string list | [] | The list of layer names in the model to freeze. Example: ["backbone", "transformer.encoder", "input_proj"] | |
| | bool | False | Whether to print detailed learning rate scaling from the optimizer | True, False |
distill#
The distill parameter defines the hyperparameters for distillation.
distill:
  teacher:
    backbone: fan_small
    train_backbone: False
    num_feature_levels: 4
    dec_layers: 6
    enc_layers: 6
    num_queries: 900
    dropout_ratio: 0.0
    dim_feedforward: 2048
  pretrained_teacher_model_path: /path/to/your-fan-small-pretrained-teacher-model
  bindings:
    - teacher_module_name: teacher_module_name
      student_module_name: student_module_name
      criterion: L2
      weight: 1.0
| Parameter | Datatype | Default | Description | Supported Values |
| | dict config | | The config for the teacher model | >0 |
| | string | | The path to the pretrained teacher model checkpoint to load for distillation | >0 |
| | list dict | | The list of bindings between teacher and student to use for calculating the distillation loss. Each binding has: teacher_module_name (the name of the teacher module), student_module_name (the name of the student module), criterion (the name of the criterion to use for calculating the binding loss: L1, L2, or KL), and weight (the weight to use for the binding; the default is 1.0) | >0.0 |
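To make the role of the bindings concrete, here is a minimal pure-Python sketch of how a weighted distillation loss could be assembled from a bindings list. This is an assumption for illustration, not TAO's implementation: the function names, the mean reduction, the use of flat lists of floats as features, and the module name "backbone" are all hypothetical, and the KL criterion is omitted for brevity.

```python
from typing import Callable, Dict, List, Sequence


def l1(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized feature vectors."""
    return sum(abs(s - t) for s, t in zip(student, teacher)) / len(student)


def l2(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared difference between two equally sized feature vectors."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


# Assumed mapping from the criterion names in the spec to loss functions.
CRITERIA: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "L1": l1,
    "L2": l2,
}


def distill_loss(
    bindings: List[dict],
    student_feats: Dict[str, Sequence[float]],
    teacher_feats: Dict[str, Sequence[float]],
) -> float:
    """Sum the per-binding losses, each scaled by its weight (default 1.0)."""
    total = 0.0
    for binding in bindings:
        criterion = CRITERIA[binding["criterion"]]
        student = student_feats[binding["student_module_name"]]
        teacher = teacher_feats[binding["teacher_module_name"]]
        total += binding.get("weight", 1.0) * criterion(student, teacher)
    return total


bindings = [
    {
        "teacher_module_name": "backbone",  # hypothetical module name
        "student_module_name": "backbone",  # hypothetical module name
        "criterion": "L2",
        "weight": 1.0,
    }
]
loss = distill_loss(bindings, {"backbone": [1.0, 2.0]}, {"backbone": [1.0, 4.0]})
print(loss)  # 2.0
```

In practice the features would be tensors captured from the named modules of the teacher and student networks rather than Python lists.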
optim#
The optim parameter defines the config for the optimizer in training, including the learning rate, learning rate scheduler, and weight decay.
optim:
  lr: 0.0001
  lr_backbone: 0.00001
  lr_linear_proj_mult: 0.1
  momentum: 0.9
  weight_decay: 0.0001
  lr_scheduler: MultiStep
  lr_decay: 0.1
  lr_steps: [11]
  optimizer: AdamW
| Parameter | Datatype | Default | Description | Supported Values |
| | float | 1e-4 | The initial learning rate for training the model, excluding the backbone | >0.0 |
| | float | 1e-5 | The initial learning rate for training the backbone | >0.0 |
| | float | 0.1 | The initial learning rate for training the linear projection layer | >0.0 |
| | float | 0.9 | The momentum for the AdamW optimizer | >0.0 |
| | float | 1e-4 | The weight decay coefficient | >0.0 |
| lr_scheduler | string | MultiStep | The learning rate scheduler. MultiStep: decrease the lr by lr_decay from lr_steps. StepLR: decrease the lr by lr_decay at every lr_step_size | MultiStep/StepLR |
| | float | 0.1 | The decreasing factor for the learning rate scheduler | >0.0 |
| | float | 0.65 | The layer-wise learning decay rate used for ViT only | >0.0 |
| | int list | [11] | The steps to decrease the learning rate for the | int list |
| | unsigned int | 11 | The steps to decrease the learning rate for the | >0 |
| | string | val_loss | The monitor value for the | val_loss/train_loss |
| | string | AdamW | The optimizer to use during training | AdamW/SGD |
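The difference between the two scheduler options can be illustrated with closed-form expressions. The sketch below assumes the schedulers follow the same semantics as PyTorch's MultiStepLR and StepLR with 0-indexed epochs; that correspondence is an assumption, and the function names are hypothetical.

```python
from typing import List


def multistep_lr(base_lr: float, lr_decay: float, lr_steps: List[int], epoch: int) -> float:
    """Multiply base_lr by lr_decay once for every milestone in lr_steps already reached."""
    reached = sum(1 for step in lr_steps if epoch >= step)
    return base_lr * lr_decay ** reached


def step_lr(base_lr: float, lr_decay: float, lr_step_size: int, epoch: int) -> float:
    """Multiply base_lr by lr_decay once every lr_step_size epochs."""
    return base_lr * lr_decay ** (epoch // lr_step_size)


# Compare both schedules for lr=1e-4, lr_decay=0.1, and a step value of 11.
for epoch in (0, 10, 11, 22):
    print(epoch, multistep_lr(1e-4, 0.1, [11], epoch), step_lr(1e-4, 0.1, 11, epoch))
```

Under these assumed semantics, a run with num_epochs: 10 and lr_steps: [11] never reaches the milestone, so the learning rate stays at its initial value for the whole run.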
dataset#
The dataset parameter defines the dataset source, training batch size, and augmentation.
dataset:
  train_data_sources:
    - image_dir: /path/to/coco/images/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.json
  val_data_sources:
    - image_dir: /path/to/coco/images/val2017/
      json_file: /path/to/coco/annotations/instances_val2017.json
  test_data_sources:
    image_dir: /path/to/coco/images/val2017/
    json_file: /path/to/coco/annotations/instances_val2017.json
  infer_data_sources:
    image_dir: /path/to/coco/images/val2017/
    classmap: /path/to/coco/annotations/coco_classmap.txt
  num_classes: 91
  batch_size: 4
  workers: 8
| Parameter | Datatype | Default | Description | Supported Values |
| train_data_sources | list dict | | The training data sources: image_dir (the directory that contains the training images) and json_file (the path of the training annotation JSON file in COCO format) | |
| val_data_sources | list dict | | The validation data sources: image_dir (the directory that contains the validation images) and json_file (the path of the validation annotation JSON file in COCO format) | |
| test_data_sources | dict | | The test data sources for evaluation: image_dir (the directory that contains the test images) and json_file (the path of the test annotation JSON file in COCO format) | |
| infer_data_sources | dict | | The data sources for inference: image_dir (the directory that contains the inference images) and classmap (the path of the .txt file that contains class names) | |
| | dict config | | The parameters to define the augmentation method | |
| | unsigned int | 91 | The number of classes in the training data | >0 |
| | unsigned int | 4 | The batch size for training and validation | >0 |
| | unsigned int | 8 | The number of parallel workers processing data | >0 |
| train_sampler | string | default_sampler | The minibatch sampling method. Non-default sampling methods can be enabled for multi-node jobs. This config doesn't have any effect if dataset_type isn't set to default | default_sampler, non_uniform_sampler, uniform_sampler |
| dataset_type | string | serialized | If set to default, the standard CocoDetection dataset structure from torchvision is used, which loads the COCO annotation in every subprocess. This leads to redundant copies of the data and can cause RAM usage to explode if workers is high. If set to serialized, the data is serialized through pickle and torch.Tensor, which allows the data to be shared across subprocesses. As a result, RAM usage can be greatly reduced. | serialized, default |
augmentation#
The augmentation parameter contains hyperparameters for augmentation.
augmentation:
  scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
  input_mean: [0.485, 0.456, 0.406]
  input_std: [0.229, 0.224, 0.225]
  horizontal_flip_prob: 0.5
  train_random_resize: [400, 500, 600]
  train_random_crop_min: 384
  train_random_crop_max: 600
  random_resize_max_size: 1333
  test_random_resize: 800
| Parameter | Datatype | Default | Description | Supported Values |
| scales | int list | [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800] | A list of sizes to perform random resize. | |
| | float list | [0.485, 0.456, 0.406] | The input mean for RGB frames: | float list / size=1 or 3 |
| | float list | [0.229, 0.224, 0.225] | The input standard deviation for RGB frames: | float list / size=1 or 3 |
| | float | 0.5 | The probability for horizontal flip during training | >=0 |
| | int list | [400, 500, 600] | A list of sizes to perform random resize for training data | int list |
| | unsigned int | 384 | The minimum random crop size for training data | >0 |
| | unsigned int | 600 | The maximum random crop size for training data | >0 |
| | unsigned int | 1333 | The maximum random resize size for training data | >0 |
| | unsigned int | 800 | The random resize size for test data | >0 |
| fixed_padding | bool | True | A flag specifying whether to resize the image (with no padding) to (sorted(scales[-1]), random_resize_max_size) to prevent a CPU memory leak. | True/False |
| fixed_random_crop | unsigned int | | A flag to enable Large Scale Jittering, which is used for ViT backbones. The resulting image resolution is fixed to fixed_random_crop. | Divisible by 32 |
Example Spec File for ViT Backbones#
The following spec file is only relevant for TAO versions 5.2 and later. A Vision Transformer (ViT) requires different augmentation and learning rate decay settings to work as the backbone of a detector.
dataset:
  train_data_sources:
    - image_dir: /path/to/coco/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.json
  val_data_sources:
    - image_dir: /path/to/coco/val2017/
      json_file: /path/to/coco/annotations/instances_val2017.json
  num_classes: 91
  batch_size: 4
  workers: 8
  augmentation:
    input_mean: [0.485, 0.456, 0.406]
    input_std: [0.229, 0.224, 0.225]
    horizontal_flip_prob: 0.5
    fixed_random_crop: 1024
    random_resize_max_size: 1024
    test_random_resize: 1024
    fixed_padding: True
model:
  pretrained_model_path: /path/to/nvdinov2_patch16_model
  backbone: vit_large_nvdinov2
  train_backbone: False
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  dropout_ratio: 0.0
  dim_feedforward: 2048
train:
  optim:
    lr_backbone: 2e-5
    lr: 2e-4
    lr_steps: [11]
    layer_decay_rate: 0.65
  num_epochs: 12
Training the Model#
Use the following command to run DINO training:
TRAIN_JOB_ID=$(tao dino create-job \
--kind experiment \
--name "dino_train" \
--action train \
--workspace-id $WORKSPACE_ID \
--specs "$TRAIN_SPECS" \
--train-datasets '["'$DATASET_ID'"]' \
--eval-dataset "$DATASET_ID" \
--base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
--encryption-key "nvidia_tlt" | jq -r '.id')
tao model dino train [-h] -e <experiment_spec> [results_dir=<global_results_dir>] [model.<model_option>=<model_option_value>] [dataset.<dataset_option>=<dataset_option_value>] [train.<train_option>=<train_option_value>] [train.gpu_ids=<gpu indices>] [train.num_gpus=<number of gpus>]
Required Arguments
The only required argument is the path to the experiment spec:
-e, --experiment_spec: The experiment specification file to set up the training experiment
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options
For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 and gpu_ids = [0, 1], then they are modified to follow the setting that implies more GPUs; in the same example, num_gpus is modified from 1 to 2.
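The reconciliation rule can be sketched in a few lines of Python. This is an illustration of the rule as described above, not TAO's code: the function name is hypothetical, and the behavior when num_gpus exceeds the length of gpu_ids (filling in indices 0 to num_gpus - 1) is an assumption, since the text only gives the opposite example.

```python
from typing import List, Tuple


def reconcile_gpus(num_gpus: int, gpu_ids: List[int]) -> Tuple[int, List[int]]:
    """Follow whichever of num_gpus and gpu_ids implies more GPUs."""
    if len(gpu_ids) > num_gpus:
        # gpu_ids lists more devices, so num_gpus is raised to match.
        return len(gpu_ids), list(gpu_ids)
    if num_gpus > len(gpu_ids):
        # Assumption: use the first num_gpus device indices.
        return num_gpus, list(range(num_gpus))
    return num_gpus, list(gpu_ids)


print(reconcile_gpus(1, [0, 1]))  # (2, [0, 1])
```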
In some cases, multi-GPU training may result in a segmentation fault. You can work around this by setting the environment variable OMP_NUM_THREADS to 1. Depending on your mode of execution, use one of the following methods to set this variable:
CLI Launcher:
Set the environment variable by adding the following entry to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 of the section Running the launcher.
{
    "Envs": [
        {
            "variable": "OMP_NUM_THREADS",
            "value": "1"
        }
    ]
}
Docker:
Set environment variables in Docker by passing the -e flag on the Docker command line.
docker run -it --rm --gpus all \
    -e OMP_NUM_THREADS=1 \
    -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e
Checkpointing and Resuming Training
At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth. Checkpoints are saved in train.results_dir, like this:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint is also saved as dino_model_latest.pth. Training automatically resumes from dino_model_latest.pth, if it exists in train.results_dir. This is superseded by train.resume_training_checkpoint_path, if it is provided.
The major implication of this logic is that, if you wish to trigger fresh training from scratch, do one of the following:
Specify a new, empty results directory (recommended).
Remove the latest checkpoint from the results directory.
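The resume precedence described above can be summarized as a small decision function. The following Python sketch only mirrors the documented order (an explicit resume path first, then the latest checkpoint in the results directory, otherwise a fresh start); the function name is hypothetical and this is not TAO's implementation.

```python
import os
import tempfile
from typing import Optional

LATEST_NAME = "dino_model_latest.pth"


def resolve_resume_checkpoint(results_dir: str, resume_path: Optional[str] = None) -> Optional[str]:
    """Return the checkpoint to resume from, or None for a fresh run."""
    if resume_path:
        # train.resume_training_checkpoint_path supersedes the latest checkpoint.
        return resume_path
    latest = os.path.join(results_dir, LATEST_NAME)
    return latest if os.path.isfile(latest) else None


with tempfile.TemporaryDirectory() as results_dir:
    # A new, empty results directory triggers training from scratch.
    fresh = resolve_resume_checkpoint(results_dir)
    # Once the latest checkpoint exists, training resumes from it.
    open(os.path.join(results_dir, LATEST_NAME), "wb").close()
    resumed = resolve_resume_checkpoint(results_dir)
    # An explicit path takes priority over the latest checkpoint.
    explicit = resolve_resume_checkpoint(results_dir, "/path/to/model_epoch_004.pth")

print(fresh, os.path.basename(resumed), explicit)
```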
Optimizing Resource for Training DINO#
Training DINO requires strong GPUs (for example, V100/A100) with at least 15GB of VRAM and a lot of CPU memory to be trained on a standard dataset like COCO. This section outlines some of the strategies you can use to launch training with only limited resources.
Optimize GPU Memory#
There are various ways to optimize GPU memory usage. One trick is to reduce dataset.batch_size. However, this can cause your training to take longer than usual. We recommend setting the following configurations to optimize GPU consumption:
Set train.precision to fp16 to enable automatic mixed precision training. This can reduce your GPU memory usage by 50%.
Set train.activation_checkpoint to True to enable activation checkpointing. Recomputing the activations instead of caching them in memory reduces memory usage.
Set train.distributed_strategy to fsdp to enable Fully Sharded Data Parallel training. This shares gradient calculation across different processes to help reduce GPU memory.
Try using a more lightweight backbone like fan_tiny, or freeze the backbone by setting model.train_backbone to False.
Try changing the augmentation resolution in dataset.augmentation depending on your dataset.
Optimize CPU Memory#
To speed up data loading, it is common practice to set a high number of workers to spawn multiple processes. However, this can cause the CPU to run out of memory if your annotation file is very large. We recommend setting the following configurations to optimize CPU consumption:
Set dataset.dataset_type to serialized so that the COCO-based annotation data can be shared across different subprocesses.
Set dataset.augmentation.fixed_padding to True so that images are padded before batch formation. Due to random resize and random crop augmentation during training, the resulting image resolution after the transform can vary across images. Such variable image resolutions can cause a memory leak, with CPU memory slowly accumulating until it runs out in the middle of training. This is a limitation of PyTorch, so we advise setting fixed_padding to True to help stabilize CPU memory usage.
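The GPU and CPU memory settings recommended in this section can also be passed as command-line overrides, using the key=value syntax shown for the train command. The Python sketch below flattens a nested dictionary of options into such override strings; the helper name to_overrides is hypothetical, and whether every nested key is accepted as an override by the launcher is an assumption.

```python
from typing import Any, Dict, List


def to_overrides(options: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten nested options into dotted key=value override strings."""
    overrides: List[str] = []
    for key, value in options.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(to_overrides(value, path))
        else:
            overrides.append(f"{path}={value}")
    return overrides


# Memory-saving settings taken from the recommendations in this section.
low_memory = {
    "train": {
        "precision": "fp16",
        "activation_checkpoint": True,
        "distributed_strategy": "fsdp",
    },
    "model": {"train_backbone": False},
    "dataset": {
        "dataset_type": "serialized",
        "augmentation": {"fixed_padding": True},
    },
}
overrides = to_overrides(low_memory)
print(" ".join(overrides))
```

The printed string can be appended after the -e <experiment_spec> argument of the train command, or the same keys can be set directly in the spec file.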
Distilling the Model#
To distill a DINO model, use this command:
DISTILL_JOB_ID=$(tao dino create-job \
--kind experiment \
--name "dino_distill" \
--action distill \
--workspace-id $WORKSPACE_ID \
--encryption-key "nvidia_tlt" | jq -r '.id')
tao model dino distill [-h] -e <experiment_spec>
[-r <results_dir>]
[-k <key>]
Required Arguments
The following arguments are required to run the command.
-e, --experiment_spec: The experiment spec file to set up the distillation experiment
Optional Arguments
The following arguments are optional to run the command.
-r, --results_dir: The path to the folder where the experiment outputs are written. If this argument is not specified, the results_dir from the spec file is used.
-k, --key: A user-specific encoding key to save or load a .tlt model. If this argument is not specified, the model checkpoint is not encrypted.
--gpus: The number of GPUs used to run training.
--num_nodes: The number of nodes used to run training. If this value is larger than 1, distributed multi-node training is enabled.
-h, --help: Show this help message and exit.
Sample Usage
The following is an example of the distill command:
tao model dino distill -e /path/to/spec.yaml
Evaluating the Model#
evaluate#
The evaluate parameter defines the hyperparameters of the evaluation process.
evaluate:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.0
| Parameter | Datatype | Default | Description | Supported Values |
| | string | | The path to the PyTorch model to evaluate | |
| | string | /results/evaluate | The directory to save evaluation results | |
| | unsigned int | 1 | The number of GPUs to use for distributed evaluation | >0 |
| | List[int] | [0] | The indices of the GPUs to use for distributed evaluation | |
| | string | | The path to the TensorRT model to evaluate. Only used with TAO deploy | |
| | float | 0.0 | The confidence threshold to filter predictions | >=0 |
To run evaluation with a DINO model, use this command:
EVAL_JOB_ID=$(tao dino create-job \
--kind experiment \
--name "dino_evaluate" \
--action evaluate \
--workspace-id $WORKSPACE_ID \
--parent-job-id $TRAIN_JOB_ID \
--eval-dataset "$DATASET_ID" \
--specs "$EVALUATE_SPECS" \
--base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
--encryption-key "nvidia_tlt" | jq -r '.id')
tao model dino evaluate [-h] -e <experiment_spec> evaluate.checkpoint=<model to be evaluated> [evaluate.<evaluate_option>=<evaluate_option_value>] [evaluate.gpu_ids=<gpu indices>] [evaluate.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required.
-e, --experiment_spec: The experiment spec file to set up the evaluation experiment.
evaluate.checkpoint: The .pth model to be evaluated.
Optional Arguments
The following arguments are optional to run the command.
evaluate.<evaluate_option>: The evaluate options.
Running Inference with a DINO Model#
inference#
The inference parameter defines the hyperparameters of the inference process.
inference:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.5
  color_map:
    person: red
    car: blue
| Parameter | Datatype | Default | Description | Supported Values |
| | string | | The path to the PyTorch model to run inference on | |
| | string | /results/inference | The directory to save inference results | |
| | unsigned int | 1 | The number of GPUs to use for distributed inference | >0 |
| | List[int] | [0] | The indices of the GPUs to use for distributed inference | |
| | string | | The path to the TensorRT model to run inference on. Only used with TAO deploy | |
| | float | 0.5 | The confidence threshold to filter predictions | >=0 |
| | dict | | The color map of the bounding boxes for each class | string dict |
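The effect of conf_threshold can be illustrated with a short Python sketch. The prediction layout and the function name are hypothetical, and keeping predictions whose score is greater than or equal to the threshold (rather than strictly greater) is an assumption.

```python
from typing import Dict, List


def filter_predictions(predictions: List[Dict], conf_threshold: float) -> List[Dict]:
    """Keep only predictions whose confidence score reaches the threshold."""
    # Assumption: a score equal to the threshold is kept.
    return [p for p in predictions if p["score"] >= conf_threshold]


# Hypothetical detections for a single image.
predictions = [
    {"label": "person", "score": 0.92, "box": [10, 20, 110, 220]},
    {"label": "car", "score": 0.41, "box": [200, 80, 320, 160]},
    {"label": "car", "score": 0.50, "box": [330, 90, 420, 170]},
]
kept = filter_predictions(predictions, conf_threshold=0.5)
print([p["label"] for p in kept])  # ['person', 'car']
```

With the evaluation default of 0.0, the same filter keeps every prediction.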
The inference tool for DINO models can be used to visualize bounding boxes and generate frame-by-frame KITTI format labels on a directory of images.
INFER_JOB_ID=$(tao dino create-job \
--kind experiment \
--name "dino_inference" \
--action inference \
--workspace-id $WORKSPACE_ID \
--parent-job-id $TRAIN_JOB_ID \
--inference-dataset "$DATASET_ID" \
--specs "$INFERENCE_SPECS" \
--base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
--encryption-key "nvidia_tlt" | jq -r '.id')
tao model dino inference [-h] -e <experiment spec file> inference.checkpoint=<model to be inferenced> [inference.<inference_option>=<inference_option_value>] [inference.gpu_ids=<gpu indices>] [inference.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required to run the command.
-e, --experiment_spec: The experiment spec file to set up the inference experiment.
inference.checkpoint: The .pth model to run inference on.
Optional Arguments
The following arguments are optional to run the command.
inference.<inference_option>: The inference options.
Exporting the Model#
export#
The export parameter defines the hyperparameters of the export process.
export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  on_cpu: False
  opset_version: 12
  input_channel: 3
  input_width: 960
  input_height: 544
  batch_size: -1
| Parameter | Datatype | Default | Description | Supported Values |
| | string | | The path to the PyTorch model to export | |
| | string | | The path to the | |
| | bool | True | If this value is True, the DMHA module is exported as standard PyTorch. If this value is False, the module is exported using the TRT Plugin. | True, False |
| | unsigned int | 12 | The opset version of the exported ONNX | >0 |
| | unsigned int | 3 | The input channel size. Only the value 3 is supported. | 3 |
| | unsigned int | 960 | The input width | >0 |
| | unsigned int | 544 | The input height | >0 |
| | unsigned int | -1 | The batch size of the ONNX model. If this value is set to -1, the export uses dynamic batch size. | >=-1 |
EXPORT_JOB_ID=$(tao dino create-job \
--kind experiment \
--name "dino_export" \
--action export \
--workspace-id $WORKSPACE_ID \
--parent-job-id $TRAIN_JOB_ID \
--specs "$EXPORT_SPECS" \
--base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
--encryption-key "nvidia_tlt" | jq -r '.id')
tao model dino export [-h] -e <experiment spec file> export.checkpoint=<model to export> export.onnx_file=<onnx path> [export.<export_option>=<export_option_value>]
Required Arguments
The following arguments are required to run the command.
-e, --experiment_spec: The path to an experiment spec file.
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .etlt or .onnx model is saved.
Optional Arguments
The following arguments are optional to run the command.
export.<export_option>: The export options.
TensorRT Engine Generation, Validation, and int8 Calibration#
For deployment, refer to TAO Deploy documentation for DINO.
Deploying to DeepStream#
Refer to the Integrating a Deformable DETR Model documentation for DINO page for more information about deploying a Deformable DETR model to DeepStream.