NVIDIA TAO v5.5.0

Grounding DINO


Grounding DINO is an open-vocabulary object-detection model included in TAO. Through joint training on text and image data, Grounding DINO can accept a wide range of text inputs and output the corresponding bounding boxes.

It supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Launcher using the following convention on the command-line:


tao model grounding_dino <sub_task> <args_per_subtask>

where args_per_subtask represents the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

Grounding DINO expects directories of images, with the training annotations in the ODVG JSONL format and the validation annotations in JSON files in COCO format.
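For reference, the following is a minimal sketch of a single line of an ODVG JSONL file for a detection dataset, along with a matching label map file. The field layout follows the open-source ODVG convention; the file name, image sizes, and class names here are illustrative only:

{"filename": "000001.jpg", "height": 480, "width": 640, "detection": {"instances": [{"bbox": [100.0, 150.0, 300.0, 400.0], "label": 0, "category": "person"}]}}

The accompanying label_map JSON maps each contiguous, 0-based ID to its class name:

{"0": "person", "1": "bicycle", "2": "car"}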

Note

Unlike with other object detection networks in TAO, the category_id values in your COCO JSON file for Grounding DINO must start from 0, and every category ID must be contiguous, meaning the categories range from 0 to num_classes - 1. Because the original COCO annotations do not have contiguous category IDs, see the TAO Data Service tao dataset annotations convert.
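For example, raw COCO annotations use 1-based, non-contiguous category IDs, which must be remapped before validation. The following is a sketch of the categories field before and after conversion (the class names are from COCO; the remapped IDs assume an order-preserving remap):

Raw COCO (non-contiguous, 1-based; not valid for Grounding DINO):

"categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "bicycle"}, {"id": 90, "name": "toothbrush"}]

After conversion (contiguous, starting from 0):

"categories": [{"id": 0, "name": "person"}, {"id": 1, "name": "bicycle"}, {"id": 79, "name": "toothbrush"}]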

The training experiment spec file for Grounding DINO includes model, train, and dataset parameters. The following is an example spec file for finetuning a Grounding DINO model with a swin_tiny_224_1k backbone on a COCO dataset:


dataset:
  train_data_sources:
    - image_dir: /path/to/coco/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.jsonl # odvg format
      label_map: /path/to/coco/annotations/instances_train2017_labelmap.json
  val_data_sources:
    - image_dir: /path/to/coco/val2017/
      json_file: /path/to/coco/annotations/instances_val2017_contiguous.json # category ids need to be contiguous
  max_labels: 80 # Max number of positive + negative labels passed to the text encoder
  batch_size: 4
  workers: 8
  dataset_type: serialized # To reduce the system memory usage
  augmentation:
    scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
    input_mean: [0.485, 0.456, 0.406]
    input_std: [0.229, 0.224, 0.225]
    horizontal_flip_prob: 0.5
    train_random_resize: [400, 500, 600]
    train_random_crop_min: 384
    train_random_crop_max: 600
    random_resize_max_size: 1333
    test_random_resize: 800
model:
  backbone: swin_tiny_224_1k
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  dropout_ratio: 0.0
  dim_feedforward: 2048
  log_scale: auto
  class_embed_bias: True # Adding bias in the contrastive embedding layer for training stability
train:
  optim:
    lr_backbone: 2e-5
    lr: 2e-4
    lr_steps: [10, 20]
  num_epochs: 30
  freeze: ["backbone.0", "bert"] # if only finetuning
  pretrained_model_path: /path/to/your-gdino-pretrained-model # if only finetuning
  precision: bf16 # for efficient training

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| encryption_key | string |  |  |  |  |  | FALSE |
| results_dir | string |  | /results |  |  |  | FALSE |
| wandb | collection |  |  |  |  |  | FALSE |
| model | collection | Configurable parameters to construct the model for a Grounding DINO experiment. |  |  |  |  | FALSE |
| dataset | collection | Configurable parameters to construct the dataset for a Grounding DINO experiment. |  |  |  |  | FALSE |
| train | collection | Configurable parameters to construct the trainer for a Grounding DINO experiment. |  |  |  |  | FALSE |
| evaluate | collection | Configurable parameters to construct the evaluator for a Grounding DINO experiment. |  |  |  |  | FALSE |
| inference | collection | Configurable parameters to construct the inferencer for a Grounding DINO experiment. |  |  |  |  | FALSE |
| export | collection | Configurable parameters to construct the exporter for a Grounding DINO experiment. |  |  |  |  | FALSE |
| gen_trt_engine | collection | Configurable parameters to construct the TensorRT engine builder for a Grounding DINO experiment. |  |  |  |  | FALSE |

model

The model parameter provides options to change the Grounding DINO architecture.


model:
  pretrained_model_path: /path/to/your-gdino-pretrained-model
  backbone: swin_tiny_224_1k
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  dropout_ratio: 0.0
  dim_feedforward: 2048
  log_scale: auto
  class_embed_bias: True

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| pretrained_backbone_path | string | [Optional] The path to a pretrained backbone file. |  |  |  |  | FALSE |
| backbone | string | The backbone name of the model. The TAO implementation of Grounding DINO supports Swin. | swin_tiny_224_1k |  |  | swin_tiny_224_1k, swin_base_224_22k, swin_base_384_22k, swin_large_224_22k, swin_large_384_22k | FALSE |
| num_queries | int | The number of queries. | 900 | 1 | inf |  | TRUE |
| num_feature_levels | int | The number of feature levels to use in the model. | 4 | 1 | 5 |  | FALSE |
| set_cost_class | float | The relative weight of the classification error in the matching cost. | 1.0 | 0.0 | inf |  | FALSE |
| set_cost_bbox | float | The relative weight of the L1 error of the bounding-box coordinates in the matching cost. | 5.0 | 0.0 | inf |  | FALSE |
| set_cost_giou | float | The relative weight of the GIoU loss of the bounding box in the matching cost. | 2.0 | 0.0 | inf |  | FALSE |
| cls_loss_coef | float | The relative weight of the classification error in the final loss. | 2.0 | 0.0 | inf |  | FALSE |
| bbox_loss_coef | float | The relative weight of the L1 error of the bounding-box coordinates in the final loss. | 5.0 | 0.0 | inf |  | FALSE |
| giou_loss_coef | float | The relative weight of the GIoU loss of the bounding box in the final loss. | 2.0 | 0.0 | inf |  | FALSE |
| num_select | int | The number of top-K predictions selected during post-processing. | 300 | 1 |  |  | TRUE |
| interm_loss_coef | float |  | 1.0 |  |  |  | FALSE |
| no_interm_box_loss | bool | Flag to disable the intermediate bbox loss. | False |  |  |  | FALSE |
| pre_norm | bool | Flag to add layer norm in the encoder. | False |  |  |  | FALSE |
| two_stage_type | string | The type of two-stage scheme used in DINO. | standard |  |  | standard, no | FALSE |
| decoder_sa_type | string | The type of decoder self-attention. | sa |  |  | sa, ca_label, ca_content | FALSE |
| embed_init_tgt | bool | Flag to add the target embedding. | True |  |  |  | FALSE |
| fix_refpoints_hw | int | If this value is -1, width and height are learned separately for each box. If this value is -2, a shared width and height are learned. A value greater than 0 specifies learning with a fixed value. | -1 | -2 | inf |  | FALSE |
| pe_temperatureH | int | The temperature applied to the height dimension of the positional sine embedding. | 20 | 1 | inf |  | FALSE |
| pe_temperatureW | int | The temperature applied to the width dimension of the positional sine embedding. | 20 | 1 | inf |  | FALSE |
| return_interm_indices | list | The indices of the feature levels to use in the model. The length must match num_feature_levels. | [1, 2, 3, 4] |  |  |  | FALSE |
| use_dn | bool | A flag specifying whether to enable contrastive de-noising training in DINO. | True |  |  |  | FALSE |
| dn_number | int | The number of de-noising queries in DINO. | 0 | 0 | inf |  | FALSE |
| dn_box_noise_scale | float | The scale of the noise applied to boxes during contrastive de-noising. If this value is 0, no noise is applied. | 1.0 | 0.0 | inf |  | FALSE |
| dn_label_noise_ratio | float | The scale of the noise applied to labels during contrastive de-noising. If this value is 0, no noise is applied. | 0.5 | 0.0 |  |  | FALSE |
| focal_alpha | float | The alpha value in the focal loss. | 0.25 |  |  |  | FALSE |
| focal_gamma | float | The gamma value in the focal loss. | 2.0 |  |  |  | FALSE |
| clip_max_norm | float |  | 0.1 |  |  |  | FALSE |
| nheads | int | The number of attention heads. | 8 |  |  |  | FALSE |
| dropout_ratio | float | The probability of dropping hidden units. | 0.0 | 0.0 | 1.0 |  | FALSE |
| hidden_dim | int | The dimension of the hidden units. | 256 |  |  |  | FALSE |
| enc_layers | int | The number of encoder layers in the transformer. | 6 | 1 |  |  | TRUE |
| dec_layers | int | The number of decoder layers in the transformer. | 6 | 1 |  |  | TRUE |
| dim_feedforward | int | The dimension of the feedforward network. | 2048 | 1 |  |  | FALSE |
| dec_n_points | int | The number of reference points in the decoder. | 4 | 1 |  |  | FALSE |
| enc_n_points | int | The number of reference points in the encoder. | 4 | 1 |  |  | FALSE |
| aux_loss | bool | A flag specifying whether to use auxiliary decoding losses (a loss at each decoder layer). | True |  |  |  | FALSE |
| dilation | bool | A flag specifying whether to enable dilation in the backbone. | False |  |  |  | FALSE |
| train_backbone | bool | Flag to set the backbone weights as trainable or frozen. When set to False, the backbone weights are frozen. | True |  |  |  | FALSE |
| text_encoder_type | string | The BERT encoder type. If only the name of the type is provided, the weights are downloaded from the Hugging Face Hub. If a path is provided, the weights are loaded from the local path. | bert-base-uncased |  |  |  | FALSE |
| max_text_len | int | The maximum text length of BERT. | 256 | 1 |  |  | FALSE |
| class_embed_bias | bool | Flag to add a bias in the contrastive embedding. | False |  |  |  | FALSE |
| log_scale | string | [Optional] The initial value of a learnable parameter that multiplies the similarity matrix to normalize the output. Defaults to None. If set to 'auto', the similarity matrix is normalized by a fixed value sqrt(d_c), where d_c is the channel number. If set to 'none' or None, no normalization is applied. | none |  |  |  | FALSE |
| loss_types | list | The losses to use during training. | ['labels', 'boxes'] |  |  |  | FALSE |
| backbone_names | list | The prefixes of the tensor names corresponding to the backbone. | ['backbone.0', 'bert'] |  |  |  | FALSE |
| linear_proj_names | list | The names of the linear projection layers. | ['reference_points', 'sampling_offsets'] |  |  |  | FALSE |

train

The train parameter defines the hyperparameters of the training process.


train:
  optim:
    lr: 0.0002
    lr_backbone: 0.00002
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_steps: [10, 20]
    lr_decay: 0.1
  num_epochs: 30
  checkpoint_interval: 1
  precision: bf16
  distributed_strategy: ddp
  activation_checkpoint: True
  num_gpus: 8
  num_nodes: 1
  freeze: ["backbone.0", "bert"]
  pretrained_model_path: /path/to/pretrained/model

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| num_gpus | int | The number of GPUs to run the train job. | 1 | 1 |  |  | FALSE |
| gpu_ids | list | The list of GPU IDs to run the training on. The length of this list must equal train.num_gpus. | [0] |  |  |  | FALSE |
| num_nodes | int | The number of nodes to run the training on. If > 1, multi-node training is enabled. | 1 |  |  |  | FALSE |
| seed | int | The seed for the initializer in PyTorch. If < 0, the fixed seed is disabled. | 1234 | -1 | inf |  | FALSE |
| cudnn | collection |  |  |  |  |  | FALSE |
| num_epochs | int | The number of epochs to run the training for. | 10 | 1 | inf |  | TRUE |
| checkpoint_interval | int | The interval (in epochs) at which a checkpoint is saved. Helps resume training. | 1 | 1 |  |  | FALSE |
| validation_interval | int | The interval (in epochs) at which an evaluation is triggered on the validation dataset. | 1 | 1 |  |  | FALSE |
| resume_training_checkpoint_path | string | The path to the checkpoint to resume training from. |  |  |  |  | FALSE |
| results_dir | string | The path where all the assets generated from a task are stored. |  |  |  |  | FALSE |
| freeze | list | The list of layer names to freeze. Example: ["backbone", "transformer.encoder", "input_proj"]. | [] |  |  |  | FALSE |
| pretrained_model_path | string | The path to a pretrained Grounding DINO model to initialize the current training from. |  |  |  |  | FALSE |
| clip_grad_norm | float | The amount to clip the gradient by its L2 norm. A value of 0.0 specifies no clipping. | 0.1 |  |  |  | FALSE |
| is_dry_run | bool | Whether to run the trainer in dry-run mode. This is a good way to validate the spec file and run a sanity check on the trainer without actually initializing and running the trainer. | False |  |  |  | FALSE |
| optim | collection | The hyperparameters to configure the optimizer. |  |  |  |  | FALSE |
| precision | string | The precision to run the training on. | fp32 |  |  | fp16, fp32, bf16 | FALSE |
| distributed_strategy | string | The multi-GPU training strategy. DDP (Distributed Data Parallel) and FSDP (Fully Sharded Data Parallel) are supported. | ddp |  |  | ddp, fsdp | FALSE |
| activation_checkpoint | bool | A True value instructs training to recompute activations in the backward pass to save GPU memory, rather than storing them. | True |  |  |  | FALSE |
| verbose | bool | Flag to enable printing of detailed learning-rate scaling from the optimizer. | False |  |  |  | FALSE |

optim

The optim parameter defines the configuration for the optimizer during training, including the learning rate, learning-rate scheduler, and weight decay.


optim:
  lr: 0.0002
  lr_backbone: 0.00002
  momentum: 0.9
  weight_decay: 0.0001
  lr_scheduler: MultiStep
  lr_steps: [10, 20]
  lr_decay: 0.1

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| optimizer | string | The type of optimizer used to train the network. | AdamW |  |  | AdamW, SGD | FALSE |
| monitor_name | string | The metric value to be monitored for the AutoReduce scheduler. | val_loss |  |  | val_loss, train_loss | FALSE |
| lr | float | The initial learning rate for training the model, excluding the backbone. | 0.0002 |  |  |  | TRUE |
| lr_backbone | float | The initial learning rate for training the backbone. | 2e-05 |  |  |  | TRUE |
| lr_linear_proj_mult | float | The initial learning rate for training the linear projection layers. | 0.1 |  |  |  | TRUE |
| momentum | float | The momentum for the AdamW optimizer. | 0.9 |  |  |  | TRUE |
| weight_decay | float | The weight decay coefficient. | 0.0001 |  |  |  | TRUE |
| lr_scheduler | string | The learning-rate scheduler. MultiStep decreases the lr by lr_decay at each epoch in lr_steps; StepLR decreases the lr by lr_decay every lr_step_size epochs. | MultiStep |  |  | MultiStep, StepLR | FALSE |
| lr_steps | list | The steps at which the learning rate must be decreased. Applicable only with the MultiStep scheduler. | [10] |  |  |  | FALSE |
| lr_step_size | int | The number of steps after which the learning rate is decreased with the StepLR scheduler. | 10 |  |  |  | TRUE |
| lr_decay | float | The decay factor for the learning-rate scheduler. | 0.1 |  |  |  | TRUE |
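As a worked example, with the optim spec above, the MultiStep scheduler multiplies the learning rate by lr_decay at each epoch listed in lr_steps:

optim:
  lr: 0.0002          # epochs 0-9: 2e-4
  lr_scheduler: MultiStep
  lr_steps: [10, 20]  # epoch 10: lr becomes 2e-5; epoch 20: 2e-6
  lr_decay: 0.1       # multiply the lr by 0.1 at each step

In this configuration, lr_backbone is decayed on the same schedule, starting from its own initial value.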

dataset

The dataset parameter defines the dataset source, training batch size, and augmentation.


dataset:
  train_data_sources:
    - image_dir: /path/to/coco/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.jsonl # odvg format
      label_map: /path/to/coco/annotations/instances_train2017_labelmap.json
    - image_dir: /path/to/coco/train2017/
      json_file: /path/to/coco/annotations/refcoco.jsonl # grounding dataset, which doesn't require a label_map
  val_data_sources:
    image_dir: /path/to/coco/val2017/
    json_file: /path/to/coco/annotations/instances_val2017_contiguous.json # category ids need to be contiguous
  test_data_sources:
    image_dir: /path/to/coco/images/val2017/
    json_file: /path/to/coco/annotations/instances_val2017.json
  infer_data_sources:
    image_dir:
      - /path/to/coco/images/val2017/
    captions: ["blackcat", "car"]
  max_labels: 80
  batch_size: 4
  workers: 8

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| train_data_sources | list | The list of data sources for training: image_dir (the directory that contains the training images), json_file (the path to the JSONL annotation file in ODVG format), and label_map (optional; the path to the label map, required only for detection datasets). | [{'image_dir': '', 'json_file': '', 'label_map': ''}, {'image_dir': '', 'json_file': ''}] |  |  |  | FALSE |
| val_data_sources | collection | The data source for validation: image_dir (the directory that contains the validation images) and json_file (the path to the JSON annotation file in COCO format). Note: The category IDs must start from 0 to compute the validation loss. Run the Data Services annotation convert tool to make the categories contiguous. | {'image_dir': '', 'json_file': ''} |  |  |  | FALSE |
| test_data_sources | collection | The data source for testing: image_dir (the directory that contains the test images) and json_file (the path to the JSON annotation file in COCO format). | {'image_dir': '', 'json_file': ''} |  |  |  | FALSE |
| infer_data_sources | collection | The data source for inference: image_dir (the list of directories that contain the inference images) and captions (the list of captions to run inference on). | {'image_dir': [''], 'captions': ['']} |  |  |  | FALSE |
| batch_size | int | The batch size for training and validation. | 4 | 1 | inf |  | TRUE |
| workers | int | The number of parallel workers processing data. | 8 | 1 | inf |  | TRUE |
| pin_memory | bool | Flag to enable the dataloader to allocate page-locked memory for faster transfer of data between the CPU and GPU. | True |  |  |  | FALSE |
| dataset_type | string | If set to default, the standard map-style dataset structure from torch is followed, which loads the ODVG annotation in every subprocess. This leads to a redundant copy of the data and can cause CPU memory to explode if workers is high. If set to serialized, the data is serialized through pickle and torch.Tensor, which allows it to be shared across subprocesses. As a result, CPU memory usage can be greatly improved. | serialized |  |  | serialized, default | FALSE |
| max_labels | int | The total number of labels to sample from. After sampling positive labels, random negative samples are drawn so that the total number of labels equals max_labels. For a detection dataset, negative labels are categories not present in the image. For a grounding dataset, negative labels are phrases in the original caption not present in the image. Setting a higher max_labels may improve the robustness of the model at the cost of longer training time. | 50 | 1 | inf |  | FALSE |
| eval_class_ids | list | The IDs of the classes for evaluation. | [1] |  |  |  | FALSE |
| augmentation | collection | Configuration parameters for data augmentation. |  |  |  |  | FALSE |

augmentation

The augmentation parameter contains hyperparameters for augmentation.


augmentation:
  scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
  input_mean: [0.485, 0.456, 0.406]
  input_std: [0.229, 0.224, 0.225]
  horizontal_flip_prob: 0.5
  train_random_resize: [400, 500, 600]
  train_random_crop_min: 384
  train_random_crop_max: 600
  random_resize_max_size: 1333
  test_random_resize: 800

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| scales | list | A list of sizes to perform random resize on. | [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800] |  |  |  | FALSE |
| input_mean | list | The input mean for RGB frames. | [0.485, 0.456, 0.406] |  |  |  | FALSE |
| input_std | list | The input standard deviation per pixel for RGB frames. | [0.229, 0.224, 0.225] |  |  |  | FALSE |
| train_random_resize | list | A list of sizes to perform random resize on for training data. | [400, 500, 600] |  |  |  | FALSE |
| horizontal_flip_prob | float | The probability of a horizontal flip during training. | 0.5 | 0.0 | 1.0 |  | TRUE |
| train_random_crop_min | int | The minimum random crop size for training data. | 384 | 1 | inf |  | TRUE |
| train_random_crop_max | int | The maximum random crop size for training data. | 600 | 1 | inf |  | TRUE |
| random_resize_max_size | int | The maximum random resize size for training data. | 1333 | 1 | inf |  | TRUE |
| test_random_resize | int | The random resize size for test data. | 800 | 1 | inf |  | TRUE |
| fixed_padding | bool | A flag specifying whether to resize the image (with no padding) to (sorted(scales)[-1], random_resize_max_size) to prevent a CPU memory leak. | True |  |  |  | FALSE |
| fixed_random_crop | int | A flag to enable Large Scale Jittering, which is used for ViT backbones. The resulting image resolution is fixed to fixed_random_crop. | 1024 | 1 | inf |  | FALSE |

To train a Grounding DINO model, use this command:


tao model grounding_dino train [-h] -e <experiment_spec>

Required Arguments

  • -e, --experiment_spec: The experiment specification file to set up the training experiment.

Optional Arguments

  • -h, --help: Show this help message and exit.

Sample Usage

The following is an example of the train command:


tao model grounding_dino train -e /path/to/spec.yaml
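Like the other subtasks shown below, train accepts key=value overrides of spec-file fields on the command line. For example, the following sketch overrides the GPU count and the number of epochs from the spec file:

tao model grounding_dino train -e /path/to/spec.yaml train.num_gpus=2 train.num_epochs=12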

Optimizing Resources for Training Grounding DINO

Training Grounding DINO requires powerful GPUs (for example, V100/A100) with at least 15 GB of VRAM, as well as a large amount of CPU memory, to train on a standard dataset like COCO. This section outlines some of the strategies you can use to launch training with limited resources.

Optimize GPU Memory

There are various ways to optimize GPU memory usage. One trick is to reduce dataset.batch_size. However, this can cause your training to take longer than usual. We recommend setting the following configurations to optimize GPU consumption:

  • Set train.precision to bf16 to enable automatic mixed precision training. This can reduce your GPU memory usage by 50%.

  • Set train.activation_checkpoint to True to enable activation checkpointing. By recomputing the activations instead of caching them into memory, the memory usage can be improved.

  • Set train.distributed_strategy to fsdp to enable Fully Sharded Data Parallel training. This shards model states across processes to help reduce GPU memory usage.

  • Try using a more lightweight backbone, such as swin_tiny_224_1k, or freeze the backbone by setting model.train_backbone to False.

  • Try changing the augmentation resolution in dataset.augmentation depending on your dataset. A spec sketch combining these settings follows this list.
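The following is a minimal spec sketch combining the GPU-memory-saving settings above (the values are illustrative and should be tuned for your setup):

train:
  precision: bf16              # automatic mixed precision
  activation_checkpoint: True  # recompute activations in the backward pass
  distributed_strategy: fsdp   # Fully Sharded Data Parallel
model:
  backbone: swin_tiny_224_1k   # lightweight backbone
  train_backbone: False        # freeze the backbone
dataset:
  batch_size: 2                # reduce the batch size as a last resort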

Optimize CPU Memory

To speed up data loading, you typically set a high number of workers to spawn multiple processes. However, if the size of your annotation file is very large, this can exhaust CPU memory. We recommend setting the following configurations to optimize CPU memory consumption:

  • Set dataset.dataset_type to serialized so that the COCO-based annotation data can be shared across different subprocesses.

  • Set dataset.augmentation.fixed_padding to True so that images are padded before batch formulation. Because of the random resize and random crop augmentations during training, the resulting image resolution after the transforms can vary across images. Such variable image resolutions can cause a memory leak, with CPU memory usage slowly stacking up until the job runs out of memory in the middle of training. This is a limitation of PyTorch, so we advise setting fixed_padding to True to help stabilize CPU memory usage. A spec sketch combining these settings follows this list.
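A corresponding sketch for the CPU-memory settings described above:

dataset:
  dataset_type: serialized # share annotation data across dataloader subprocesses
  workers: 8               # lower this if CPU memory is still tight
  augmentation:
    fixed_padding: True    # pad images to a fixed resolution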

evaluate

The evaluate parameter defines the hyperparameters of the evaluate process.


evaluate:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.0
  num_gpus: 1

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| num_gpus | int |  | 1 |  |  |  | FALSE |
| gpu_ids | list |  | [0] |  |  |  | FALSE |
| num_nodes | int |  | 1 |  |  |  | FALSE |
| checkpoint | string |  | ??? |  |  |  | FALSE |
| results_dir | string |  |  |  |  |  | FALSE |
| input_width | int | The width of the input image tensor. |  | 1 |  |  | FALSE |
| input_height | int | The height of the input image tensor. |  | 1 |  |  | FALSE |
| trt_engine | string | The path to the TensorRT engine to use for evaluation. This argument only works with tao-deploy. |  |  |  |  | FALSE |
| conf_threshold | float | The confidence threshold used when filtering out the final list of boxes. | 0.0 |  |  |  | FALSE |

To run evaluation with a Grounding DINO model, use this command:


tao model grounding_dino evaluate [-h] -e <experiment_spec> \
    evaluate.checkpoint=<model to be evaluated>

Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up the evaluation experiment.

Optional Arguments

  • evaluate.checkpoint: The .pth model to be evaluated.

Sample Usage

The following is an example of using the evaluate command:


tao model grounding_dino evaluate -e /path/to/spec.yaml evaluate.checkpoint=/path/to/model.pth

inference

The inference parameter defines the hyperparameters of the inference process.


inference:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.5
  num_gpus: 1
  color_map:
    "blackcat": red
    car: blue
dataset:
  infer_data_sources:
    image_dir: /data/raw-data/val2017/
    captions: ["blackcat", "cat"]

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| num_gpus | int |  | 1 |  |  |  | FALSE |
| gpu_ids | list |  | [0] |  |  |  | FALSE |
| num_nodes | int |  | 1 |  |  |  | FALSE |
| checkpoint | string |  | ??? |  |  |  | FALSE |
| results_dir | string |  |  |  |  |  | FALSE |
| trt_engine | string | The path to the TensorRT engine to use for inference. This argument only works with tao-deploy. |  |  |  |  | FALSE |
| color_map | collection | A class-wise dictionary with colors to render boxes. |  |  |  |  | FALSE |
| conf_threshold | float | The confidence threshold used when filtering out the final list of boxes. | 0.5 |  |  |  | FALSE |
| is_internal | bool | Flag to render with the internal directory structure. | False |  |  |  | FALSE |
| input_width | int | The width of the input image tensor. | 960 | 32 |  |  | FALSE |
| input_height | int | The height of the input image tensor. | 544 | 32 |  |  | FALSE |
| outline_width | int | The width, in pixels, of the bounding-box outline. | 3 | 1 |  |  | FALSE |

The inference tool for Grounding DINO models can be used to visualize bboxes and generate frame-by-frame KITTI-format labels on a directory of images.


tao model grounding_dino inference [-h] -e <experiment spec file> inference.checkpoint=<model to run inference on>

Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up the inference experiment.

Optional Arguments

  • inference.checkpoint: The .pth model to run inference on.

Sample Usage

The following is an example of using the inference command:


tao model grounding_dino inference -e /path/to/spec.yaml inference.checkpoint=/path/to/model.pth
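The generated KITTI-format labels mentioned above contain one object per line. As a rough sketch, assuming the standard KITTI column layout (class, truncation, occlusion, alpha, the four 2D bbox coordinates, the 3D fields, which are zeroed for a 2D detector, and an appended confidence score; the exact columns TAO writes may differ), a detected "car" could look like:

car 0.00 0 0.00 100.00 200.00 300.00 400.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.85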

export

The export parameter defines the hyperparameters of the export process.


export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  on_cpu: False
  opset_version: 17
  input_channel: 3
  input_width: 960
  input_height: 544
  batch_size: -1

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| results_dir | string | The path where all the assets generated from a task are stored. |  |  |  |  | FALSE |
| gpu_id | int | The index of the GPU to build the TensorRT engine. | 0 |  |  |  | FALSE |
| checkpoint | string | The path to the checkpoint file to run export on. | ??? |  |  |  | FALSE |
| onnx_file | string | The path to the ONNX model file. | ??? |  |  |  | FALSE |
| on_cpu | bool | Flag to export a CPU-compatible model. | False |  |  |  | FALSE |
| input_channel | int | The number of channels in the input tensor. | 3 | 3 |  |  | FALSE |
| input_width | int | The width of the input image tensor. | 960 | 32 |  |  | FALSE |
| input_height | int | The height of the input image tensor. | 544 | 32 |  |  | FALSE |
| opset_version | int | The operator set version of the ONNX model used to generate the TensorRT engine. | 17 | 1 |  |  | FALSE |
| batch_size | int | The batch size of the input tensor for the engine. A value of -1 implies dynamic tensor shapes. | -1 | -1 |  |  | FALSE |
| verbose | bool | Flag to enable verbose TensorRT logging. | False |  |  |  | FALSE |
To export a Grounding DINO model, use this command:

tao model grounding_dino export [-h] -e <experiment spec file> export.checkpoint=<model to export> export.onnx_file=<onnx path>

Required Arguments

  • -e, --experiment_spec: The path to an experiment spec file.

Optional Arguments

  • export.checkpoint: The .pth model to export.

  • export.onnx_file: The path where the .onnx model is saved.

Sample Usage

The following is an example of using the export command:


tao model grounding_dino export -e /path/to/spec.yaml export.checkpoint=/path/to/model.pth export.onnx_file=/path/to/model.onnx
