Grounding DINO
Grounding DINO is an open-vocabulary object detection model included in TAO. Through joint training on text and image data, Grounding DINO can accept a wide range of text inputs and output the corresponding bounding boxes.
It supports the following tasks:
train
evaluate
inference
export
These tasks can be invoked from the TAO Launcher using the following convention on the command-line:
tao model grounding_dino <sub_task> <args_per_subtask>
where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
Grounding DINO expects training annotations in ODVG format (JSONL files) and validation annotations in COCO JSON format, each paired with a directory of images.
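For reference, each line of a detection-style ODVG JSONL file is a standalone JSON object. The layout below is illustrative only (file names and values are made up; the conversion service described below produces the exact format):

{"filename": "000000397133.jpg", "height": 427, "width": 640, "detection": {"instances": [{"bbox": [217.6, 3.8, 320.6, 193.7], "label": 0, "category": "person"}]}}

The companion label_map JSON maps the contiguous IDs to category names, for example:

{"0": "person", "1": "bicycle", "2": "car"}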
Unlike other object detection networks in TAO, the category_id values in your COCO JSON file for Grounding DINO must start from 0 and must be contiguous, ranging from 0 to num_classes - 1.
Because the original COCO annotations do not have contiguous category IDs, use the TAO Data Service command tao dataset annotations convert to remap them.
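The remapping itself is straightforward. The following is a minimal standalone sketch of the idea, with hypothetical file paths (in practice, use the TAO Data Service command above):

import json

# Load a standard COCO annotation file.
with open("instances_val2017.json") as f:
    coco = json.load(f)

# Build a map from the original (non-contiguous) ids to 0..num_classes-1.
id_map = {c["id"]: i for i, c in enumerate(sorted(coco["categories"], key=lambda c: c["id"]))}

# Rewrite category ids in both the category list and the annotations.
for cat in coco["categories"]:
    cat["id"] = id_map[cat["id"]]
for ann in coco["annotations"]:
    ann["category_id"] = id_map[ann["category_id"]]

with open("instances_val2017_contiguous.json", "w") as f:
    json.dump(coco, f)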
The training experiment spec file for Grounding DINO includes model, train, and dataset parameters.
The following is an example spec file for finetuning a Grounding DINO model with a swin_tiny_224_1k backbone on a COCO dataset:
dataset:
  train_data_sources:
    - image_dir: /path/to/coco/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.jsonl # odvg format
      label_map: /path/to/coco/annotations/instances_train2017_labelmap.json
  val_data_sources:
    - image_dir: /path/to/coco/val2017/
      json_file: /path/to/coco/annotations/instances_val2017_contiguous.json # category ids need to be contiguous
  max_labels: 80 # Max number of positive + negative labels passed to the text encoder
  batch_size: 4
  workers: 8
  dataset_type: serialized # To reduce the system memory usage
  augmentation:
    scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
    input_mean: [0.485, 0.456, 0.406]
    input_std: [0.229, 0.224, 0.225]
    horizontal_flip_prob: 0.5
    train_random_resize: [400, 500, 600]
    train_random_crop_min: 384
    train_random_crop_max: 600
    random_resize_max_size: 1333
    test_random_resize: 800
model:
  backbone: swin_tiny_224_1k
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  dropout_ratio: 0.0
  dim_feedforward: 2048
  log_scale: auto
  class_embed_bias: True # Adding bias in the contrastive embedding layer for training stability
train:
  optim:
    lr_backbone: 2e-5
    lr: 2e-4
    lr_steps: [10, 20]
  num_epochs: 30
  freeze: ["backbone.0", "bert"] # if only finetuning
  pretrained_model_path: /path/to/your-gdino-pretrained-model # if only finetuning
  precision: bf16 # for efficient training
Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
encryption_key | string | | | | | | FALSE
results_dir | string | | /results | | | | FALSE
wandb | collection | | | | | | FALSE
model | collection | Configurable parameters to construct the model for a Grounding DINO experiment. | | | | | FALSE
dataset | collection | Configurable parameters to construct the dataset for a Grounding DINO experiment. | | | | | FALSE
train | collection | Configurable parameters to construct the trainer for a Grounding DINO experiment. | | | | | FALSE
evaluate | collection | Configurable parameters to construct the evaluator for a Grounding DINO experiment. | | | | | FALSE
inference | collection | Configurable parameters to construct the inferencer for a Grounding DINO experiment. | | | | | FALSE
export | collection | Configurable parameters to construct the exporter for a Grounding DINO experiment. | | | | | FALSE
gen_trt_engine | collection | Configurable parameters to construct the TensorRT engine builder for a Grounding DINO experiment. | | | | | FALSE
model
The model parameter provides options to change the Grounding DINO architecture.
model:
  pretrained_model_path: /path/to/your-gdino-pretrained-model
  backbone: swin_tiny_224_1k
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  dropout_ratio: 0.0
  dim_feedforward: 2048
  log_scale: auto
  class_embed_bias: True
Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
pretrained_backbone_path | string | [Optional] Path to a pretrained backbone file. | | | | | FALSE
backbone | string | The backbone name of the model. | swin_tiny_224_1k | | | swin_tiny_224_1k,swin_base_224_22k,swin_base_384_22k,swin_large_224_22k,swin_large_384_22k | FALSE
num_queries | int | The number of queries | 900 | 1 | inf | | TRUE
num_feature_levels | int | The number of feature levels to use in the model | 4 | 1 | 5 | | FALSE
set_cost_class | float | The relative weight of the classification error in the matching cost. | 1.0 | 0.0 | inf | | FALSE
set_cost_bbox | float | The relative weight of the L1 error of the bounding box coordinates in the matching cost. | 5.0 | 0.0 | inf | | FALSE
set_cost_giou | float | The relative weight of the GIoU loss of the bounding box in the matching cost. | 2.0 | 0.0 | inf | | FALSE
cls_loss_coef | float | The relative weight of the classification error in the final loss. | 2.0 | 0.0 | inf | | FALSE
bbox_loss_coef | float | The relative weight of the L1 error of the bounding box coordinates in the final loss. | 5.0 | 0.0 | inf | | FALSE
giou_loss_coef | float | The relative weight of the GIoU loss of the bounding box in the final loss. | 2.0 | 0.0 | inf | | FALSE
num_select | int | The number of top-K predictions selected during post-processing | 300 | 1 | | | TRUE
interm_loss_coef | float | | 1.0 | | | | FALSE
no_interm_box_loss | bool | Disables the intermediate bounding-box loss. | False | | | | FALSE
pre_norm | bool | Flag to add layer norm in the encoder. | False | | | | FALSE
two_stage_type | string | Type of two-stage scheme in DINO | standard | | | standard,no | FALSE
decoder_sa_type | string | Type of decoder self-attention. | sa | | | sa,ca_label,ca_content | FALSE
embed_init_tgt | bool | Flag to add target embedding | True | | | | FALSE
fix_refpoints_hw | int | If this value is -1, width and height are learned separately for each box. | -1 | -2 | inf | | FALSE
pe_temperatureH | int | The temperature applied to the height dimension of the positional sine embedding. | 20 | 1 | inf | | FALSE
pe_temperatureW | int | The temperature applied to the width dimension of the positional sine embedding. | 20 | 1 | inf | | FALSE
return_interm_indices | list | The indices of the feature levels to use in the model. The length must match num_feature_levels. | [1, 2, 3, 4] | | | | FALSE
use_dn | bool | A flag specifying whether to enable contrastive de-noising training in DINO | True | | | | FALSE
dn_number | int | The number of denoising queries in DINO. | 0 | 0 | inf | | FALSE
dn_box_noise_scale | float | The scale of the noise applied to boxes during contrastive de-noising. If this value is 0, noise is not applied. | 1.0 | 0.0 | inf | | FALSE
dn_label_noise_ratio | float | The scale of the noise applied to labels during contrastive de-noising. | 0.5 | 0.0 | | | FALSE
focal_alpha | float | The alpha value in the focal loss. | 0.25 | | | | FALSE
focal_gamma | float | The gamma value in the focal loss. | 2.0 | | | | FALSE
clip_max_norm | float | | 0.1 | | | | FALSE
nheads | int | Number of attention heads | 8 | | | | FALSE
dropout_ratio | float | The probability to drop hidden units. | 0.0 | 0.0 | 1.0 | | FALSE
hidden_dim | int | Dimension of the hidden units. | 256 | | | | FALSE
enc_layers | int | Number of encoder layers in the transformer | 6 | 1 | | | TRUE
dec_layers | int | Number of decoder layers in the transformer. | 6 | 1 | | | TRUE
dim_feedforward | int | Dimension of the feedforward network. | 2048 | 1 | | | FALSE
dec_n_points | int | Number of reference points in the decoder. | 4 | 1 | | | FALSE
enc_n_points | int | Number of reference points in the encoder. | 4 | 1 | | | FALSE
aux_loss | bool | A flag specifying whether to use auxiliary decoding losses. | True | | | | FALSE
dilation | bool | A flag specifying whether to enable dilation in the backbone. | False | | | | FALSE
train_backbone | bool | Flag to set backbone weights as trainable or frozen. | True | | | | FALSE
text_encoder_type | string | The BERT text encoder type. If only a model name is provided, the weights are pulled from the Hugging Face hub; a local path can also be set. | bert-base-uncased | | | | FALSE
max_text_len | int | Maximum text length of BERT. | 256 | 1 | | | FALSE
class_embed_bias | bool | Flag to set a bias in the contrastive embedding. | False | | | | FALSE
log_scale | string | [Optional] The initial value of a learnable parameter to multiply with the similarity matrix in the contrastive embedding. | none | | | | FALSE
loss_types | list | Losses to be used during training. | ['labels', 'boxes'] | | | | FALSE
backbone_names | list | Prefixes of the tensor names corresponding to the backbone. | ['backbone.0', 'bert'] | | | | FALSE
linear_proj_names | list | Linear projection layer names. | ['reference_points', 'sampling_offsets'] | | | | FALSE
train
The train parameter defines the hyperparameters of the training process.
train:
  optim:
    lr: 0.0002
    lr_backbone: 0.00002
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_steps: [10, 20]
    lr_decay: 0.1
  num_epochs: 30
  checkpoint_interval: 1
  precision: bf16
  distributed_strategy: ddp
  activation_checkpoint: True
  num_gpus: 8
  num_nodes: 1
  freeze: ["backbone.0", "bert"]
  pretrained_model_path: /path/to/pretrained/model
Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
num_gpus | int | The number of GPUs to run the train job. | 1 | 1 | | | FALSE
gpu_ids | list | List of GPU IDs to run the training on. The length of this list must be equal to train.num_gpus. | [0] | | | | FALSE
num_nodes | int | Number of nodes to run the training on. If > 1, multi-node training is enabled. | 1 | | | | FALSE
seed | int | The seed for the initializer in PyTorch. If < 0, the fixed seed is disabled. | 1234 | -1 | inf | | FALSE
cudnn | collection | | | | | | FALSE
num_epochs | int | Number of epochs to run the training. | 10 | 1 | inf | | TRUE
checkpoint_interval | int | The interval (in epochs) at which a checkpoint is saved. Helps resume training. | 1 | 1 | | | FALSE
validation_interval | int | The interval (in epochs) at which an evaluation is triggered on the validation dataset. | 1 | 1 | | | FALSE
resume_training_checkpoint_path | string | Path to the checkpoint to resume training from. | | | | | FALSE
results_dir | string | Path to where all the assets generated from a task are stored. | | | | | FALSE
freeze | list | List of layer names to freeze. | [] | | | | FALSE
pretrained_model_path | string | Path to a pretrained Grounding DINO model to initialize the current training from. | | | | | FALSE
clip_grad_norm | float | Amount to clip the gradient by the L2 norm. | 0.1 | | | | FALSE
is_dry_run | bool | Whether to run the trainer in Dry Run mode. This serves as a debug tool to verify the configuration and pipeline without launching a full training run. | False | | | | FALSE
optim | collection | Hyperparameters to configure the optimizer. | | | | | FALSE
precision | string | Precision to run the training on. | fp32 | | | fp16,fp32,bf16 | FALSE
distributed_strategy | string | The multi-GPU training strategy. | ddp | | | ddp,fsdp | FALSE
activation_checkpoint | bool | A True value instructs train to recompute activations in the backward pass to save GPU memory, rather than storing them. | True | | | | FALSE
verbose | bool | Flag to enable printing of detailed learning rate scaling from the optimizer. | False | | | | FALSE
optim
The optim parameter defines the config for the optimizer in training, including the learning rate, learning-rate scheduler, and weight decay.
optim:
  lr: 0.0002
  lr_backbone: 0.00002
  momentum: 0.9
  weight_decay: 0.0001
  lr_scheduler: MultiStep
  lr_steps: [10, 20]
  lr_decay: 0.1
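With the values above, the learning rate (excluding the backbone) is 0.0002 for epochs 0-9, 0.00002 for epochs 10-19, and 0.000002 from epoch 20 on. The following standalone PyTorch sketch (not part of TAO) reproduces that decay schedule:

import torch

# A dummy parameter; only the scheduler behavior is of interest here.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=2e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 20], gamma=0.1)

for epoch in range(30):
    # Prints 2e-4 for epochs 0-9, 2e-5 for 10-19, and 2e-6 for 20-29.
    print(epoch, scheduler.get_last_lr())
    optimizer.step()
    scheduler.step()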
Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
optimizer | string | Type of optimizer used to train the network. | AdamW | | | AdamW,SGD | FALSE
monitor_name | string | The metric value to be monitored for the AutoReduce scheduler. | val_loss | | | val_loss,train_loss | FALSE
lr | float | The initial learning rate for training the model, excluding the backbone. | 0.0002 | | | | TRUE
lr_backbone | float | The initial learning rate for training the backbone. | 2e-05 | | | | TRUE
lr_linear_proj_mult | float | The initial learning rate for training the linear projection layer. | 0.1 | | | | TRUE
momentum | float | The momentum for the AdamW optimizer. | 0.9 | | | | TRUE
weight_decay | float | The weight decay coefficient. | 0.0001 | | | | TRUE
lr_scheduler | string | The learning-rate scheduler: MultiStep decreases the lr by lr_decay at each step in lr_steps, while StepLR decreases it every lr_step_size epochs. | MultiStep | | | MultiStep,StepLR | FALSE
lr_steps | list | The steps at which the learning rate must be decreased. | [10] | | | | FALSE
lr_step_size | int | The number of steps to decrease the learning rate in the StepLR. | 10 | | | | TRUE
lr_decay | float | The decreasing factor for the learning rate scheduler. | 0.1 | | | | TRUE
dataset
The dataset parameter defines the dataset source, training batch size, and augmentation.
dataset:
  train_data_sources:
    - image_dir: /path/to/coco/train2017/
      json_file: /path/to/coco/annotations/instances_train2017.jsonl # odvg format
      label_map: /path/to/coco/annotations/instances_train2017_labelmap.json
    - image_dir: /path/to/coco/train2017/
      json_file: /path/to/coco/annotations/refcoco.jsonl # grounding dataset, which doesn't require a label_map
  val_data_sources:
    image_dir: /path/to/coco/val2017/
    json_file: /path/to/coco/annotations/instances_val2017_contiguous.json # category ids need to be contiguous
  test_data_sources:
    image_dir: /path/to/coco/images/val2017/
    json_file: /path/to/coco/annotations/instances_val2017.json
  infer_data_sources:
    - image_dir: /path/to/coco/images/val2017/
      captions: ["blackcat", "car"]
  max_labels: 80
  batch_size: 4
  workers: 8
Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
train_data_sources | list | The list of data sources for training. Detection data requires a label_map; grounding data needs only image_dir and json_file. | [{'image_dir': '', 'json_file': '', 'label_map': ''}, {'image_dir': '', 'json_file': ''}] | | | | FALSE
val_data_sources | collection | The data source for validation. | {'image_dir': '', 'json_file': ''} | | | | FALSE
test_data_sources | collection | The data source for testing. | {'image_dir': '', 'json_file': ''} | | | | FALSE
infer_data_sources | collection | The data source for inference: a directory of images and the captions to ground. | {'image_dir': [''], 'captions': ['']} | | | | FALSE
batch_size | int | The batch size for training and validation | 4 | 1 | inf | | TRUE
workers | int | The number of parallel workers processing data | 8 | 1 | inf | | TRUE
pin_memory | bool | Flag to enable the dataloader to allocate page-locked memory for faster transfer of data to the GPU. | True | | | | FALSE
dataset_type | string | If set to default, the standard map-style dataset structure is used and annotations are loaded in every subprocess; if set to serialized, the annotation data is serialized and shared across subprocesses to reduce system memory usage. | serialized | | | serialized,default | FALSE
max_labels | int | The total number of labels to sample from. After sampling positive labels, negative labels are sampled to fill up to this total, and the result is passed to the text encoder. | 50 | 1 | inf | | FALSE
eval_class_ids | list | IDs of the classes for evaluation. | [1] | | | | FALSE
augmentation | collection | Configuration parameters for data augmentation. | | | | | FALSE
augmentation
The augmentation parameter contains hyperparameters for augmentation.
augmentation:
  scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
  input_mean: [0.485, 0.456, 0.406]
  input_std: [0.229, 0.224, 0.225]
  horizontal_flip_prob: 0.5
  train_random_resize: [400, 500, 600]
  train_random_crop_min: 384
  train_random_crop_max: 600
  random_resize_max_size: 1333
  test_random_resize: 800
Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
scales | list | A list of sizes to perform random resize on. | [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800] | | | | FALSE
input_mean | list | The input mean for RGB frames | [0.485, 0.456, 0.406] | | | | FALSE
input_std | list | The input standard deviation per pixel for RGB frames | [0.229, 0.224, 0.225] | | | | FALSE
train_random_resize | list | A list of sizes to perform random resize for training data | [400, 500, 600] | | | | FALSE
horizontal_flip_prob | float | The probability of a horizontal flip during training | 0.5 | 0.0 | 1.0 | | TRUE
train_random_crop_min | int | The minimum random crop size for training data | 384 | 1 | inf | | TRUE
train_random_crop_max | int | The maximum random crop size for training data | 600 | 1 | inf | | TRUE
random_resize_max_size | int | The maximum random resize size for training data | 1333 | 1 | inf | | TRUE
test_random_resize | int | The random resize size for test data | 800 | 1 | inf | | TRUE
fixed_padding | bool | A flag specifying whether to pad images to a fixed resolution before batching, which helps stabilize CPU memory usage during training. | True | | | | FALSE
fixed_random_crop | int | The crop size for Large Scale Jittering, which is used for ViT backbones. | 1024 | 1 | inf | | FALSE
To train a Grounding DINO model, use this command:
tao model grounding_dino train [-h] -e <experiment_spec>
Required Arguments
-e, --experiment_spec: The experiment specification file to set up the training experiment.
Optional Arguments
-h, --help: Show this help message and exit.
Sample Usage
The following is an example of the train command:
tao model grounding_dino train -e /path/to/spec.yaml
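As with the other tasks, values from the experiment spec can be overridden directly on the command line. For example, the following (with illustrative override values) trains for 50 epochs on 2 GPUs:

tao model grounding_dino train -e /path/to/spec.yaml train.num_epochs=50 train.num_gpus=2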
Optimizing Resources for Training Grounding DINO
Training Grounding DINO on a standard dataset like COCO requires powerful GPUs (for example, V100 or A100) with at least 15 GB of VRAM and a large amount of CPU memory. This section outlines some of the strategies you can use to launch training with limited resources.
Optimize GPU Memory
There are various ways to optimize GPU memory usage. One option is to reduce dataset.batch_size, but this can cause your training to take longer than usual. Instead, we recommend the following settings to optimize GPU consumption; a combined spec fragment is shown after this list.

- Set train.precision to bf16 to enable automatic mixed precision training. This can reduce your GPU memory usage by 50%.
- Set train.activation_checkpoint to True to enable activation checkpointing. By recomputing the activations instead of caching them in memory, memory usage improves.
- Set train.distributed_strategy to fsdp to enable Fully Sharded Data Parallel training. This shards gradient calculations across different processes to help reduce GPU memory.
- Try using a more lightweight backbone like swin_tiny_224_1k, or freeze the backbone by setting model.train_backbone to False.
- Try changing the augmentation resolution in dataset.augmentation depending on your dataset.
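For reference, a spec fragment that applies these recommendations together (path values omitted):

train:
  precision: bf16
  activation_checkpoint: True
  distributed_strategy: fsdp
model:
  backbone: swin_tiny_224_1k
  train_backbone: False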
Optimize CPU Memory
To speed up data loading, it is common to set a high number of workers to spawn multiple processes. However, if your annotation file is very large, this can exhaust CPU memory. We recommend the following settings to optimize CPU consumption; a combined spec fragment is shown after this list.

- Set dataset.dataset_type to serialized so that the COCO-based annotation data can be shared across different subprocesses.
- Set dataset.augmentation.fixed_padding to True so that images are padded before batch formulation. Because of the random resize and random crop augmentations during training, the image resolution after transformation varies from image to image. Such variable resolutions can cause a memory leak, with CPU memory slowly accumulating until training goes out of memory. This is a limitation of PyTorch, so we advise setting fixed_padding to True to help stabilize CPU memory usage.
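For reference, a spec fragment that applies these recommendations together:

dataset:
  dataset_type: serialized
  augmentation:
    fixed_padding: True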
evaluate
The evaluate parameter defines the hyperparameters of the evaluation process.
evaluate:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.0
  num_gpus: 1
Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
num_gpus | int | The number of GPUs to run the evaluation job. | 1 | | | | FALSE
gpu_ids | list | List of GPU IDs to run the evaluation on. | [0] | | | | FALSE
num_nodes | int | Number of nodes to run the evaluation on. | 1 | | | | FALSE
checkpoint | string | Path to the .pth model to be evaluated. | ??? | | | | FALSE
results_dir | string | Path to where all the assets generated from a task are stored. | | | | | FALSE
input_width | int | Width of the input image tensor. | | 1 | | | FALSE
input_height | int | Height of the input image tensor. | | 1 | | | FALSE
trt_engine | string | Path to the TensorRT engine to be used for evaluation. | | | | | FALSE
conf_threshold | float | The value of the confidence threshold to be used when filtering out predictions. | 0.0 | | | | FALSE
To run evaluation with a Grounding DINO model, use this command:
tao model grounding_dino evaluate [-h] -e <experiment_spec> \
evaluate.checkpoint=<model to be evaluated>
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the evaluation experiment.
Optional Arguments
evaluate.checkpoint: The .pth model to be evaluated.
Sample Usage
The following is an example of using the evaluate command:
tao model grounding_dino evaluate -e /path/to/spec.yaml evaluate.checkpoint=/path/to/model.pth
inference
The inference parameter defines the hyperparameters of the inference process.
inference:
  checkpoint: /path/to/model.pth
  conf_threshold: 0.5
  num_gpus: 1
  color_map:
    "blackcat": red
    car: blue
dataset:
  infer_data_sources:
    image_dir: /data/raw-data/val2017/
    captions: ["blackcat", "car"]
Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
num_gpus | int | The number of GPUs to run the inference job. | 1 | | | | FALSE
gpu_ids | list | List of GPU IDs to run the inference on. | [0] | | | | FALSE
num_nodes | int | Number of nodes to run the inference on. | 1 | | | | FALSE
checkpoint | string | Path to the .pth model to run inference with. | ??? | | | | FALSE
results_dir | string | Path to where all the assets generated from a task are stored. | | | | | FALSE
trt_engine | string | Path to the TensorRT engine to be used for inference. | | | | | FALSE
color_map | collection | Class-wise dictionary with colors to render boxes. | | | | | FALSE
conf_threshold | float | The value of the confidence threshold to be used when filtering out predictions. | 0.5 | | | | FALSE
is_internal | bool | Flag to render with an internal directory structure. | False | | | | FALSE
input_width | int | Width of the input image tensor. | 960 | 32 | | | FALSE
input_height | int | Height of the input image tensor. | 544 | 32 | | | FALSE
outline_width | int | Width, in pixels, of the bounding box outline. | 3 | 1 | | | FALSE
The inference tool for Grounding DINO models can be used to visualize bounding boxes and generate frame-by-frame KITTI-format labels on a directory of images.
tao model grounding_dino inference [-h] -e <experiment spec file> \
                                   inference.checkpoint=<model to run inference on>
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the inference experiment.
Optional Arguments
inference.checkpoint: The .pth model to run inference on.
Sample Usage
The following is an example of using the inference command:
tao model grounding_dino inference -e /path/to/spec.yaml inference.checkpoint=/path/to/model.pth
export
The export parameter defines the hyperparameters of the export process.
export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  on_cpu: False
  opset_version: 17
  input_channel: 3
  input_width: 960
  input_height: 544
  batch_size: -1
Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
results_dir | string | Path to where all the assets generated from a task are stored. | | | | | FALSE
gpu_id | int | The index of the GPU to build the TensorRT engine. | 0 | | | | FALSE
checkpoint | string | Path to the checkpoint file to run export. | ??? | | | | FALSE
onnx_file | string | Path to the ONNX model file. | ??? | | | | FALSE
on_cpu | bool | Flag to export a CPU-compatible model. | False | | | | FALSE
input_channel | int | Number of channels in the input tensor. | 3 | 3 | | | FALSE
input_width | int | Width of the input image tensor. | 960 | 32 | | | FALSE
input_height | int | Height of the input image tensor. | 544 | 32 | | | FALSE
opset_version | int | The opset version of the exported ONNX model. | 17 | 1 | | | FALSE
batch_size | int | The batch size of the exported ONNX model. If set to -1, the batch dimension is dynamic. | -1 | -1 | | | FALSE
verbose | bool | Flag to enable verbose TensorRT logging. | False | | | | FALSE
tao model grounding_dino export [-h] -e <experiment spec file> \
                                export.checkpoint=<model to export> \
                                export.onnx_file=<onnx path>
Required Arguments
-e, --experiment_spec: The path to an experiment spec file.
Optional Arguments
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .onnx model is saved.
Sample Usage
The following is an example of using the export command:
tao model grounding_dino export -e /path/to/spec.yaml export.checkpoint=/path/to/model.pth export.onnx_file=/path/to/model.onnx
For deployment, refer to the TAO Deploy documentation for Grounding DINO.