Date: 2026-03-03

Grounding DINO#

Grounding DINO is an open-vocabulary object-detection model included in TAO. Through joint training on text and image data, Grounding DINO can accept a wide range of text input and output the corresponding bounding boxes.

It supports the following tasks:

train

evaluate

inference

export

Each task is explained in detail in the following sections.

The spec format is YAML for TAO Launcher, and JSON for FTMS Client.

File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher, not for FTMS Client.

Data Input for Grounding DINO#

Grounding DINO expects a directory of images for each data source. Training annotations must be in ODVG format (JSONL files), and validation annotations must be JSON files in COCO format.

Note

Unlike other object detection networks in TAO, the category_id values in your COCO JSON file for Grounding DINO must start from 0 and be contiguous, so category IDs range from 0 to num_classes - 1. Because the original COCO annotation does not have contiguous category IDs, see the TAO Data Service tao dataset annotations convert command to convert them.
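The contiguity requirement in the note above can be illustrated with a minimal Python sketch. It is not the TAO Data Service converter: the function names make_contiguous and convert_file are hypothetical, and the sketch assumes a standard COCO layout with top-level categories and annotations lists.

```python
import json


def make_contiguous(coco: dict) -> tuple:
    """Return (remapped_coco, id_map) with category IDs renumbered to 0..N-1.

    The input dict is not modified. IDs are renumbered in ascending order of
    the original IDs, so the relative order of categories is preserved.
    """
    old_ids = sorted(cat["id"] for cat in coco["categories"])
    id_map = {old: new for new, old in enumerate(old_ids)}

    remapped = dict(coco)
    remapped["categories"] = [
        {**cat, "id": id_map[cat["id"]]} for cat in coco["categories"]
    ]
    remapped["annotations"] = [
        {**ann, "category_id": id_map[ann["category_id"]]}
        for ann in coco["annotations"]
    ]
    return remapped, id_map


def convert_file(src_path: str, dst_path: str) -> dict:
    """Read a COCO JSON file, remap its category IDs, and write the result."""
    with open(src_path, encoding="utf-8") as f:
        coco = json.load(f)
    remapped, id_map = make_contiguous(coco)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(remapped, f)
    return id_map
```

For real datasets, the documented tao dataset annotations convert command remains the supported route; the sketch only shows what "contiguous, starting from 0" means for the IDs.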

Creating an Experiment Spec File#

The training experiment spec file for Grounding DINO includes model, train, and dataset parameters. The following is an example spec file for finetuning a Grounding DINO model with a swin_tiny_224_1k backbone on a COCO dataset:

TAO Client (v2 API)

Use the following command to get an experiment spec file for Grounding DINO:

    BASE_EXPERIMENT_ID=$(tao grounding_dino list-base-experiments | jq -r '.[0].id')
    SPECS=$(tao grounding_dino get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

TAO Launcher

    dataset:
      train_data_sources:
        - image_dir: /path/to/coco/train2017/
          json_file: /path/to/coco/annotations/instances_train2017.jsonl  # odvg format
          label_map: /path/to/coco/annotations/instances_train2017_labelmap.json
      val_data_sources:
        - image_dir: /path/to/coco/val2017/
          json_file: /path/to/coco/annotations/instances_val2017_contiguous.json  # category ids need to be contiguous
      max_labels: 80  # Max number of positive + negative labels passed to the text encoder
      batch_size: 4
      workers: 8
      dataset_type: serialized  # To reduce the system memory usage
      augmentation:
        scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
        input_mean: [0.485, 0.456, 0.406]
        input_std: [0.229, 0.224, 0.225]
        horizontal_flip_prob: 0.5
        train_random_resize: [400, 500, 600]
        train_random_crop_min: 384
        train_random_crop_max: 600
        random_resize_max_size: 1333
        test_random_resize: 800
    model:
      backbone: swin_tiny_224_1k
      train_backbone: True
      num_feature_levels: 4
      dec_layers: 6
      enc_layers: 6
      num_queries: 900
      dropout_ratio: 0.0
      dim_feedforward: 2048
      log_scale: auto
      class_embed_bias: True  # Adding bias in the contrastive embedding layer for training stability
    train:
      optim:
        lr_backbone: 2e-5
        lr: 2e-4
        lr_steps: [10, 20]
      num_epochs: 30
      freeze: ["backbone.0", "bert"]  # if only finetuning
      pretrained_model_path: /path/to/your-gdino-pretrained-model  # if only finetuning
      precision: bf16  # for efficient training

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| encryption_key | string | | | | | | FALSE |
| results_dir | string | | /results | | | | FALSE |
| wandb | collection | | | | | | FALSE |
| model | collection | Configurable parameters to construct the model for a Grounding DINO experiment. | | | | | FALSE |
| dataset | collection | Configurable parameters to construct the dataset for a Grounding DINO experiment. | | | | | FALSE |
| train | collection | Configurable parameters to construct the trainer for a Grounding DINO experiment. | | | | | FALSE |
| evaluate | collection | Configurable parameters to construct the evaluator for a Grounding DINO experiment. | | | | | FALSE |
| inference | collection | Configurable parameters to construct the inferencer for a Grounding DINO experiment. | | | | | FALSE |
| export | collection | Configurable parameters to construct the exporter for a Grounding DINO experiment. | | | | | FALSE |
| gen_trt_engine | collection | Configurable parameters to construct the TensorRT engine builder for a Grounding DINO experiment. | | | | | FALSE |

model#

The model parameter provides options to change the Grounding DINO architecture.

    model:
      pretrained_model_path: /path/to/your-gdino-pretrained-model
      backbone: swin_tiny_224_1k
      train_backbone: True
      num_feature_levels: 4
      dec_layers: 6
      enc_layers: 6
      num_queries: 900
      dropout_ratio: 0.0
      dim_feedforward: 2048
      log_scale: auto
      class_embed_bias: True

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| pretrained_backbone_path | string | [Optional] Path to a pretrained backbone file. | | | | | FALSE |
| backbone | string | The backbone name of the model. The TAO implementation of Grounding DINO supports Swin. | swin_tiny_224_1k | | | swin_tiny_224_1k,swin_base_224_22k,swin_base_384_22k,swin_large_224_22k,swin_large_384_22k | FALSE |
| num_queries | int | The number of queries. | 900 | 1 | inf | | TRUE |
| num_feature_levels | int | The number of feature levels to use in the model. | 4 | 1 | 5 | | FALSE |
| set_cost_class | float | The relative weight of the classification error in the matching cost. | 1.0 | 0.0 | inf | | FALSE |
| set_cost_bbox | float | The relative weight of the L1 error of the bounding box coordinates in the matching cost. | 5.0 | 0.0 | inf | | FALSE |
| set_cost_giou | float | The relative weight of the GIoU loss of the bounding box in the matching cost. | 2.0 | 0.0 | inf | | FALSE |
| cls_loss_coef | float | The relative weight of the classification error in the final loss. | 2.0 | 0.0 | inf | | FALSE |
| bbox_loss_coef | float | The relative weight of the L1 error of the bounding box coordinates in the final loss. | 5.0 | 0.0 | inf | | FALSE |
| giou_loss_coef | float | The relative weight of the GIoU loss of the bounding box in the final loss. | 2.0 | 0.0 | inf | | FALSE |
| num_select | int | The number of top-K predictions selected during post-processing. | 300 | 1 | | | TRUE |
| interm_loss_coef | float | | 1.0 | | | | FALSE |
| no_interm_box_loss | bool | No intermediate bbox loss. | False | | | | FALSE |
| pre_norm | bool | Flag to add layer norm in the encoder or not. | False | | | | FALSE |
| two_stage_type | string | Type of two stage in DINO. | standard | | | standard,no | FALSE |
| decoder_sa_type | string | Type of decoder self attention. | sa | | | sa,ca_label,ca_content | FALSE |
| embed_init_tgt | bool | Flag to add target embedding. | True | | | | FALSE |
| fix_refpoints_hw | int | If this value is -1, width and height are learned separately for each box. If this value is -2, a shared width and height are learned. A value greater than 0 specifies learning with a fixed number. | -1 | -2 | inf | | FALSE |
| pe_temperatureH | int | The temperature applied to the height dimension of the positional sine embedding. | 20 | 1 | inf | | FALSE |
| pe_temperatureW | int | The temperature applied to the width dimension of the positional sine embedding. | 20 | 1 | inf | | FALSE |
| return_interm_indices | list | The index of feature levels to use in the model. The length must match num_feature_levels. | [1, 2, 3, 4] | | | | FALSE |
| use_dn | bool | A flag specifying whether to enable contrastive de-noising training in DINO. | True | | | | FALSE |
| dn_number | int | The number of denoising queries in DINO. | 0 | 0 | inf | | FALSE |
| dn_box_noise_scale | float | The scale of noise applied to boxes during contrastive de-noising. If this value is 0, noise is not applied. | 1.0 | 0.0 | inf | | FALSE |
| dn_label_noise_ratio | float | The scale of the noise applied to labels during contrastive de-noising. If this value is 0, noise is not applied. | 0.5 | 0.0 | | | FALSE |
| focal_alpha | float | The alpha value in the focal loss. | 0.25 | | | | FALSE |
| focal_gamma | float | The gamma value in the focal loss. | 2.0 | | | | FALSE |
| clip_max_norm | float | | 0.1 | | | | FALSE |
| nheads | int | Number of heads. | 8 | | | | FALSE |
| dropout_ratio | float | The probability to drop hidden units. | 0.0 | 0.0 | 1.0 | | FALSE |
| hidden_dim | int | Dimension of the hidden units. | 256 | | | | FALSE |
| enc_layers | int | Number of encoder layers in the transformer. | 6 | 1 | | | TRUE |
| dec_layers | int | Number of decoder layers in the transformer. | 6 | 1 | | | TRUE |
| dim_feedforward | int | Dimension of the feedforward network. | 2048 | 1 | | | FALSE |
| dec_n_points | int | Number of reference points in the decoder. | 4 | 1 | | | FALSE |
| enc_n_points | int | Number of reference points in the encoder. | 4 | 1 | | | FALSE |
| aux_loss | bool | A flag specifying whether to use auxiliary decoding losses (loss at each decoder layer). | True | | | | FALSE |
| dilation | bool | A flag specifying whether to enable dilation in the backbone. | False | | | | FALSE |
| train_backbone | bool | Flag to set backbone weights as trainable or frozen. When set to False, the backbone weights are frozen. | True | | | | FALSE |
| text_encoder_type | string | BERT encoder type. If only the name of the type is provided, the weights are downloaded from the Hugging Face Hub. If a path is provided, the weights are loaded from the local path. | bert-base-uncased | | | | FALSE |
| max_text_len | int | Maximum text length of BERT. | 256 | 1 | | | FALSE |
| class_embed_bias | bool | Flag to set bias in the contrastive embedding. | False | | | | FALSE |
| log_scale | string | [Optional] The initial value of a learnable parameter to multiply with the similarity matrix to normalize the output. Defaults to None. If set to 'auto', the similarity matrix is normalized by a fixed value sqrt(d_c), where d_c is the channel number. If set to 'none' or None, no normalization is applied. | none | | | | FALSE |
| loss_types | list | Losses to be used during training. | ['labels', 'boxes'] | | | | FALSE |
| backbone_names | list | Prefix of the tensor names corresponding to the backbone. | ['backbone.0', 'bert'] | | | | FALSE |
| linear_proj_names | list | Linear projection layer names. | ['reference_points', 'sampling_offsets'] | | | | FALSE |

train#

The train parameter defines the hyperparameters of the training process.

    train:
      optim:
        lr: 0.0002
        lr_backbone: 0.00002
        momentum: 0.9
        weight_decay: 0.0001
        lr_scheduler: MultiStep
        lr_steps: [10, 20]
        lr_decay: 0.1
      num_epochs: 30
      checkpoint_interval: 1
      precision: bf16
      distributed_strategy: ddp
      activation_checkpoint: True
      num_gpus: 8
      num_nodes: 1
      freeze: ["backbone.0", "bert"]
      pretrained_model_path: /path/to/pretrained/model

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| num_gpus | int | The number of GPUs to run the train job. | 1 | 1 | | | FALSE |
| gpu_ids | list | List of GPU IDs to run the training on. The length of this list must be equal to the number of GPUs in train.num_gpus. | [0] | | | | FALSE |
| num_nodes | int | Number of nodes to run the training on. If > 1, multi-node is enabled. | 1 | | | | FALSE |
| seed | int | The seed for the initializer in PyTorch. If < 0, the fixed seed is disabled. | 1234 | -1 | inf | | FALSE |
| cudnn | collection | | | | | | FALSE |
| num_epochs | int | Number of epochs to run the training. | 10 | 1 | inf | | TRUE |
| checkpoint_interval | int | The interval (in epochs) at which a checkpoint is saved. Helps resume training. | 1 | 1 | | | FALSE |
| validation_interval | int | The interval (in epochs) at which an evaluation is triggered on the validation dataset. | 1 | 1 | | | FALSE |
| resume_training_checkpoint_path | string | Path to the checkpoint to resume training from. | | | | | FALSE |
| results_dir | string | Path to where all the assets generated from a task are stored. | | | | | FALSE |
| freeze | list | List of layer names to freeze. Example: ["backbone", "transformer.encoder", "input_proj"]. | [] | | | | FALSE |
| pretrained_model_path | string | Path to a pretrained Grounding DINO model to initialize the current training from. | | | | | FALSE |
| clip_grad_norm | float | Amount to clip the gradient by L2 norm. A value of 0.0 specifies no clipping. | 0.1 | | | | FALSE |
| is_dry_run | bool | Whether to run the trainer in dry-run mode. This is a good way to validate the spec file and run a sanity check on the trainer without actually initializing and running it. | False | | | | FALSE |
| optim | collection | Hyperparameters to configure the optimizer. | | | | | FALSE |
| precision | string | Precision to run the training on. | fp32 | | | fp16,fp32,bf16 | FALSE |
| distributed_strategy | string | The multi-GPU training strategy. DDP (Distributed Data Parallel) and FSDP (Fully Sharded Data Parallel) are supported. | ddp | | | ddp,fsdp | FALSE |
| activation_checkpoint | bool | A True value instructs train to recompute activations in the backward pass instead of storing them, which saves GPU memory. | True | | | | FALSE |
| verbose | bool | Flag to enable printing of detailed learning rate scaling from the optimizer. | False | | | | FALSE |

optim#

The optim parameter defines the config for the optimizer in training, including the learning rate, learning rate scheduler, and weight decay.

    optim:
      lr: 0.0002
      lr_backbone: 0.00002
      momentum: 0.9
      weight_decay: 0.0001
      lr_scheduler: MultiStep
      lr_steps: [10, 20]
      lr_decay: 0.1

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| optimizer | string | Type of optimizer used to train the network. | AdamW | | | AdamW,SGD | FALSE |
| monitor_name | string | The metric value to be monitored for the AutoReduce Scheduler. | val_loss | | | val_loss,train_loss | FALSE |
| lr | float | The initial learning rate for training the model, excluding the backbone. | 0.0002 | | | | TRUE |
| lr_backbone | float | The initial learning rate for training the backbone. | 2e-05 | | | | TRUE |
| lr_linear_proj_mult | float | The initial learning rate for training the linear projection layer. | 0.1 | | | | TRUE |
| momentum | float | The momentum for the AdamW optimizer. | 0.9 | | | | TRUE |
| weight_decay | float | The weight decay coefficient. | 0.0001 | | | | TRUE |
| lr_scheduler | string | The learning rate scheduler. MultiStep: decrease the lr by lr_decay at each step listed in lr_steps. StepLR: decrease the lr by lr_decay at every lr_step_size. | MultiStep | | | MultiStep,StepLR | FALSE |
| lr_steps | list | The steps at which the learning rate must be decreased. This is applicable only with the MultiStep LR. | [10] | | | | FALSE |
| lr_step_size | int | The number of steps to decrease the learning rate in the StepLR. | 10 | | | | TRUE |
| lr_decay | float | The decreasing factor for the learning rate scheduler. | 0.1 | | | | TRUE |

dataset#

The dataset parameter defines the dataset source, training batch size, and augmentation.

    dataset:
      train_data_sources:
        - image_dir: /path/to/coco/train2017/
          json_file: /path/to/coco/annotations/instances_train2017.jsonl  # odvg format
          label_map: /path/to/coco/annotations/instances_train2017_labelmap.json
        - image_dir: /path/to/coco/train2017/
          json_file: /path/to/coco/annotations/refcoco.jsonl  # grounding dataset which doesn't require label_map
      val_data_sources:
        image_dir: /path/to/coco/val2017/
        json_file: /path/to/coco/annotations/instances_val2017_contiguous.json  # category ids need to be contiguous
      test_data_sources:
        image_dir: /path/to/coco/images/val2017/
        json_file: /path/to/coco/annotations/instances_val2017.json
      infer_data_sources:
        - image_dir: /path/to/coco/images/val2017/
          captions: ["black cat", "car"]
      max_labels: 80
      batch_size: 4
      workers: 8

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| train_data_sources | list | The list of data sources for training. image_dir: the directory that contains the training images. json_file: the path of the JSONL file, which uses the training-annotation ODVG format. label_map: (Optional) the path of the label mapping, required only for a detection dataset. | [{'image_dir': '', 'json_file': '', 'label_map': ''}, {'image_dir': '', 'json_file': ''}] | | | | FALSE |
| val_data_sources | collection | The data source for validation. image_dir: the directory that contains the validation images. json_file: the path of the JSON file, which uses the validation-annotation COCO format. Note: category IDs must start from 0 to calculate validation loss. Run the Data Services annotation conversion to make the categories contiguous. | {'image_dir': '', 'json_file': ''} | | | | FALSE |
| test_data_sources | collection | The data source for testing. image_dir: the directory that contains the test images. json_file: the path of the JSON file, which uses the test-annotation COCO format. | {'image_dir': '', 'json_file': ''} | | | | FALSE |
| infer_data_sources | collection | The data source for inference. image_dir: the list of directories that contain the inference images. captions: the list of captions to run inference with. | {'image_dir': [''], 'captions': ['']} | | | | FALSE |
| batch_size | int | The batch size for training and validation. | 4 | 1 | inf | | TRUE |
| workers | int | The number of parallel workers processing data. | 8 | 1 | inf | | TRUE |
| pin_memory | bool | Flag to enable the dataloader to allocate page-locked memory for faster transfer of data between the CPU and GPU. | True | | | | FALSE |
| dataset_type | string | If set to default, the standard map-style dataset structure from torch is followed, which loads the ODVG annotation in every subprocess. This leads to redundant copies of the data and can exhaust RAM if workers is high. If set to serialized, the data is serialized through pickle and torch.Tensor, which allows the data to be shared across subprocesses. As a result, RAM usage can be greatly reduced. | serialized | | | serialized,default | FALSE |
| max_labels | int | The total number of labels to sample from. After sampling positive labels, random negative samples are sampled so that the total number of labels is equal to max_labels. For a detection dataset, negative labels are categories not present in the image. For a grounding dataset, negative labels are phrases in the original caption not present in the image. Setting a higher max_labels may improve robustness of the model at the cost of longer training time. | 50 | 1 | inf | | FALSE |
| eval_class_ids | list | IDs of the classes for evaluation. | [1] | | | | FALSE |
| augmentation | collection | Configuration parameters for data augmentation. | | | | | FALSE |

augmentation#

The augmentation parameter contains hyperparameters for augmentation.

    augmentation:
      scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
      input_mean: [0.485, 0.456, 0.406]
      input_std: [0.229, 0.224, 0.225]
      horizontal_flip_prob: 0.5
      train_random_resize: [400, 500, 600]
      train_random_crop_min: 384
      train_random_crop_max: 600
      random_resize_max_size: 1333
      test_random_resize: 800

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| scales | list | A list of sizes to perform random resize on. | [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800] | | | | FALSE |
| input_mean | list | The input mean for RGB frames. | [0.485, 0.456, 0.406] | | | | FALSE |
| input_std | list | The input standard deviation per pixel for RGB frames. | [0.229, 0.224, 0.225] | | | | FALSE |
| train_random_resize | list | A list of sizes to perform random resize for training data. | [400, 500, 600] | | | | FALSE |
| horizontal_flip_prob | float | The probability for horizontal flip during training. | 0.5 | 0.0 | 1.0 | | TRUE |
| train_random_crop_min | int | The minimum random crop size for training data. | 384 | 1 | inf | | TRUE |
| train_random_crop_max | int | The maximum random crop size for training data. | 600 | 1 | inf | | TRUE |
| random_resize_max_size | int | The maximum random resize size for training data. | 1333 | 1 | inf | | TRUE |
| test_random_resize | int | The random resize size for test data. | 800 | 1 | inf | | TRUE |
| fixed_padding | bool | A flag specifying whether to resize the image (with no padding) to (sorted(scales)[-1], random_resize_max_size) to prevent a CPU memory leak. | TRUE | | | | FALSE |
| fixed_random_crop | int | A flag to enable Large Scale Jittering, which is used for ViT backbones. The resulting image resolution is fixed to fixed_random_crop. | 1024 | 1 | inf | | FALSE |
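Several constraints stated in the tables of this section can be checked before a job is launched. The following is a minimal sketch, not part of TAO: check_spec is a hypothetical helper, and it assumes the spec has already been loaded into a nested Python dict. It covers only the constraints the tables state explicitly (return_interm_indices length, gpu_ids length, and the documented ranges for num_feature_levels, dropout_ratio, and horizontal_flip_prob).

```python
def check_spec(spec: dict) -> list:
    """Return a list of human-readable problems found in a spec dict."""
    problems = []
    model = spec.get("model", {})
    train = spec.get("train", {})
    augmentation = spec.get("dataset", {}).get("augmentation", {})

    # model.num_feature_levels is documented with valid_min 1 and valid_max 5.
    levels = model.get("num_feature_levels")
    if levels is not None and not 1 <= levels <= 5:
        problems.append("model.num_feature_levels must be between 1 and 5")

    # The length of return_interm_indices must match num_feature_levels.
    indices = model.get("return_interm_indices")
    if levels is not None and indices is not None and len(indices) != levels:
        problems.append(
            "len(model.return_interm_indices) must equal model.num_feature_levels"
        )

    # The length of gpu_ids must equal num_gpus.
    num_gpus = train.get("num_gpus")
    gpu_ids = train.get("gpu_ids")
    if num_gpus is not None and gpu_ids is not None and len(gpu_ids) != num_gpus:
        problems.append("len(train.gpu_ids) must equal train.num_gpus")

    # Probabilities are documented with the range 0.0 to 1.0.
    probabilities = (
        ("model.dropout_ratio", model.get("dropout_ratio")),
        (
            "dataset.augmentation.horizontal_flip_prob",
            augmentation.get("horizontal_flip_prob"),
        ),
    )
    for name, value in probabilities:
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be between 0.0 and 1.0")

    return problems
```

Fields that are absent from the dict are skipped rather than reported, on the assumption that TAO fills in the documented defaults.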



Training the Model#

To train a Grounding DINO model, use this command:

TAO Client (v2 API)

    TRAIN_JOB_ID=$(tao grounding_dino create-job \
      --kind experiment \
      --name "grounding_dino_train" \
      --action train \
      --workspace-id $WORKSPACE_ID \
      --specs "$TRAIN_SPECS" \
      --train-datasets '["'$DATASET_ID'"]' \
      --eval-dataset "$DATASET_ID" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model grounding_dino train [-h] -e <experiment_spec>

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec: The experiment specification file to set up the training experiment.

Optional Arguments

The following arguments are optional to run the command.

-h, --help: Show this help message and exit.

Sample Usage

The following is an example of the train command:

    tao model grounding_dino train -e /path/to/spec.yaml

Optimizing Resources for Training Grounding DINO#

Training Grounding DINO on a standard dataset like COCO requires strong GPUs (example: V100/A100) with at least 15GB of VRAM and a large amount of CPU memory. This section outlines some of the strategies you can use to launch training with limited resources.

Optimize GPU Memory#

There are various ways to optimize GPU memory usage. One option is to reduce dataset.batch_size. However, this can cause your training to take longer than usual. We recommend setting the following configurations to optimize GPU consumption:

Set train.precision to bf16 to enable automatic mixed precision training. This can reduce your GPU memory usage by 50%.

Set train.activation_checkpoint to True to enable activation checkpointing. Recomputing the activations instead of caching them in memory reduces memory usage.

Set train.distributed_strategy to fsdp to enable Fully Sharded Data Parallel training. This shards model parameters, gradients, and optimizer states across processes to help reduce GPU memory.

Try using more lightweight backbones like swin_tiny_224_1k, or freeze the backbone by setting model.train_backbone to False.

Try changing the augmentation resolution in dataset.augmentation depending on your dataset.

Optimize CPU Memory#

To speed up data loading, you typically set a high number of workers to spawn multiple processes. However, this can cause your CPU to run out of memory if your annotation file is very large. We recommend setting the following configurations to optimize CPU consumption:

Set dataset.dataset_type to serialized so that the annotation data can be shared across different subprocesses.

Set dataset.augmentation.fixed_padding to True so that images are padded before the batch formulation. Due to random resize and random crop augmentation during training, the resulting image resolution after the transform can vary across images. Such variable image resolutions can cause a memory leak, so CPU memory slowly accumulates until it runs out in the middle of training. This is a limitation of PyTorch, so we advise setting fixed_padding to True to help stabilize CPU memory usage.
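The GPU and CPU recommendations above can be collected into one set of dotted-key overrides and applied to a spec. The following is a minimal sketch under stated assumptions: apply_overrides and LOW_MEMORY_OVERRIDES are hypothetical names, not TAO APIs, and the spec is assumed to be a nested Python dict loaded from the YAML or JSON spec file.

```python
import copy

# Settings recommended in this section, written as dotted spec keys.
LOW_MEMORY_OVERRIDES = {
    "train.precision": "bf16",
    "train.activation_checkpoint": True,
    "train.distributed_strategy": "fsdp",
    "model.train_backbone": False,
    "dataset.dataset_type": "serialized",
    "dataset.augmentation.fixed_padding": True,
}


def apply_overrides(spec: dict, overrides: dict) -> dict:
    """Return a deep copy of spec with each dotted key set to its value."""
    result = copy.deepcopy(spec)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Create intermediate sections that the spec does not define yet.
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


spec = {"train": {"num_epochs": 30}, "dataset": {"batch_size": 4}}
low_memory_spec = apply_overrides(spec, LOW_MEMORY_OVERRIDES)
```

Freezing the backbone changes what the model learns, so model.train_backbone may be worth dropping from the overrides when memory allows.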

Evaluating the Model#

evaluate#

The evaluate parameter defines the hyperparameters of the evaluate process.

    evaluate:
      checkpoint: /path/to/model.pth
      conf_threshold: 0.0
      num_gpus: 1

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| num_gpus | int | | 1 | | | | FALSE |
| gpu_ids | list | | [0] | | | | FALSE |
| num_nodes | int | | 1 | | | | FALSE |
| checkpoint | string | | ??? | | | | FALSE |
| results_dir | string | | | | | | FALSE |
| input_width | int | Width of the input image tensor. | | 1 | | | FALSE |
| input_height | int | Height of the input image tensor. | | 1 | | | FALSE |
| trt_engine | string | Path to the TensorRT engine to be used for evaluation. This only works with tao-deploy. | | | | | FALSE |
| conf_threshold | float | The value of the confidence threshold to be used when filtering out the final list of boxes. | 0.0 | | | | FALSE |

To run evaluation with a Grounding DINO model, use this command:

TAO Client (v2 API)

    EVAL_JOB_ID=$(tao grounding_dino create-job \
      --kind experiment \
      --name "grounding_dino_evaluate" \
      --action evaluate \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $TRAIN_JOB_ID \
      --eval-dataset "$DATASET_ID" \
      --specs "$EVALUATE_SPECS" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model grounding_dino evaluate [-h] -e <experiment_spec> \
      evaluate.checkpoint=<model to be evaluated>

Required Arguments

The following arguments are required.

-e, --experiment_spec: The experiment spec file to set up the evaluation experiment.

Optional Arguments

The following arguments are optional to run the command.

evaluate.checkpoint: The .pth model to be evaluated.

Sample Usage

The following is an example of using the evaluate command:

    tao model grounding_dino evaluate -e /path/to/spec.yaml evaluate.checkpoint=/path/to/model.pth

Running Inference with a Grounding DINO Model#

inference#

The inference parameter defines the hyperparameters of the inference process.

    inference:
      checkpoint: /path/to/model.pth
      conf_threshold: 0.5
      num_gpus: 1
      color_map:
        "black cat": red
        car: blue
    dataset:
      infer_data_sources:
        image_dir: /data/raw-data/val2017/
        captions: ["black cat", "car"]

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| num_gpus | int | | 1 | | | | FALSE |
| gpu_ids | list | | [0] | | | | FALSE |
| num_nodes | int | | 1 | | | | FALSE |
| checkpoint | string | | ??? | | | | FALSE |
| results_dir | string | | | | | | FALSE |
| trt_engine | string | Path to the TensorRT engine to be used for inference. This only works with tao-deploy. | | | | | FALSE |
| color_map | collection | Class-wise dictionary with colors to render boxes. | | | | | FALSE |
| conf_threshold | float | The value of the confidence threshold to be used when filtering out the final list of boxes. | 0.5 | | | | FALSE |
| is_internal | bool | Flag to render with internal directory structure. | False | | | | FALSE |
| input_width | int | Width of the input image tensor. | 960 | 32 | | | FALSE |
| input_height | int | Height of the input image tensor. | 544 | 32 | | | FALSE |
| outline_width | int | Width in pixels of the bounding box outline. | 3 | 1 | | | FALSE |

The inference tool for Grounding DINO models can be used to visualize bounding boxes and generate frame-by-frame KITTI format labels on a directory of images.

TAO Client (v2 API)

    INFER_JOB_ID=$(tao grounding_dino create-job \
      --kind experiment \
      --name "grounding_dino_inference" \
      --action inference \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $TRAIN_JOB_ID \
      --inference-dataset "$DATASET_ID" \
      --specs "$INFERENCE_SPECS" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model grounding_dino inference [-h] -e <experiment spec file> \
      inference.checkpoint=<model to be inferenced>

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec: The experiment spec file to set up the inference experiment.

Optional Arguments

The following arguments are optional to run the command.

inference.checkpoint: The .pth model to run inference with.

Sample Usage

The following is an example of using the inference command:

    tao model grounding_dino inference -e /path/to/spec.yaml inference.checkpoint=/path/to/model.pth
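The roles of conf_threshold and color_map during rendering can be illustrated with a short Python sketch. This is not TAO's implementation: the Detection class, both function names, the (x1, y1, x2, y2) box layout, the inclusive >= comparison, and the "green" fallback color are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    caption: str  # text prompt the box was matched to
    score: float  # confidence score
    box: tuple    # (x1, y1, x2, y2) in pixels; this layout is an assumption


def filter_detections(detections: list, conf_threshold: float = 0.5) -> list:
    """Keep only detections whose score is at least conf_threshold."""
    return [d for d in detections if d.score >= conf_threshold]


def box_color(detection: Detection, color_map: dict, default: str = "green") -> str:
    """Look up the render color for a detection by its caption."""
    return color_map.get(detection.caption, default)


color_map = {"black cat": "red", "car": "blue"}
detections = [
    Detection("black cat", 0.82, (10.0, 20.0, 110.0, 220.0)),
    Detection("car", 0.31, (50.0, 60.0, 300.0, 200.0)),
]
kept = filter_detections(detections, conf_threshold=0.5)
colors = [box_color(d, color_map) for d in kept]
```

With the default evaluation threshold of 0.0 every box is kept, while the inference default of 0.5 drops low-confidence boxes before they are drawn.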

Exporting the Model#

export#

The export parameter defines the hyperparameters of the export process.

    export:
      checkpoint: /path/to/model.pth
      onnx_file: /path/to/model.onnx
      on_cpu: False
      opset_version: 17
      input_channel: 3
      input_width: 960
      input_height: 544
      batch_size: -1

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| results_dir | string | Path to where all the assets generated from a task are stored. | | | | | FALSE |
| gpu_id | int | The index of the GPU to build the TensorRT engine. | 0 | | | | FALSE |
| checkpoint | string | Path to the checkpoint file to run export. | ??? | | | | FALSE |
| onnx_file | string | Path to the ONNX model file. | ??? | | | | FALSE |
| on_cpu | bool | Flag to export a CPU-compatible model. | False | | | | FALSE |
| input_channel | int | Number of channels in the input tensor. | 3 | 3 | | | FALSE |
| input_width | int | Width of the input image tensor. | 960 | 32 | | | FALSE |
| input_height | int | Height of the input image tensor. | 544 | 32 | | | FALSE |
| opset_version | int | Operator set version of the ONNX model used to generate the TensorRT engine. | 17 | 1 | | | FALSE |
| batch_size | int | The batch size of the input tensor for the engine. A value of -1 implies dynamic tensor shapes. | -1 | -1 | | | FALSE |
| verbose | bool | Flag to enable verbose TensorRT logging. | False | | | | FALSE |

TAO Client (v2 API)

    EXPORT_JOB_ID=$(tao grounding_dino create-job \
      --kind experiment \
      --name "grounding_dino_export" \
      --action export \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $TRAIN_JOB_ID \
      --specs "$EXPORT_SPECS" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model grounding_dino export [-h] -e <experiment spec file> \
      export.checkpoint=<model to export> \
      export.onnx_file=<onnx path>

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec: The path to an experiment spec file.

Optional Arguments

The following arguments are optional to run the command.

export.checkpoint: The .pth model to export.

export.onnx_file: The path where the .onnx model is saved.

Sample Usage

The following is an example of using the export command:

    tao model grounding_dino export -e /path/to/spec.yaml export.checkpoint=/path/to/model.pth export.onnx_file=/path/to/model.onnx
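The TAO Launcher commands in this document share one shape: the task name, the -e spec file, and optional spec overrides. The following is a minimal sketch of building that argument list in Python. The helper launcher_command is hypothetical, and it assumes overrides are passed as single key=value tokens with no surrounding spaces.

```python
import shlex

# Tasks listed at the top of this document.
TASKS = ("train", "evaluate", "inference", "export")


def launcher_command(task: str, spec_path: str, overrides: dict = None) -> list:
    """Build the argument list for a `tao model grounding_dino <task>` call."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    args = ["tao", "model", "grounding_dino", task, "-e", spec_path]
    for key, value in (overrides or {}).items():
        # Each override is one token, for example export.onnx_file=/path/to/model.onnx
        args.append(f"{key}={value}")
    return args


cmd = launcher_command(
    "export",
    "/path/to/spec.yaml",
    {
        "export.checkpoint": "/path/to/model.pth",
        "export.onnx_file": "/path/to/model.onnx",
    },
)
print(shlex.join(cmd))
```

Returning a list rather than a string means the result can be handed to subprocess.run without shell quoting concerns; shlex.join is used only to print a copy-pasteable form.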