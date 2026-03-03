Mask Grounding DINO#

Mask Grounding DINO is an open-vocabulary instance segmentation model included in TAO. It supports the following tasks:

train

evaluate

inference

export

These tasks can be invoked from the TAO Launcher using the following convention on the command line:

tao model mask_grounding_dino <sub_task> <args_per_subtask>

where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
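When you script several runs, it can help to assemble launcher invocations programmatically. The following is a minimal sketch that assumes the `tao` launcher is on your PATH. The helper names `build_tao_command` and `run_tao` are hypothetical. Spec overrides are passed the same way as in the evaluate, inference, and export examples on this page: each one is a single `key=value` token.

```python
import shlex
import subprocess

# Subtasks listed at the top of this page.
SUBTASKS = ("train", "evaluate", "inference", "export")


def build_tao_command(sub_task, spec_path, overrides=None):
    """Return the argv list for `tao model mask_grounding_dino <sub_task>`.

    overrides maps dotted spec keys (for example "evaluate.checkpoint") to
    values; each pair becomes a single key=value token.
    """
    if sub_task not in SUBTASKS:
        raise ValueError(f"unknown subtask {sub_task!r}; expected one of {SUBTASKS}")
    argv = ["tao", "model", "mask_grounding_dino", sub_task, "-e", spec_path]
    for key, value in (overrides or {}).items():
        argv.append(f"{key}={value}")
    return argv


def run_tao(argv):
    """Launch the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(argv, check=True)


if __name__ == "__main__":
    cmd = build_tao_command(
        "evaluate",
        "/path/to/spec.yaml",
        {"evaluate.checkpoint": "/path/to/model.pth"},
    )
    # Print a shell-quoted version for inspection; call run_tao(cmd) to launch it.
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string avoids shell-quoting problems when paths contain spaces.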

Data Input for Mask Grounding DINO#

Mask Grounding DINO expects directories of images. Training annotations must be in ODVG format (JSONL files), and validation annotations must be JSON files in COCO format.

Note: Unlike other instance segmentation models in TAO, Mask Grounding DINO requires the category_id values in your COCO JSON file to start from 0 and be contiguous. In other words, they must range from 0 to num_classes - 1. The original COCO annotations do not use contiguous category IDs, so convert them first with the TAO Data Services command tao dataset annotations convert.
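To illustrate what the conversion has to achieve, the following minimal sketch remaps the category IDs of a COCO-style annotation dictionary to the contiguous range 0 to num_classes - 1. It is not the TAO Data Services implementation. The function name `make_contiguous` is hypothetical, and the sketch assumes the standard COCO keys `categories` and `annotations`.

```python
import json


def make_contiguous(coco):
    """Return a copy of a COCO dict whose category IDs are 0..num_classes-1.

    Categories are renumbered in ascending order of their original IDs, and
    every annotation's category_id is rewritten with the same mapping.
    The input dict is not modified.
    """
    old_ids = sorted(cat["id"] for cat in coco["categories"])
    id_map = {old: new for new, old in enumerate(old_ids)}

    converted = dict(coco)  # shallow copy; the lists below are rebuilt
    converted["categories"] = [
        {**cat, "id": id_map[cat["id"]]} for cat in coco["categories"]
    ]
    converted["annotations"] = [
        {**ann, "category_id": id_map[ann["category_id"]]}
        for ann in coco["annotations"]
    ]
    return converted


if __name__ == "__main__":
    # Original COCO-style IDs with gaps (1, 3, 5).
    sample = {
        "images": [{"id": 1, "file_name": "000001.jpg"}],
        "categories": [
            {"id": 1, "name": "person"},
            {"id": 5, "name": "airplane"},
            {"id": 3, "name": "car"},
        ],
        "annotations": [
            {"id": 10, "image_id": 1, "category_id": 5},
            {"id": 11, "image_id": 1, "category_id": 1},
        ],
    }
    print(json.dumps(make_contiguous(sample)["annotations"]))
```

Renumbering in ascending order of the original IDs keeps the relative class order stable. If you already maintain a label map, derive the mapping from its order instead.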

Creating an Experiment Spec File#

TAO Client (v2 API)

    BASE_EXPERIMENT_ID=$(tao mask_grounding_dino list-base-experiments | jq -r '.[0].id')
    TRAIN_SPECS=$(tao mask_grounding_dino get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

See also: For information on how to create an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.

TAO Launcher

The training experiment spec file for Mask Grounding DINO includes model, train, and dataset parameters. This is an example spec file for fine-tuning a Mask Grounding DINO model with a swin_tiny_224_1k backbone on a COCO dataset.

    dataset:
      train_data_sources:
        - image_dir: /path/to/coco/train2017/
          json_file: /path/to/coco/annotations/instances_train2017.jsonl  # ODVG format
          label_map: /path/to/coco/annotations/instances_train2017_labelmap.json
        - image_dir: /path/to/coco/train2017/
          json_file: /path/to/refcoco-like/annotations/instances_train2017.jsonl  # ODVG format
      val_data_sources:
        image_dir: /path/to/coco/val2017/
        json_file: /path/to/refcoco-like/annotations/instances_val2017_contiguous.json  # COCO JSON; category IDs must be contiguous
        data_type: VG  # or OD
      max_labels: 80  # Max number of positive + negative labels passed to the text encoder
      batch_size: 4
      workers: 8
      dataset_type: serialized  # To reduce system memory usage
      augmentation:
        scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
        input_mean: [0.485, 0.456, 0.406]
        input_std: [0.229, 0.224, 0.225]
        horizontal_flip_prob: 0.5
        train_random_resize: [400, 500, 600]
        train_random_crop_min: 384
        train_random_crop_max: 600
        random_resize_max_size: 1333
        test_random_resize: 800
    model:
      backbone: swin_tiny_224_1k
      train_backbone: True
      num_feature_levels: 4
      dec_layers: 6
      enc_layers: 6
      num_queries: 300
      dropout_ratio: 0.0
      dim_feedforward: 2048
      log_scale: auto
      class_embed_bias: True  # Adding bias in the contrastive embedding layer for training stability
      num_region_queries: 100  # 0 if not using ReLA; otherwise, the number of region queries
      loss_types: ['labels', 'boxes', 'masks', 'rela']  # Remove 'rela' if not using ReLA
    train:
      optim:
        lr_backbone: 2e-5
        lr: 2e-4
        lr_steps: [10, 20]
      num_epochs: 30
      freeze: ["backbone.0", "bert"]  # if only fine-tuning
      pretrained_model_path: /path/to/your-gdino-pretrained-model  # if only fine-tuning
      precision: bf16  # for efficient training

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| encryption_key | string | | | | | | False |
| results_dir | string | | /results | | | | False |
| wandb | collection | | | | | | False |
| model | collection | Configurable parameters to construct the model for a Mask Grounding DINO experiment. | | | | | False |
| dataset | collection | Configurable parameters to construct the dataset for a Mask Grounding DINO experiment. | | | | | False |
| train | collection | Configurable parameters to construct the trainer for a Mask Grounding DINO experiment. | | | | | False |
| evaluate | collection | Configurable parameters to construct the evaluator for a Mask Grounding DINO experiment. | | | | | False |
| inference | collection | Configurable parameters to construct the inferencer for a Mask Grounding DINO experiment. | | | | | False |
| export | collection | Configurable parameters to construct the exporter for a Mask Grounding DINO experiment. | | | | | False |
| gen_trt_engine | collection | Configurable parameters to construct the TensorRT engine builder for a Mask Grounding DINO experiment. | | | | | False |

model#

The model parameter provides options to change the Mask Grounding DINO architecture.
    model:
      pretrained_model_path: /path/to/your-gdino-pretrained-model
      backbone: swin_tiny_224_1k
      train_backbone: True
      num_feature_levels: 4
      dec_layers: 6
      enc_layers: 6
      num_queries: 300
      dropout_ratio: 0.0
      dim_feedforward: 2048
      log_scale: auto
      class_embed_bias: True
      num_region_queries: 100
      loss_types: ['labels', 'boxes', 'masks', 'rela']

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| pretrained_backbone_path | string | [Optional] Path to a pretrained backbone file. | | | | | False |
| backbone | string | Backbone name of the model. The TAO implementation of Grounding DINO supports Swin. | swin_tiny_224_1k | | | swin_tiny_224_1k, swin_base_224_22k, swin_base_384_22k, swin_large_224_22k, swin_large_384_22k | False |
| num_queries | int | Number of queries. | 900 | 1 | inf | | True |
| num_feature_levels | int | Number of feature levels to use in the model. | 4 | 1 | 5 | | False |
| set_cost_class | float | Relative weight of the classification error in the matching cost. | 1.0 | 0.0 | inf | | False |
| set_cost_bbox | float | Relative weight of the L1 error of the bounding box coordinates in the matching cost. | 5.0 | 0.0 | inf | | False |
| set_cost_giou | float | Relative weight of the GIoU loss of the bounding box in the matching cost. | 2.0 | 0.0 | inf | | False |
| cls_loss_coef | float | Relative weight of the classification error in the final loss. | 2.0 | 0.0 | inf | | False |
| bbox_loss_coef | float | Relative weight of the L1 error of the bounding box coordinates in the final loss. | 5.0 | 0.0 | inf | | False |
| giou_loss_coef | float | Relative weight of the GIoU loss of the bounding box in the final loss. | 2.0 | 0.0 | inf | | False |
| rela_nt_loss_coef | float | Relative weight of the No-Target loss of the region query in the final loss. | 1.0 | 0.0 | inf | | False |
| rela_minimap_loss_coef | float | Relative weight of the Minimap loss of the region query in the final loss. | 0.5 | 0.0 | inf | | False |
| rela_union_mask_loss_coef | float | Relative weight of the Union Mask loss of the region query in the final loss. | 2.0 | 0.0 | inf | | False |
| num_select | int | Number of top-K predictions selected during post-processing. | 300 | 1 | | | True |
| num_region_queries | int | Number of region queries. Set to 0 if not using ReLA; otherwise, the number of region queries. | 100 | 0 | | | True |
| interm_loss_coef | float | | 1.0 | | | | False |
| no_interm_box_loss | bool | True: No intermediate bbox loss. | False | | | | False |
| pre_norm | bool | True: Add layer norm in the encoder. | False | | | | False |
| two_stage_type | string | Type of two-stage in DINO. | standard | | | standard, no | False |
| decoder_sa_type | string | Type of decoder self-attention. | sa | | | sa, ca_label, ca_content | False |
| embed_init_tgt | bool | True: Add target embedding. | True | | | | False |
| fix_refpoints_hw | int | If -1, width and height are learned separately for each box. If -2, a shared width and height are learned. A value greater than 0 specifies learning with a fixed number. | -1 | -2 | inf | | False |
| pe_temperatureH | int | Temperature applied to the height dimension of the positional sine embedding. | 20 | 1 | inf | | False |
| pe_temperatureW | int | Temperature applied to the width dimension of the positional sine embedding. | 20 | 1 | inf | | False |
| return_interm_indices | list | Index of feature levels to use in the model. The length must match num_feature_levels. | [1, 2, 3, 4] | | | | False |
| use_dn | bool | True: Enable contrastive denoising training in DINO. | True | | | | False |
| dn_number | int | Number of denoising queries in DINO. | 0 | 0 | inf | | False |
| dn_box_noise_scale | float | Scale of the noise applied to boxes during contrastive denoising. If 0, noise is not applied. | 1.0 | 0.0 | inf | | False |
| dn_label_noise_ratio | float | Scale of the noise applied to labels during contrastive denoising. If 0, noise is not applied. | 0.5 | 0.0 | | | False |
| focal_alpha | float | Alpha value in the focal loss. | 0.25 | | | | False |
| focal_gamma | float | Gamma value in the focal loss. | 2.0 | | | | False |
| clip_max_norm | float | | 0.1 | | | | False |
| nheads | int | Number of heads. | 8 | | | | False |
| dropout_ratio | float | Probability of dropping hidden units. | 0.0 | 0.0 | 1.0 | | False |
| hidden_dim | int | Dimension of the hidden units. | 256 | | | | False |
| enc_layers | int | Number of encoder layers in the transformer. | 6 | 1 | | | True |
| dec_layers | int | Number of decoder layers in the transformer. | 6 | 1 | | | True |
| dim_feedforward | int | Dimension of the feedforward network. | 2048 | 1 | | | False |
| dec_n_points | int | Number of reference points in the decoder. | 4 | 1 | | | False |
| enc_n_points | int | Number of reference points in the encoder. | 4 | 1 | | | False |
| aux_loss | bool | True: Use auxiliary decoding losses (loss at each decoder layer). | True | | | | False |
| dilation | bool | True: Enable dilation in the backbone. | False | | | | False |
| train_backbone | bool | True: Backbone weights are trainable. False: Backbone weights are frozen. | True | | | | False |
| text_encoder_type | string | BERT encoder type. If only the name of the type is provided, the weights are downloaded from the Hugging Face Hub. If a path is provided, the weights are loaded from the local path. | bert-base-uncased | | | | False |
| max_text_len | int | Maximum text length of BERT. | 256 | 1 | | | False |
| class_embed_bias | bool | True: Add a bias in the contrastive embedding. | False | | | | False |
| log_scale | string | [Optional] Initial value of a learnable parameter that multiplies the similarity matrix to normalize the output. Defaults to 'None'. If set to 'auto', the similarity matrix is normalized by a fixed value sqrt(d_c), where d_c is the number of channels. If set to 'none' or None, no normalization is applied. | none | | | | False |
| loss_types | list | Losses to be used during training. | ['labels', 'boxes'] | | | | False |
| backbone_names | list | Prefixes of tensor names corresponding to the backbone. | ['backbone.0', 'bert'] | | | | False |
| linear_proj_names | list | Linear projection layer names. | ['reference_points', 'sampling_offsets'] | | | | False |
| has_mask | bool | True: Enable the mask head in Grounding DINO. | True | | | | False |
| mask_loss_coef | float | Relative weight of the mask error in the final loss. | 2.0 | | | | False |
| dice_loss_coef | float | Relative weight of the dice loss of the segmentation in the final loss. | 5.0 | | | | False |

train#

The train parameter defines the hyperparameters of the training process.

    train:
      optim:
        lr: 0.0002
        lr_backbone: 0.00002
        momentum: 0.9
        weight_decay: 0.0001
        lr_scheduler: MultiStep
        lr_steps: [10, 20]
        lr_decay: 0.1
      num_epochs: 30
      checkpoint_interval: 1
      precision: bf16
      distributed_strategy: ddp
      activation_checkpoint: True
      num_gpus: 8
      num_nodes: 1
      freeze: ["backbone.0", "bert"]
      pretrained_model_path: /path/to/pretrained/model

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| num_gpus | int | Number of GPUs to run the train job. | 1 | 1 | | | False |
| gpu_ids | list | List of GPU IDs to run training on. The length of gpu_ids must match the value of train.num_gpus. | [0] | | | | False |
| num_nodes | int | Number of nodes for training. A value greater than 1 enables multi-node training. | 1 | | | | False |
| seed | int | Seed for the PyTorch initializer. A negative value disables the fixed seed. | 1234 | -1 | inf | | False |
| cudnn | collection | cuDNN configuration. | | | | | False |
| num_epochs | int | Number of training epochs. | 10 | 1 | inf | | True |
| checkpoint_interval | int | Interval (in epochs) at which to save checkpoints. Helps resume training. | 1 | 1 | | | False |
| validation_interval | int | Interval (in epochs) at which to run evaluation on the validation dataset. | 1 | 1 | | | False |
| resume_training_checkpoint_path | string | Path to a checkpoint for resuming training. | | | | | False |
| results_dir | string | Path to store all assets generated from a task. | | | | | False |
| freeze | list | Layers to freeze. Example: ["backbone", "transformer.encoder", "input_proj"]. | [] | | | | False |
| pretrained_model_path | string | Path to a pretrained model for initialization. | | | | | False |
| clip_grad_norm | float | Clip gradients by L2 norm. 0.0 disables gradient clipping. | 0.1 | | | | False |
| is_dry_run | bool | True: Run the trainer in dry-run mode, which validates the spec file and runs a sanity check without initializing the trainer. | False | | | | False |
| optim | collection | Hyperparameters for optimizer configuration. | | | | | False |
| precision | string | Training precision. | fp32 | | | fp16, fp32, bf16 | False |
| distributed_strategy | string | Multi-GPU training strategy. Supports DDP (Distributed Data Parallel) and FSDP (Fully Sharded Data Parallel). | ddp | | | ddp, fsdp | False |
| activation_checkpoint | bool | True: Recompute activations in the backward pass instead of storing intermediate activations, which saves GPU memory. | True | | | | False |
| verbose | bool | True: Print detailed optimizer learning-rate information. | False | | | | False |

optim#

The optim parameter defines the configuration of the optimizer used in training, including the learning rate, learning rate scheduler, and weight decay.

    optim:
      lr: 0.0002
      lr_backbone: 0.00002
      momentum: 0.9
      weight_decay: 0.0001
      lr_scheduler: MultiStep
      lr_steps: [10, 20]
      lr_decay: 0.1

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| optimizer | string | Optimizer type for training. | AdamW | | | AdamW, SGD | False |
| monitor_name | string | Metric monitored by the AutoReduce scheduler. | val_loss | | | val_loss, train_loss | False |
| lr | float | Initial learning rate for the model (excluding the backbone). | 0.0002 | | | | True |
| lr_backbone | float | Initial learning rate for the backbone. | 2e-05 | | | | True |
| lr_linear_proj_mult | float | Initial learning rate multiplier for the linear projection layer. | 0.1 | | | | True |
| momentum | float | Momentum for the AdamW optimizer. | 0.9 | | | | True |
| weight_decay | float | Weight decay coefficient. | 0.0001 | | | | True |
| lr_scheduler | string | Learning rate scheduler type. MultiStep: decrease lr by lr_decay at each of lr_steps. StepLR: decrease lr by lr_decay every lr_step_size. | MultiStep | | | MultiStep, StepLR | False |
| lr_steps | list | Steps at which lr decreases (for MultiStep). | [10] | | | | False |
| lr_step_size | int | Number of steps between lr decreases (for StepLR). | 10 | | | | True |
| lr_decay | float | Factor by which the scheduler decreases lr. | 0.1 | | | | True |

dataset#

The dataset parameter defines the data sources, training batch size, and augmentation.

    dataset:
      train_data_sources:
        - image_dir: /path/to/coco/train2017/
          json_file: /path/to/coco/annotations/instances_train2017.jsonl  # ODVG format
          label_map: /path/to/coco/annotations/instances_train2017_labelmap.json
        - image_dir: /path/to/coco/train2017/
          json_file: /path/to/refcoco-like/annotations/instances_train2017.jsonl  # ODVG format
      val_data_sources:
        image_dir: /path/to/coco/val2017/
        json_file: /path/to/refcoco-like/annotations/instances_val2017_contiguous.json  # COCO JSON; category IDs must be contiguous
        data_type: VG  # or OD
      test_data_sources:
        image_dir: /path/to/coco/images/val2017/
        json_file: /path/to/coco/annotations/instances_val2017.json
        data_type: OD  # or VG
      infer_data_sources:
        image_dir: /path/to/coco/images/val2017/
        data_type: OD  # or VG
        captions: ["black cat", "car"]  # or a JSON file that contains the image paths and captions
      max_labels: 80
      batch_size: 4
      workers: 8

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| train_data_sources | list | List of training data sources. image_dir: directory containing the training images; json_file: path to the JSONL file in ODVG training format; label_map: optional path to the label map of a detection dataset. | [{'image_dir': '', 'json_file': '', 'label_map': ''}, {'image_dir': '', 'json_file': ''}] | | | | False |
| val_data_sources | collection | Validation data source. image_dir: directory containing the validation images; json_file: path to the JSON file in COCO validation format; data_type: dataset type, OD or VG. Category IDs must start from 0 to calculate the validation loss; run the Data Services annotation conversion to make the categories contiguous. | {'image_dir': '', 'json_file': '', 'data_type': ''} | | | | False |
| test_data_sources | collection | Test data source. image_dir: directory containing the test images; json_file: path to the JSON file in COCO test format; data_type: dataset type, OD or VG. | {'image_dir': '', 'json_file': '', 'data_type': ''} | | | | False |
| infer_data_sources | collection | Inference data source. image_dir: directory containing the inference images; data_type: dataset type, OD or VG; captions: list of captions, used for OD inference only; json_file: path to a JSON file with image_path and caption pairs, used for VG. | {'image_dir': '', 'data_type': ''} | | | | False |
| batch_size | int | Batch size for training and validation. | 4 | 1 | inf | | True |
| workers | int | Number of parallel data loader workers. | 8 | 1 | inf | | True |
| pin_memory | bool | True: Allocate page-locked memory for faster CPU-to-GPU data transfer. | True | | | | False |
| dataset_type | string | Dataset structure type. default: standard map-style dataset that loads the ODVG annotations in each subprocess, which can increase RAM usage. serialized: annotations are serialized with pickle and torch.Tensor and shared across subprocesses. | serialized | | | serialized, default | False |
| max_labels | int | Total number of labels to sample. After the positive labels, negative labels are sampled until max_labels is reached. For OD, negative labels are categories absent from the image; for grounding, negative labels are phrases not in the image captions. A higher max_labels may improve robustness at the cost of longer training. | 50 | 1 | inf | | False |
| eval_class_ids | list | Class IDs for evaluation. | [1] | | | | False |
| augmentation | collection | Data augmentation parameters. | | | | | False |
| has_mask | bool | True: Load mask annotations from the dataset. | | | | | False |

augmentation#

The augmentation parameter contains the hyperparameters for data augmentation.

    augmentation:
      scales: [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
      input_mean: [0.485, 0.456, 0.406]
      input_std: [0.229, 0.224, 0.225]
      horizontal_flip_prob: 0.5
      train_random_resize: [400, 500, 600]
      train_random_crop_min: 384
      train_random_crop_max: 600
      random_resize_max_size: 1333
      test_random_resize: 800

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| scales | list | Sizes to perform random resize. | [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800] | | | | False |
| input_mean | list | Input mean for RGB frames. | [0.485, 0.456, 0.406] | | | | False |
| input_std | list | Input standard deviation per pixel for RGB frames. | [0.229, 0.224, 0.225] | | | | False |
| train_random_resize | list | Sizes to perform random resize for training data. | [400, 500, 600] | | | | False |
| horizontal_flip_prob | float | Probability of a horizontal flip during training. | 0.5 | 0.0 | 1.0 | | True |
| train_random_crop_min | int | Minimum random crop size for training data. | 384 | 1 | inf | | True |
| train_random_crop_max | int | Maximum random crop size for training data. | 600 | 1 | inf | | True |
| random_resize_max_size | int | Maximum random resize size for training data. | 1333 | 1 | inf | | True |
| test_random_resize | int | Random resize size for test data. | 800 | 1 | inf | | True |
| fixed_padding | bool | True: Resize the image to (sorted(scales)[-1], random_resize_max_size) without padding. This prevents a CPU memory leak. | True | | | | False |
| fixed_random_crop | int | Determines the resulting image resolution. 0 disables Large Scale Jittering (cropping). | 1024 | 1 | inf | | False |
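The following minimal sketch makes the max_labels behavior in the dataset table concrete for an OD sample. It is an illustration under assumptions, not TAO's data loader. The function names `sample_labels` and `to_caption` are hypothetical, the truncation policy for too many positives is an assumption, and joining the labels with " . " separators follows the common Grounding DINO prompt convention rather than anything stated on this page.

```python
import random


def sample_labels(positive, all_categories, max_labels, rng=None):
    """Return up to max_labels label names: every positive plus sampled negatives.

    positive: category names present in the image.
    all_categories: every category name in the dataset (for example, the label map).
    Negatives are categories absent from the image, sampled without replacement.
    """
    rng = rng or random.Random()
    positives = list(dict.fromkeys(positive))  # de-duplicate, keep order
    if len(positives) >= max_labels:
        # Hypothetical policy: truncate when the positives alone fill the budget.
        return positives[:max_labels]
    present = set(positives)
    negatives = [c for c in all_categories if c not in present]
    k = min(max_labels - len(positives), len(negatives))
    labels = positives + rng.sample(negatives, k)
    rng.shuffle(labels)  # so positives are not always at the start of the prompt
    return labels


def to_caption(labels):
    """Join label names into a Grounding DINO-style prompt such as "car . dog ."."""
    return " . ".join(labels) + " ."


if __name__ == "__main__":
    categories = ["person", "car", "dog", "cat", "bus", "bicycle"]
    labels = sample_labels(["car", "dog"], categories, max_labels=4, rng=random.Random(0))
    print(to_caption(labels))
```

A larger max_labels gives the text encoder more negatives to discriminate against per image, which matches the robustness-versus-training-time trade-off described in the table.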

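The two lr_scheduler options in the optim table differ only in where the decays happen. The sketch below computes the learning rate at a given epoch under each option. It assumes the scheduler steps once per epoch and follows the closed-form behavior of PyTorch's MultiStepLR and StepLR. The function names are hypothetical.

```python
from bisect import bisect_right


def multistep_lr(base_lr, epoch, lr_steps, lr_decay):
    """MultiStep: multiply by lr_decay once for every step in lr_steps <= epoch."""
    return base_lr * lr_decay ** bisect_right(sorted(lr_steps), epoch)


def steplr_lr(base_lr, epoch, lr_step_size, lr_decay):
    """StepLR: multiply by lr_decay once every lr_step_size epochs."""
    return base_lr * lr_decay ** (epoch // lr_step_size)


if __name__ == "__main__":
    # lr=2e-4, lr_steps=[10, 20], lr_decay=0.1, as in the example spec above.
    for epoch in (0, 9, 10, 19, 20, 29):
        print(f"epoch {epoch:2d}: MultiStep {multistep_lr(2e-4, epoch, [10, 20], 0.1):.1e}"
              f"  StepLR {steplr_lr(2e-4, epoch, 10, 0.1):.1e}")
```

With the example values, both schedules coincide over 30 epochs because the milestones happen to be evenly spaced. They diverge once lr_steps is irregular, for example [8, 11].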
Training the Model#

To train a Mask Grounding DINO model, use this command:

TAO Client (v2 API)

    TRAIN_JOB_ID=$(tao mask_grounding_dino create-job \
      --kind experiment \
      --name "mask_grounding_dino_train" \
      --action train \
      --workspace-id $WORKSPACE_ID \
      --specs "$TRAIN_SPECS" \
      --train-datasets '["'$DATASET_ID'"]' \
      --eval-dataset "$DATASET_ID" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

See also: For information on how to create an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.

TAO Launcher

    tao model mask_grounding_dino train [-h] -e <experiment_spec>

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec: The experiment specification file to set up the training experiment.

Optional Arguments

The following arguments are optional to run the command.

-h, --help: Show this help message and exit.

Sample Usage

This is an example of the train command:

    tao model mask_grounding_dino train -e /path/to/spec.yaml

Optimizing Resources for Training Mask Grounding DINO#

Training Mask Grounding DINO on a standard dataset such as COCO requires powerful GPUs (for example, V100 or A100) with at least 15 GB of VRAM, as well as a large amount of CPU memory. This section outlines some strategies you can use to launch training with limited resources.

Optimize GPU Memory#

There are various ways to optimize GPU memory usage. One is to reduce dataset.batch_size, although this can make training take longer than usual. We recommend the following configurations to optimize GPU consumption.

Set train.precision to bf16 to enable automatic mixed precision training. This can reduce GPU memory usage by about 50%.

Set train.activation_checkpoint to True to enable activation checkpointing. Activations are recomputed during the backward pass instead of being cached in memory, which reduces GPU memory usage at the cost of extra computation.

Set train.distributed_strategy to fsdp to enable Fully Sharded Data Parallel training. FSDP shards model parameters, gradients, and optimizer states across processes, which reduces the memory required on each GPU.

Use a lighter backbone such as swin_tiny_224_1k, or freeze the backbone by setting model.train_backbone to False.

Adjust the augmentation resolutions in dataset.augmentation to suit your dataset.

Optimize CPU Memory#

To speed up data loading, it is common practice to set a high number of workers, which spawns multiple processes. However, if your annotation file is very large, this can exhaust CPU memory. We therefore recommend the following configurations to optimize CPU consumption.

Set dataset.dataset_type to serialized so that the COCO-based annotation data is shared across subprocesses instead of being loaded in each one.

Set dataset.augmentation.fixed_padding to True so that images are padded before the batch is formed. Random resize and random crop augmentation make the post-transform resolution vary from image to image. These variable resolutions can cause a memory leak: CPU memory grows slowly until training runs out of memory partway through. This is a limitation of PyTorch, so we advise setting fixed_padding to True to help stabilize CPU memory usage.
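The following sketch shows why resolutions vary and how a fixed canvas removes the variation. It applies a DETR-style random resize (shorter side set to a value from scales, longer side capped at random_resize_max_size) to a few image sizes, then computes the padding needed to place each result on one fixed canvas. This models the idea only and is not TAO's transform. The helper names are hypothetical, and the square random_resize_max_size canvas is a simplifying assumption that fits both portrait and landscape images.

```python
import random


def resized_shape(height, width, short_side, max_size):
    """DETR-style resize target: scale the shorter side to short_side, unless
    that would push the longer side past max_size. Returns (height, width)."""
    short, long_ = min(height, width), max(height, width)
    if long_ / short * short_side > max_size:
        # Floor so that the longer side never exceeds max_size after scaling.
        short_side = int(max_size * short / long_)
    scale = short_side / short
    return round(height * scale), round(width * scale)


def padding_for(shape, canvas):
    """Return (pad_bottom, pad_right) that places shape on a fixed canvas."""
    (h, w), (canvas_h, canvas_w) = shape, canvas
    if h > canvas_h or w > canvas_w:
        raise ValueError(f"{shape} does not fit on canvas {canvas}")
    return canvas_h - h, canvas_w - w


if __name__ == "__main__":
    scales = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
    max_size = 1333                # random_resize_max_size
    canvas = (max_size, max_size)  # assumed square canvas (see lead-in)
    rng = random.Random(0)
    for h, w in [(480, 640), (640, 427), (375, 500)]:
        shape = resized_shape(h, w, rng.choice(scales), max_size)
        print((h, w), "->", shape, "pad (bottom, right):", padding_for(shape, canvas))
```

Without the padding step, each batch can have a different tensor shape. After padding, every sample shares the canvas shape, which is the property fixed_padding is meant to provide.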

Evaluating the Model#

evaluate#

The evaluate parameter defines the hyperparameters of the evaluation process.

    evaluate:
      checkpoint: /path/to/model.pth
      conf_threshold: 0.0
      num_gpus: 1
      ioi_threshold: 0.5
      nms_threshold: 0.2
      text_threshold: 0.3

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| num_gpus | int | | 1 | | | | False |
| gpu_ids | list | | [0] | | | | False |
| num_nodes | int | | 1 | | | | False |
| checkpoint | string | | ??? | | | | False |
| results_dir | string | | | | | | False |
| input_width | int | Width of the input image tensor. | | 1 | | | False |
| input_height | int | Height of the input image tensor. | | 1 | | | False |
| trt_engine | string | Path to the TensorRT engine to be used for evaluation. This only works with tao-deploy. | | | | | False |
| conf_threshold | float | Confidence threshold on box scores for filtering final masks and boxes. | 0.0 | | | | False |
| ioi_threshold | float | Intersection over instance (IoI) threshold between the ReLA output and instance masks for filtering final masks and boxes. | 0.5 | | | | False |
| nms_threshold | float | Non-maximum suppression threshold on boxes for filtering final masks and boxes. | 0.2 | | | | False |
| text_threshold | float | Text threshold for extracting phrases from expressions. | 0.3 | | | | False |

To run evaluation with a Mask Grounding DINO model, use this command:

TAO Client (v2 API)

    EVAL_JOB_ID=$(tao mask_grounding_dino create-job \
      --kind experiment \
      --name "mask_grounding_dino_evaluate" \
      --action evaluate \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $TRAIN_JOB_ID \
      --eval-dataset "$DATASET_ID" \
      --specs "$EVALUATE_SPECS" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

See also: For information on how to create an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.

TAO Launcher

    tao model mask_grounding_dino evaluate [-h] -e <experiment_spec> \
        evaluate.checkpoint=<model to be evaluated>

Required Arguments

The following arguments are required.

-e, --experiment_spec: The experiment spec file to set up the evaluation experiment.

Optional Arguments

The following arguments are optional to run the command.

evaluate.checkpoint: The .pth model to be evaluated.

Sample Usage

This is an example of using the evaluate command:

    tao model mask_grounding_dino evaluate -e /path/to/spec.yaml evaluate.checkpoint=/path/to/model.pth
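The confidence, NMS, and IoI thresholds act as successive filters on the predictions. The following pure-Python sketch shows one plausible ordering of these filters, under stated assumptions; it is not TAO's post-processor:

- Drop boxes below conf_threshold.
- Apply greedy box NMS with nms_threshold.
- Keep an instance only if enough of its mask lies inside the ReLA region mask.

Masks are modeled as sets of pixel coordinates. IoI is computed as |instance ∩ ReLA| / |instance|, reading "intersection over instance" literally. text_threshold is not modeled here.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def intersection_over_instance(instance_mask, rela_mask):
    """Fraction of the instance's pixels that also lie inside the ReLA mask."""
    if not instance_mask:
        return 0.0
    return len(instance_mask & rela_mask) / len(instance_mask)


def filter_predictions(preds, rela_mask, conf_threshold, nms_threshold, ioi_threshold):
    """preds: list of dicts with "score", "box", and "mask" (a set of (y, x))."""
    # 1. Confidence filter, then sort by score so NMS keeps the best boxes.
    candidates = sorted(
        (p for p in preds if p["score"] >= conf_threshold),
        key=lambda p: p["score"],
        reverse=True,
    )
    # 2. Greedy NMS: drop a box whose IoU with any kept box exceeds nms_threshold.
    kept = []
    for p in candidates:
        if all(box_iou(p["box"], k["box"]) <= nms_threshold for k in kept):
            kept.append(p)
    # 3. Keep instances whose mask lies sufficiently inside the ReLA region mask.
    return [p for p in kept
            if intersection_over_instance(p["mask"], rela_mask) >= ioi_threshold]
```

Setting conf_threshold to 0.0, as in the evaluate example above, disables the first filter. This is a common choice for evaluation because metrics such as mAP are computed across all score levels.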

Running Inference with a Mask Grounding DINO Model#

inference#

The inference parameter defines the hyperparameters of the inference process.

    inference:
      checkpoint: /path/to/model.pth
      conf_threshold: 0.5
      num_gpus: 1
      color_map:
        "black cat": red
        car: blue
      ioi_threshold: 0.5
      nms_threshold: 0.2
      text_threshold: 0.3
    dataset:
      infer_data_sources:
        image_dir: /data/raw-data/val2017/
        captions: ["black cat", "car"]  # or a JSON file that contains the image paths and captions for VG
        data_type: OD  # or VG

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| num_gpus | int | | 1 | | | | False |
| gpu_ids | list | | [0] | | | | False |
| num_nodes | int | | 1 | | | | False |
| checkpoint | string | | ??? | | | | False |
| results_dir | string | | | | | | False |
| trt_engine | string | Path to the TensorRT engine to be used for inference. This only works with tao-deploy. | | | | | False |
| color_map | collection | Class-wise dictionary with colors to render boxes. | | | | | False |
| conf_threshold | float | Confidence threshold on box scores for filtering final masks and boxes. | 0.0 | | | | False |
| ioi_threshold | float | Intersection over instance (IoI) threshold between the ReLA output and instance masks for filtering final masks and boxes. | 0.5 | | | | False |
| nms_threshold | float | Non-maximum suppression threshold on boxes for filtering final masks and boxes. | 0.2 | | | | False |
| text_threshold | float | Text threshold for extracting phrases from expressions. | 0.3 | | | | False |
| is_internal | bool | True: Render with the internal directory structure. | False | | | | False |
| input_width | int | Width of the input image tensor. | 960 | 32 | | | False |
| input_height | int | Height of the input image tensor. | 544 | 32 | | | False |
| outline_width | int | Width in pixels of the bounding box outline. | 3 | 1 | | | False |

The inference tool for Mask Grounding DINO models can be used to visualize bboxes and generate frame-by-frame KITTI format labels on a directory of images.

TAO Client (v2 API)

    INFERENCE_JOB_ID=$(tao mask_grounding_dino create-job \
      --kind experiment \
      --name "mask_grounding_dino_inference" \
      --action inference \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $TRAIN_JOB_ID \
      --inference-dataset "$DATASET_ID" \
      --specs "$INFERENCE_SPECS" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

See also: For information on how to create an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.

TAO Launcher

    tao model mask_grounding_dino inference [-h] -e <experiment spec file> \
        inference.checkpoint=<model to run inference with>

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec: The experiment spec file to set up the inference experiment.

Optional Arguments

The following arguments are optional to run the command.

inference.checkpoint: The .pth model to run inference with.

Sample Usage

This is an example of using the inference command:

    tao model mask_grounding_dino inference -e /path/to/spec.yaml inference.checkpoint=/path/to/model.pth
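text_threshold decides which words of the prompt are attached to a detection. In the original Grounding DINO post-processing, each predicted box carries one score per text token, and the phrase is built from the tokens whose score exceeds the threshold. The sketch below mimics that idea on whitespace-split words instead of BERT sub-word tokens. The function name is hypothetical, the scores are invented for illustration, and TAO's exact decoding may differ.

```python
def phrase_from_token_scores(tokens, scores, text_threshold):
    """Join the prompt tokens whose score is above text_threshold.

    tokens: the prompt split into tokens, e.g. "black cat . car ." ->
            ["black", "cat", ".", "car", "."]
    scores: one score per token for a single predicted box.
    Separator tokens (".") are never part of a phrase.
    """
    if len(tokens) != len(scores):
        raise ValueError("need exactly one score per token")
    picked = [t for t, s in zip(tokens, scores) if s > text_threshold and t != "."]
    return " ".join(picked)


if __name__ == "__main__":
    tokens = "black cat . car .".split()
    box_scores = [0.62, 0.71, 0.05, 0.12, 0.02]  # invented scores for one box
    print(phrase_from_token_scores(tokens, box_scores, text_threshold=0.3))
```

Lowering text_threshold lets weaker token matches leak into the phrase (here, 0.1 would also pick up "car"). The resulting phrase is what a class-wise setting such as color_map would be keyed on.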

Exporting the Model#

export#

The export parameter defines the hyperparameters of the export process.

    export:
      checkpoint: /path/to/model.pth
      onnx_file: /path/to/model.onnx
      on_cpu: False
      opset_version: 17
      input_channel: 3
      input_width: 960
      input_height: 544
      batch_size: -1

| Field | value_type | Description | default_value | valid_min | valid_max | valid_options | automl_enabled |
|---|---|---|---|---|---|---|---|
| results_dir | string | Path to where all the assets generated from a task are stored. | | | | | False |
| gpu_id | int | The index of the GPU to build the TensorRT engine. | 0 | | | | False |
| checkpoint | string | Path to the checkpoint file to run export. | ??? | | | | False |
| onnx_file | string | Path to the ONNX model file. | ??? | | | | False |
| on_cpu | bool | True: Export a CPU-compatible model. | False | | | | False |
| input_channel | int | Number of channels in the input tensor. | 3 | 3 | | | False |
| input_width | int | Width of the input image tensor. | 960 | 32 | | | False |
| input_height | int | Height of the input image tensor. | 544 | 32 | | | False |
| opset_version | int | Operator set version of the ONNX model used to generate the TensorRT engine. | 17 | 1 | | | False |
| batch_size | int | The batch size of the input tensor for the engine. A value of -1 implies dynamic tensor shapes. | -1 | -1 | | | False |
| verbose | bool | True: Enable verbose TensorRT logging. | False | | | | False |

TAO Client (v2 API)

    EXPORT_JOB_ID=$(tao mask_grounding_dino create-job \
      --kind experiment \
      --name "mask_grounding_dino_export" \
      --action export \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $TRAIN_JOB_ID \
      --specs "$EXPORT_SPECS" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

See also: For information on how to create an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.

TAO Launcher

    tao model mask_grounding_dino export [-h] -e <experiment spec file> \
        export.checkpoint=<model to export> \
        export.onnx_file=<onnx path>

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec: The path to an experiment spec file.

Optional Arguments

The following arguments are optional to run the command.

export.checkpoint: The .pth model to export.

export.onnx_file: The path where the .onnx model is saved.

Sample Usage

This is an example of using the export command:

    tao model mask_grounding_dino export -e /path/to/spec.yaml export.checkpoint=/path/to/model.pth export.onnx_file=/path/to/model.onnx