Date: 2026-03-03

OCDNet is an optical-character detection model included in TAO. It supports the following tasks:

train

evaluate

inference

prune

export

Each task is explained in detail in the following sections.

Note: The FTMS Client sections of this documentation refer to $EXPERIMENT_ID and $DATASET_ID. For instructions on creating a dataset using the remote client, refer to the Creating a dataset section in the Remote Client documentation. For instructions on creating an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.

The spec format is YAML for TAO Launcher, and JSON for FTMS Client.

File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher, not for FTMS Client.

Preparing the Dataset#

The dataset for OCDNet contains images and their corresponding label files. The training dataset and the test dataset must follow the same structure. Organize the directories as shown below, where the directory name for images is img and the directory name for label files is gt. By default, each label file is expected to be named after its image file with a gt_ prefix (for example, gt_img_0.txt for img_0.jpg). The exact directory names train and test are not required but are preferred by convention.

    /train
        /img
            img_0.jpg
            img_1.jpg
            ...
        /gt
            gt_img_0.txt
            gt_img_1.txt
            ...
    /test
        /img
            img_0.jpg
            img_1.jpg
            ...
        /gt
            gt_img_0.txt
            gt_img_1.txt
            ...

Below is an example label file from the public ICDAR2015 dataset:

    $ cat ICDAR2015/test/gt/gt_img_14.txt
    268,82,335,93,332,164,267,164,the
    344,94,433,112,427,159,336,163,Future
    208,191,374,184,371,213,208,241,Communications
    370,176,420,176,416,204,373,213,###
    1,57,261,76,261,187,0,190,venting
    1,208,203,200,203,241,3,294,ntelligence.

Note: Each line of the label file lists the coordinates of all the points of one text region, followed by the text as the last field. If the text is ### and the training spec file sets ignore_tags to ['###'], those lines are ignored during training.
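The layout and label rules above can be checked before training with a short script. The following is a minimal sketch, not part of TAO: missing_labels and count_ignored are hypothetical helper names, the gt_ prefix is the default described above, and each label line is assumed to have eight coordinate fields followed by the text.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking an OCDNet dataset split (not TAO commands).

# missing_labels SPLIT_DIR
# Prints every image under SPLIT_DIR/img that has no SPLIT_DIR/gt/gt_<name>.txt.
missing_labels() (
    split_dir=$1
    for img in "$split_dir"/img/*; do
        [ -f "$img" ] || continue            # skip when the glob matches nothing
        name=$(basename "$img")
        stem=${name%.*}                      # img_0.jpg -> img_0
        [ -f "$split_dir/gt/gt_$stem.txt" ] || printf '%s\n' "$name"
    done
)

# count_ignored LABEL_FILE
# Counts regions whose text (field 9, after 8 coordinates) is the ignore tag "###".
count_ignored() (
    awk -F',' '$9 == "###" { n++ } END { print n + 0 }' "$1"
)

# Example usage (paths are placeholders):
#   missing_labels /data/ocdnet/train
#   count_ignored /data/ocdnet/test/gt/gt_img_14.txt
```

An empty result from missing_labels means every image has a label file. Label files with Windows line endings would need the trailing carriage return stripped before count_ignored can match the tag.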

Creating an Experiment Spec File#

The spec file for OCDNet includes model, train, dataset, and evaluate, as well as other global parameters. Below is an example spec file for training an OCDNet model with a FAN-tiny backbone on the ICDAR2015 dataset.

TAO Client (v2 API)

Use the following commands to get the base experiment ID and fetch the job schema for OCDNet:

    BASE_EXPERIMENT_ID=$(tao ocdnet list-base-experiments | jq -r '.[0].id')
    SCHEMA=$(tao ocdnet get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

TAO Launcher

    model:
      load_pruned_graph: False
      pruned_graph_path: '/results/prune/pruned_0.1.pth'
      pretrained_model_path: '/data/ocdnet/ocdnet_fan_tiny_2x_icdar.pth'
      backbone: fan_tiny_8_p4_hybrid
      enlarge_feature_map_size: True
      activation_checkpoint: True

    train:
      num_gpus: 1
      results_dir: /results/train
      num_epochs: 10
      resume_training_checkpoint_path: '/results/train/resume.pth'
      checkpoint_interval: 5
      validation_interval: 5
      seed: 1234
      is_dry_run: False
      precision: fp32
      model_ema: False
      model_ema_decay: 0.999
      trainer:
        clip_grad_norm: 5.0
      optimizer:
        type: Adam
        args:
          lr: 0.001
      lr_scheduler:
        type: WarmupPolyLR
        args:
          warmup_epoch: 3
      post_processing:
        type: SegDetectorRepresenter
        args:
          thresh: 0.3
          box_thresh: 0.55
          max_candidates: 1000
          unclip_ratio: 1.5
      metric:
        type: QuadMetric
        args:
          is_output_polygon: false

    dataset:
      train_dataset:
        data_name: ICDAR2015Dataset
        data_path: ['/data/ocdnet_vit/train']
        args:
          pre_processes:
            - type: IaaAugment
              args:
                - {'type': Fliplr, 'args': {'p': 0.5}}
                - {'type': Affine, 'args': {'rotate': [-45, 45]}}
                - {'type': Sometimes, 'args': {'p': 0.2, 'then_list': {'type': GaussianBlur, 'args': {'sigma': [1.5, 2.5]}}}}
                - {'type': Resize, 'args': {'size': [0.5, 3]}}
            - type: EastRandomCropData
              args:
                size: [640, 640]
                max_tries: 50
                keep_ratio: true
            - type: MakeBorderMap
              args:
                shrink_ratio: 0.4
                thresh_min: 0.3
                thresh_max: 0.7
            - type: MakeShrinkMap
              args:
                shrink_ratio: 0.4
                min_text_size: 8
          img_mode: BGR
          filter_keys: [img_path, img_name, text_polys, texts, ignore_tags, shape]
          ignore_tags: ['*', '###']
        loader:
          batch_size: 1
          pin_memory: true
          num_workers: 12
      validate_dataset:
        data_name: ICDAR2015Dataset
        data_path: ['/data/ocdnet_vit/test']
        args:
          pre_processes:
            - type: Resize2D
              args:
                short_size:
                  - 1280
                  - 736
                resize_text_polys: true
          img_mode: BGR
          filter_keys: []
          ignore_tags: ['*', '###']
        loader:
          batch_size: 1
          pin_memory: false
          num_workers: 1

The top-level description of the spec file is provided in the table below.

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | – | The configuration of the model architecture | |
| dataset | dict config | – | The configuration of the dataset | |
| train | dict config | – | The configuration of the training task | |
| evaluate | dict config | – | The configuration of the evaluation task | |
| inference | dict config | – | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| export | dict config | – | The configuration of the ONNX export task | |
| gen_trt_engine | dict config | – | The configuration of the TensorRT generation task. Only used in TAO Deploy | |
| prune | dict config | – | The configuration of the pruning task | |
| name | str | – | | |

Model#

The model parameter provides the list of parameters for the model.

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| load_pruned_graph | bool | false | A flag specifying whether to load the pruned graph. Set to True if train/evaluate/export/inference is being performed against a pruned model. | true/false |
| pruned_graph_path | string | – | The path to the pruned graph model (if load_pruned_graph is True) | unix path |
| pretrained_model_path | string | – | The path to the pretrained model | unix path |
| backbone | string | deformable_resnet18 | The backbone of the model | deformable_resnet18, deformable_resnet50, fan_tiny_8_p4_hybrid |
| enlarge_feature_map_size | bool | false | A flag specifying whether to enlarge the output feature map size of the FAN-tiny backbone. This flag has no effect when using a deformable_resnet backbone. | true/false |
| activation_checkpoint | bool | false | A flag specifying whether to use activation checkpoints to save GPU memory. This flag has no effect when using a deformable_resnet backbone. | true/false |

Train#

The train parameter provides the parameters for training.

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, NumPy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| optimizer | dict config | – | The configuration for the optimizer | – |
| lr_scheduler | dict config | – | The configuration for the lr_scheduler | – |
| post_processing | dict config | – | The configuration for post_processing | – |
| metric | dict config | – | The configuration for metric computing. QuadMetric is supported. If is_output_polygon is True, a polygon will be generated. If it is False, a BBox will be generated. | – |
| is_dry_run | bool | false | If this flag is True, only one batch will run. This flag is only recommended for debugging purposes. | true/false |
| precision | string | fp32 | The precision that the model will be trained on. If this value is set to 'fp16', AMP training will be enabled. | fp32/fp16 |
| model_ema | bool | false | A flag to enable model EMA. If the value is True, model EMA will be enabled during training. | true/false |
| model_ema_decay | float | 0.999 | The decay of model EMA. This value is only used when model_ema is set to True. | (0, 1] |

optimizer#

    optimizer:
      type: Adam
      args:
        lr: 0.001

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| type | string | Adam | The optimizer type | Adam |
| lr | float | – | The initial learning rate | >=0.0 |

lr_scheduler#

    lr_scheduler:
      type: WarmupPolyLR
      args:
        warmup_epoch: 3

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| type | string | WarmupPolyLR | Decays the learning rate via a polynomial function. The learning rate increases to the initial value during the warmup stage and is reduced from the initial value to zero during the training stage. | WarmupPolyLR |
| warmup_epoch | unsigned int | 3 | The number of warmup epochs, during which the learning rate increases to the initial value (i.e. optimizer.args.lr). The warmup epoch should not be the same as num_epochs. | >=0 |

post_processing#

    post_processing:
      type: SegDetectorRepresenter
      args:
        thresh: 0.3
        box_thresh: 0.55
        max_candidates: 1000
        unclip_ratio: 1.5

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| type | string | SegDetectorRepresenter | The name of the post_processing. The post_processing will generate a BBox or polygon. | SegDetectorRepresenter |
| thresh | float | 0.3 | The threshold for binarization, which is used in generating an approximate binary map. | 0.0 ~ 1.0 |
| box_thresh | float | 0.7 | The BBox threshold. If the effective area is lower than this threshold, the prediction will be ignored, which means no text is detected. | 0.0 ~ 1.0 |
| max_candidates | unsigned int | 1000 | The maximum candidate output. Enlarge this parameter if characters are detected in one area but obviously not in the other area of the image. | > 1 |
| unclip_ratio | float | 1.5 | The unclip ratio using the Vatti clipping algorithm in the probability map. The BBox will look larger if this ratio is set larger. | >0.0 |

Dataset#

The dataset is defined by two sections: train_dataset and validate_dataset.

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset | dict config | – | The configuration for the training dataset | – |
| validate_dataset | dict config | – | The configuration for the validation dataset | – |

The parameters for train_dataset are provided below.

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| data_name | string | ICDAR2015Dataset | The dataset name. For "ICDAR2015Dataset", the label file is expected to use gt_ as a prefix. For "UberDataset", the label file is expected to use truth_ as a prefix. | ICDAR2015Dataset, UberDataset |
| data_path | string list | – | The list of paths that contain images used for training. For example, ['path_1'] or ['path_1', 'path_2', ...] | – |
| pre_processes | dict | – | The pre-processing configuration (see train_preprocess for more details) | – |
| img_mode | string | BGR | The image mode | BGR, RGB, GRAY |
| filter_keys | string list | ['img_path', 'img_name', 'text_polys', 'texts', 'ignore_tags', 'shape'] | The keys to ignore | – |
| ignore_tags | string list | ['*', '###'] | The labels that are not used to train | – |
| batch_size | unsigned int | False | The batch size. Set to a lower value if you encounter out-of-memory errors. | >0 |
| pin_memory | bool | False | A flag specifying whether to enable pinned memory | true/false |
| num_workers | unsigned int | 1 | The threads used to load data | >=0 |

train_preprocess#

    pre_processes:
      - type: IaaAugment
        args:
          - {'type': Fliplr, 'args': {'p': 0.5}}
          - {'type': Affine, 'args': {'rotate': [-45, 45]}}
          - {'type': Sometimes, 'args': {'p': 0.2, 'then_list': {'type': GaussianBlur, 'args': {'sigma': [1.5, 2.5]}}}}
          - {'type': Resize, 'args': {'size': [0.5, 3]}}
      - type: EastRandomCropData
        args:
          size: [640, 640]
          max_tries: 50
          keep_ratio: true
      - type: MakeBorderMap
        args:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - type: MakeShrinkMap
        args:
          shrink_ratio: 0.4
          min_text_size: 8

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| IaaAugment | dict list | {'type':Fliplr, 'args':{'p':0.5}} {'type': Affine, 'args':{'rotate':[-10,10]}} {'type':Sometimes,'args':{'p':1.0, 'then_list':{'type': GaussianBlur, 'args':{'sigma':[1.5,2.5]}}}} {'type':Resize,'args':{'size':[0.5,3]}} | Uses imgaug to perform augmentation. "Fliplr", "Affine", "Sometimes", "GaussianBlur" and "Resize" are used by default. p defines the probability of each image being flipped. rotate defines the degree range when rotating images by a random value. Sometimes applies one or more augmenters to only p percent of all images. then_list defines the augmenter(s) to apply to p percent of all images. GaussianBlur blurs the image using Gaussian kernels. sigma defines the standard deviation of the Gaussian kernel. size defines the range when resizing each image compared to its original size. | p: 0.0 ~ 1.0; rotate: -180 ~ 180; sigma: >=0.0; size: [>0.0, >0.0] |
| EastRandomCropData | dict config | – | The random crop after augmentation. size defines the cropped target size (width, height). The width and height should be multiples of 32. max_tries defines the maximum number of crop attempts, since the cropped area may be too small or cropping may have failed. keep_ratio specifies whether to keep the aspect ratio. | size: [>0, >0]; max_tries: >0; keep_ratio: true/false |
| MakeBorderMap | dict config | – | Defines the parameters for generating a threshold map. shrink_ratio is used to calculate the distance between expanding/shrinking polygons and the original text polygon. thresh_min and thresh_max set the threshold range when generating the threshold map. | 0.0 ~ 1.0 |
| MakeShrinkMap | dict config | – | Defines the parameters for generating a probability map. shrink_ratio is used to generate shrunken polygons. min_text_size specifies that the text will be ignored if its height or width is lower than this parameter. | 0.0 ~ 1.0 |

The parameters for validate_dataset are similar to those for train_dataset, except for the validation_preprocess settings below.

validation_preprocess#

    pre_processes:
      - type: Resize2D
        args:
          short_size:
            - 1280
            - 736
          resize_text_polys: true

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| type | string | Resize2D | Resize the images and labels before evaluation. | Resize2D |
| short_size | list | – | Resize the image to (width x height). | >0, >0, and multiples of 32 |
| resize_text_polys | bool | – | A flag specifying whether to resize the text coordinates | true/false |

Evaluate#

The following is an example spec file for evaluating on the ICDAR2015 dataset:

    model:
      load_pruned_graph: False
      pruned_graph_path: '/results/prune/pruned_0.1.pth'
      backbone: deformable_resnet18

    evaluate:
      results_dir: /results/evaluate
      checkpoint: /results/train/model_best.pth
      gpu_id: 0
      post_processing:
        type: SegDetectorRepresenter
        args:
          box_thresh: 0.55
          max_candidates: 1000
          unclip_ratio: 1.5
      metric:
        type: QuadMetric
        args:
          is_output_polygon: false

    dataset:
      validate_dataset:
        data_path: ['/data/ocdnet/test']
        args:
          pre_processes:
            - type: Resize2D
              args:
                short_size:
                  - 1280
                  - 736
                resize_text_polys: true
          img_mode: BGR
          filter_keys: []
          ignore_tags: ['*', '###']
        loader:
          batch_size: 1
          shuffle: false
          pin_memory: false
          num_workers: 4

Inference#

The following is an example spec file for running inference:

    model:
      load_pruned_graph: false
      pruned_graph_path: '/results/prune/pruned_0.1.pth'
      backbone: deformable_resnet18

    inference:
      checkpoint: '/results/train/model_best.pth'
      input_folder: /data/ocdnet/test/img
      width: 1280
      height: 736
      img_mode: BGR
      polygon: false
      results_dir: /results/inference
      post_processing:
        type: SegDetectorRepresenter
        args:
          thresh: 0.3
          box_thresh: 0.55
          max_candidates: 1000
          unclip_ratio: 1.5

The inference parameter defines the hyperparameters of the inference process. Inference draws bounding boxes or polygons and visualizes them on the images.
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | – | The path to the pth model | Unix path |
| results_dir | string | /results/inference | The directory to save inference results | |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed inference | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed inference | |
| input_folder | string | – | The path to the input folder for inference | Unix path |
| width | unsigned int | – | The input width | >=1 |
| height | unsigned int | – | The input height | >=1 |
| img_mode | string | – | The image mode | BGR/RGB/GRAY |
| polygon | bool | – | A True value specifies polygon, while a False value specifies BBox. | true, false |
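The spec tables require the EastRandomCropData size and the Resize2D short_size values to be positive multiples of 32. A quick pre-flight check along these lines can catch a bad value before a job is launched. This is a minimal sketch: is_valid_size is a hypothetical helper name, not a TAO command.

```shell
#!/bin/sh
# is_valid_size DIM...
# Succeeds only if every dimension is a positive integer divisible by 32.
# Hypothetical helper for checking spec values; not part of TAO.
is_valid_size() (
    for dim in "$@"; do
        [ "$dim" -gt 0 ] && [ $((dim % 32)) -eq 0 ] || exit 1
    done
)

# Example usage:
#   is_valid_size 1280 736 && echo "short_size is valid"
#   is_valid_size 1280 720 || echo "720 is not a multiple of 32"
```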

Training the Model#

Use the following command to run OCDNet training:

TAO Client (v2 API)

    TRAIN_JOB_ID=$(tao ocdnet create-job \
      --kind experiment \
      --name "ocdnet_train" \
      --action train \
      --workspace-id $WORKSPACE_ID \
      --specs "$SCHEMA" \
      --train-datasets '["'$DATASET_ID'"]' \
      --eval-dataset "$DATASET_ID" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model ocdnet train -e <experiment_spec_file>
                           [results_dir=<global_results_dir>]
                           [model.<model_option>=<model_option_value>]
                           [dataset.<dataset_option>=<dataset_option_value>]
                           [train.<train_option>=<train_option_value>]
                           [train.gpu_ids=<gpu indices>]
                           [train.num_gpus=<number of gpus>]

Required Arguments

The only required argument is the path to the experiment spec:

-e, --experiment_spec : The experiment specification file to set up the training experiment

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

-h, --help : Show this help message and exit.

model.<model_option> : The model options.

dataset.<dataset_option> : The dataset options.

train.<train_option> : The train options.

train.optim.<optim_option> : The optimizer options

Note: For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 and gpu_ids = [0, 1], they are modified to follow the setting that implies more GPUs; in the same example, num_gpus is modified from 1 to 2.

In some cases, multi-GPU training may result in a segmentation fault. You can circumvent this by setting the environment variable OMP_NUM_THREADS to 1. Depending on your mode of execution, you may use the following methods to set this variable:

CLI Launcher: You may set the environment variable by adding the following fields to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 of the section Running the launcher.

    {
        "Envs": [
            {
                "variable": "OMP_NUM_THREADS",
                "value": "1"
            }
        ]
    }

Docker: You may set environment variables in Docker by setting the -e flag in the Docker command line.

    docker run -it --rm --gpus all \
        -e OMP_NUM_THREADS=1 \
        -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e

Checkpointing and Resuming Training

At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth. Checkpoints are saved in train.results_dir, like this:

    $ ls /results/train
    'model_epoch_000.pth'
    'model_epoch_001.pth'
    'model_epoch_002.pth'
    'model_epoch_003.pth'
    'model_epoch_004.pth'

The latest checkpoint is also saved as ocd_model_latest.pth. Training automatically resumes from ocd_model_latest.pth if it exists in train.results_dir. This is superseded by train.resume_training_checkpoint_path, if it is provided.

The major implication of this logic is that, if you wish to trigger fresh training from scratch, you must do one of the following:

Specify a new, empty results directory (Recommended)

Remove the latest checkpoint from the results directory

Note: By default, training uses the DDP (Distributed Data Parallel) strategy. When training with multiple GPUs, the hmean result during training matches the hmean result of running tao model ocdnet evaluate only if the number of evaluation images is a multiple of num_gpus * evaluate_batch_size.
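The two multi-GPU rules in this section (how inconsistent num_gpus and gpu_ids values are reconciled, and when the training-time hmean matches a standalone evaluation) can be expressed as simple arithmetic. The sketch below only mirrors the documented behavior. effective_num_gpus and hmean_matches_evaluate are hypothetical helper names, and gpu_ids is passed as a comma-separated string for convenience.

```shell
#!/bin/sh
# Hypothetical helpers that restate the documented multi-GPU rules; not TAO code.

# effective_num_gpus NUM_GPUS GPU_IDS_CSV
# When num_gpus and gpu_ids disagree, the value implying more GPUs wins.
effective_num_gpus() (
    num_gpus=$1
    # Count the comma-separated ids, e.g. "0,1" -> 2.
    id_count=$(printf '%s\n' "$2" | awk -F',' '{ print NF }')
    if [ "$id_count" -gt "$num_gpus" ]; then
        echo "$id_count"
    else
        echo "$num_gpus"
    fi
)

# hmean_matches_evaluate NUM_IMAGES NUM_GPUS EVAL_BATCH_SIZE
# Succeeds when the evaluation image count is a multiple of
# num_gpus * evaluate_batch_size.
hmean_matches_evaluate() (
    [ $(( $1 % ($2 * $3) )) -eq 0 ]
)

# Example usage:
#   effective_num_gpus 1 "0,1"      # the documented example: num_gpus becomes 2
#   hmean_matches_evaluate 500 2 1 && echo "training hmean should match evaluate"
```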

Evaluating the Model#

Use the following command to run OCDNet evaluation:

TAO Client (v2 API)

    EVAL_JOB_ID=$(tao ocdnet create-job \
      --kind experiment \
      --name "ocdnet_evaluate" \
      --action evaluate \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $TRAIN_JOB_ID \
      --eval-dataset "$DATASET_ID" \
      --specs "$SCHEMA" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model ocdnet evaluate -e <experiment_spec_file>
                              evaluate.checkpoint=<model to be evaluated>
                              [evaluate.<evaluate_option>=<evaluate_option_value>]
                              [evaluate.gpu_ids=<gpu indices>]
                              [evaluate.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

-e, --experiment_spec : The experiment spec file to set up the evaluation experiment.

evaluate.checkpoint : The .pth model to be evaluated.

Optional Arguments

The following arguments are optional to run the command.

evaluate.<evaluate_option> : The evaluate options.

Running Inference on the OCDNet Model#

TAO Client (v2 API)

    INFER_JOB_ID=$(tao ocdnet create-job \
      --kind experiment \
      --name "ocdnet_inference" \
      --action inference \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $TRAIN_JOB_ID \
      --inference-dataset "$DATASET_ID" \
      --specs "$SCHEMA" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model ocdnet inference -e <experiment_spec_file>
                               inference.checkpoint=<model to be inferenced>
                               inference.input_folder=<path to input folder>
                               [inference.<inference_option>=<inference_option_value>]
                               [inference.gpu_ids=<gpu indices>]
                               [inference.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required to run the command.

-e, --experiment_spec : The experiment spec file to set up the inference experiment.

inference.checkpoint : The .pth model to inference.

inference.input_folder : The path to the input folder.

Optional Arguments

The following arguments are optional to run the command.

inference.<inference_option> : The inference options.

Note: Inference expects existing label files in the gt folder. If there are no label files, generate dummy labels under the gt folder. Use the following script for reference:

    #!/bin/bash
    # Create one dummy label file per image so that inference finds a gt entry.
    folder_path=/workspace/datasets/ICDAR2015/datasets/test
    mkdir -p "${folder_path}/gt"
    for filename in $(ls "${folder_path}/img"); do
        # gt_<image name without extension>.txt holds a single ignored ("###") region.
        echo "10,10,10,20,20,10,20,20,###" > "${folder_path}/gt/gt_${filename%.*}.txt"
    done

Pruning and Retraining an OCDNet Model#

Model pruning reduces the number of model parameters to improve inference frames per second (FPS) while maintaining nearly the same hmean. Pruning is applied to an already trained OCDNet model and generates a pruned graph model, which is a new model with fewer parameters. You must then retrain the pruned graph model on the same dataset to recover the hmean. During retraining, enable loading of the pruned graph model and set the path to it.

The prune parameter defines the hyperparameters of the pruning process.

    prune:
      checkpoint: /results/train/model_best.pth
      ch_sparsity: 0.2
      round_to: 32
      p: 2
      results_dir: /results/prune
      verbose: True

    model:
      backbone: fan_tiny_8_p4_hybrid
      enlarge_feature_map_size: True
      fuse_qkv_proj: False

    dataset:
      validate_dataset:
        data_path: ['/data/ocdnet_vit/test']
        args:
          pre_processes:
            - type: Resize2D
              args:
                short_size:
                  - 640
                  - 640
                resize_text_polys: true
          img_mode: BGR
          filter_keys: []
          ignore_tags: ['*', '###']
        loader:
          batch_size: 1
          shuffle: false
          pin_memory: false
          num_workers: 1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the PyTorch model to prune | unix path |
| ch_sparsity | float | 0.1 | The pruning threshold | 0.0 ~ 1.0 |
| results_dir | string | | The path to the results directory | Unix path |
| round_to | unsigned int | | Round channels to the nearest multiple of round_to. E.g., round_to=8 means channels will be rounded to 8x. | >0 |
| p | unsigned int | | The norm degree to estimate the importance of channels. Default: 2 | >0 |
| verbose | bool | | A flag specifying whether to print prune information. Default: True | true/false |
| fuse_qkv_proj | bool | | A flag specifying whether to fuse the qkv projection. Default: True. It only needs to be set to True when using the fan-tiny backbone. | true/false |

Use the following command to run pruning on the OCDNet model.
TAO Client (v2 API)

    PRUNE_JOB_ID=$(tao ocdnet create-job \
      --kind experiment \
      --name "ocdnet_prune" \
      --action prune \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $TRAIN_JOB_ID \
      --specs "$SCHEMA" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model ocdnet prune -e $SPECS_DIR/prune.yaml \
        prune.checkpoint=$RESULTS_DIR/train/model_best.pth \
        prune.results_dir=$RESULTS_DIR/prune \
        [prune.<prune_option>=<prune_option_value>]

Required Arguments

The following arguments are required.

-e, --experiment_spec_file : The experiment spec file to set up the pruning experiment.

Optional Arguments

The following arguments are optional to run the command.

prune.<prune_option> : The prune options.

After pruning, the pruned model can be used for retraining (that is, fine-tuning). To start the retraining, set the load_pruned_graph parameter to true and set the pruned_graph_path parameter to point to the model that is generated from pruning.

Note: When retraining, evaluating, performing inference on, or exporting a model that has a pruned structure, you need to set load_pruned_graph to true so that the newly pruned model structure is imported. See the examples for more details.
Here's an example of running training with a pruned model:

TAO Client (v2 API)

    PRUNE_SCHEMA=$(echo $SCHEMA | jq -r '.model.load_pruned_graph=true')
    RETRAIN_JOB_ID=$(tao ocdnet create-job \
      --kind experiment \
      --name "ocdnet_retrain" \
      --action retrain \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $PRUNE_JOB_ID \
      --train-datasets '["'$DATASET_ID'"]' \
      --eval-dataset "$DATASET_ID" \
      --specs "$PRUNE_SCHEMA" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model ocdnet train -e $SPECS_DIR/train.yaml \
        train.results_dir=$RESULTS_DIR/retrain \
        model.load_pruned_graph=true \
        model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$ch_sparsity.pth

Here's an example of resuming training against a pruned model:

TAO Client (v2 API)

    RETRAIN_RESUME_JOB_ID=$(tao ocdnet job-resume --job-id $RETRAIN_JOB_ID --parent_job_id $TRAIN_JOB_ID --specs "$PRUNE_SCHEMA")

TAO Launcher

    tao model ocdnet train -e $SPECS_DIR/train.yaml \
        train.results_dir=$RESULTS_DIR/retrain \
        model.load_pruned_graph=true \
        model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$ch_sparsity.pth \
        train.resume_training_checkpoint_path=$RESULTS_DIR/retrain/resume.pth

Here's an example of running evaluation against a pruned model:

TAO Client (v2 API)

    EVAL_JOB_ID=$(tao ocdnet create-job \
      --kind experiment \
      --name "ocdnet_evaluate" \
      --action evaluate \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $RETRAIN_JOB_ID \
      --eval-dataset "$DATASET_ID" \
      --specs "$SCHEMA" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model ocdnet evaluate -e $SPECS_DIR/evaluate.yaml \
        evaluate.results_dir=$RESULTS_DIR/evaluate \
        model.load_pruned_graph=true \
        model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$ch_sparsity.pth \
        evaluate.checkpoint=$RESULTS_DIR/train/model_best.pth

Here's an example of running inference against a pruned model:

TAO Client (v2 API)

    INFER_JOB_ID=$(tao ocdnet create-job \
      --kind experiment \
      --name "ocdnet_inference" \
      --action inference \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $RETRAIN_JOB_ID \
      --inference-dataset "$DATASET_ID" \
      --specs "$PRUNE_SCHEMA" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model ocdnet inference -e $SPECS_DIR/inference.yaml \
        model.load_pruned_graph=true \
        model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$ch_sparsity.pth \
        inference.checkpoint=$RESULTS_DIR/train/model_best.pth \
        inference.input_folder=$DATA_DIR/test/img \
        inference.results_dir=$RESULTS_DIR/infer

Here's an example of running export against a pruned model:

TAO Client (v2 API)

    EXPORT_JOB_ID=$(tao ocdnet create-job \
      --kind experiment \
      --name "ocdnet_export" \
      --action export \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $RETRAIN_JOB_ID \
      --specs "$PRUNE_SCHEMA" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model ocdnet export -e $SPECS_DIR/export.yaml \
        model.load_pruned_graph=true \
        model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$ch_sparsity.pth \
        export.checkpoint=$RESULTS_DIR/train/model_best.pth \
        export.onnx_file=$RESULTS_DIR/export/model_best.onnx
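Every TAO Launcher example for a pruned model repeats the same two model overrides. The sketch below builds them once. pruned_overrides is a hypothetical helper name, and it assumes the pruned graph is saved as prune/pruned_<ch_sparsity>.pth under the results directory, as in the paths used in the examples.

```shell
#!/bin/sh
# pruned_overrides RESULTS_DIR CH_SPARSITY
# Prints the two model overrides needed for any task run against a pruned model.
# Hypothetical helper; the file name pattern is taken from the examples, not from TAO itself.
pruned_overrides() (
    printf 'model.load_pruned_graph=true model.pruned_graph_path=%s/prune/pruned_%s.pth\n' "$1" "$2"
)

# Example usage (relies on word splitting, so paths must not contain spaces):
#   tao model ocdnet train -e "$SPECS_DIR/train.yaml" \
#       train.results_dir="$RESULTS_DIR/retrain" \
#       $(pruned_overrides "$RESULTS_DIR" 0.1)
```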

Exporting the Model#

The export parameter defines the hyperparameters of the export process.

    model:
      load_pruned_graph: False
      pruned_graph_path: '/results/prune/pruned_0.1.pth'
      backbone: deformable_resnet18

    export:
      results_dir: /results/export
      checkpoint: '/results/train/model_best.pth'
      onnx_file: '/results/export/model_best.onnx'
      width: 1280
      height: 736

    dataset:
      validate_dataset:
        data_path: ['/data/ocdnet/test']

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the PyTorch model to export | Unix path |
| onnx_file | string | | The path to the ONNX file | Unix path |
| opset_version | unsigned int | 11 | The opset version of the exported ONNX | >0 |
| input_width | unsigned int | 1280 | The input width | >0 |
| input_height | unsigned int | 736 | The input height | >0 |

TAO Client (v2 API)

    EXPORT_JOB_ID=$(tao ocdnet create-job \
      --kind experiment \
      --name "ocdnet_export" \
      --action export \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $RETRAIN_JOB_ID \
      --specs "$PRUNE_SCHEMA" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model ocdnet export -e $SPECS_DIR/export.yaml
                            export.checkpoint=<path_to_pth_file>
                            export.onnx_file=<path_to_onnx_file>
                            [export.<export_option>=<export_option_value>]

Required Arguments

The following arguments are required.

-e, --experiment_spec : The experiment spec file to set up export

export.checkpoint : The .pth model to export.

export.onnx_file : The path to save the exported model to

Optional Arguments

The following arguments are optional to run the command.

export.<export_option> : The export options.

TensorRT Engine Generation, Validation, and INT8 Calibration#

For deployment, see TAO Deploy documentation.

Note: If you are not running the OCDNet TensorRT engine with tao deploy (in other words, if there is no output when you run the command below for your platform), you need to compile and replace the TensorRT OSS plugin, because OCDNet requires the modulatedDeformConvPlugin.

    # On X86 platform
    nm -gDC /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so | grep ModulatedDeformableConvPlugin
    # On Jetson platform
    nm -gDC /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so | grep ModulatedDeformableConvPlugin

Get the TensorRT repository:

    git clone -b release/8.6 https://github.com/NVIDIA/TensorRT.git
    cd TensorRT
    git submodule update --init --recursive

Compile the TensorRT libnvinfer_plugin.so file:

    mkdir build && cd build
    # On X86 platform
    cmake ..
    # On Jetson platform
    cmake .. -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/
    make nvinfer_plugin -j12

The libnvinfer_plugin.so.8.6.x is generated under the build folder. Note that x depends on the actual minor version.

Replace the default plugin library. Note that the exact plugin name depends on the TensorRT version installed in your system.

    # On X86 platform, for example, if the default plugin is /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.5.2, then
    cp libnvinfer_plugin.so.8.6.x /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.5.2
    # On Jetson platform, for example, if the default plugin is /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.5.2, then
    cp libnvinfer_plugin.so.8.6.x /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.5.2
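The symbol check from the note above can be wrapped so that a deployment script decides automatically whether the plugin needs to be rebuilt. This is a minimal sketch. has_deform_plugin is a hypothetical helper name, and it reads the nm output on standard input so that it works with either platform's library path.

```shell
#!/bin/sh
# has_deform_plugin
# Reads `nm -gDC <libnvinfer_plugin.so>` output on stdin and succeeds if the
# ModulatedDeformableConvPlugin symbol is present. Hypothetical helper, not a TAO command.
has_deform_plugin() {
    grep -q 'ModulatedDeformableConvPlugin'
}

# Example usage on an x86 platform (library path taken from the note above):
#   nm -gDC /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so | has_deform_plugin \
#       || echo "ModulatedDeformableConvPlugin not found: rebuild the TensorRT OSS plugin"
```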