OCDNet - NVIDIA Docs

OCDNet is an optical-character detection model that is included in the TAO Toolkit. It supports the following tasks:

train
evaluate
inference
prune
export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:

tao model ocdnet <sub_task> <args_per_subtask>

Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
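
The calling convention above can also be assembled from a script. The following is a minimal sketch, not part of the TAO Toolkit: build_tao_command is a hypothetical helper, and the spec path and override shown are placeholder values. It only builds the argument list and does not run anything.

```python
import shlex


def build_tao_command(sub_task, spec_file, overrides=None):
    """Build the argv list for `tao model ocdnet <sub_task> -e <spec> [key=value ...]`."""
    cmd = ["tao", "model", "ocdnet", sub_task, "-e", spec_file]
    # Overrides are passed as key=value pairs after the spec file.
    for key, value in (overrides or {}).items():
        cmd.append(f"{key}={value}")
    return cmd


# Placeholder spec path and override, for illustration only.
cmd = build_tao_command("train", "specs/train.yaml", {"num_gpus": 2})
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run if you want to launch the task from Python.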

Preparing the Dataset

The dataset for OCDNet contains images and the corresponding label files.

Both the training dataset and test dataset must follow the same structure. The directory structure should be organized as follows, where the directory name for images is img and the directory name for label files is gt. By default, each label file is expected to have the same name as its image file with gt_ added as a prefix (for example, gt_img_0.txt for img_0.jpg).

The exact directory names train and test are not required but are preferred by convention.

/train
  /img
    img_0.jpg
    img_1.jpg
    ...
  /gt
    gt_img_0.txt
    gt_img_1.txt
    ...
/test
  /img
    img_0.jpg
    img_1.jpg
    ...
  /gt
    gt_img_0.txt
    gt_img_1.txt
    ...
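
Before training, it can help to confirm that every image has a matching label file. The following is a small sketch under the layout shown above (img and gt subfolders, gt_ prefix); find_unlabeled_images is a hypothetical helper name, not a TAO utility.

```python
from pathlib import Path


def find_unlabeled_images(split_dir, prefix="gt_"):
    """Return names of images in <split_dir>/img that have no <prefix><stem>.txt in <split_dir>/gt."""
    split = Path(split_dir)
    missing = []
    for image in sorted((split / "img").iterdir()):
        label = split / "gt" / f"{prefix}{image.stem}.txt"
        if not label.is_file():
            missing.append(image.name)
    return missing
```

Calling find_unlabeled_images on a train or test folder returns an empty list when the image and label folders are consistent.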

Below is an example label file from the public ICDAR2015 dataset:

$ cat ICDAR2015/test/gt/gt_img_14.txt
  268,82,335,93,332,164,267,164,the
  344,94,433,112,427,159,336,163,Future
  208,191,374,184,371,213,208,241,Communications
  370,176,420,176,416,204,373,213,###
  1,57,261,76,261,187,0,190,venting
  1,208,203,200,203,241,3,294,ntelligence.

Note

The label file contains one text region per line. Each line lists the coordinates of the corner points of the region as comma-separated x,y values, followed by the text as the last field. If the text is ### and the training spec file sets ignore_tags to ['###'], then those lines are ignored during training.
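
To make the line format concrete, the sketch below parses one label line into corner points, text, and an ignore flag. It assumes eight integer coordinates (four points) per line, as in the ICDAR2015 example above; parse_label_line is a hypothetical helper, not the loader that TAO uses.

```python
def parse_label_line(line, ignore_tags=("###",)):
    """Split one label line into corner points, transcription, and an ignore flag."""
    fields = line.strip().split(",")
    coords = [int(value) for value in fields[:8]]
    # Pair up x and y values: [(x1, y1), (x2, y2), (x3, y3), (x4, y4)].
    points = list(zip(coords[0::2], coords[1::2]))
    # Re-join the remainder, since the transcription itself may contain commas.
    text = ",".join(fields[8:])
    return points, text, text in ignore_tags


points, text, ignored = parse_label_line("370,176,420,176,416,204,373,213,###")
print(points, text, ignored)
```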

Creating an Experiment Spec File

The spec file for OCDNet includes model, train, dataset, and evaluate, as well as other global parameters. Below is an example spec file for training an OCDNet model with a FAN-tiny backbone on the ICDAR2015 dataset.

num_gpus: 1
model:
  load_pruned_graph: False
  pruned_graph_path: '/results/prune/pruned_0.1.pth'
  pretrained_model_path: '/data/ocdnet/ocdnet_fan_tiny_2x_icdar.pth'
  backbone: fan_tiny_8_p4_hybrid
  enlarge_feature_map_size: True
  activation_checkpoint: True
train:
  results_dir: /results/train
  num_epochs: 80
  #resume_training_checkpoint_path: '/results/train/resume.pth'
  checkpoint_interval: 1
  validation_interval: 1
  is_dry_run: False
  precision: fp32
  model_ema: False
  model_ema_decay: 0.999
  trainer:
    clip_grad_norm: 5.0
  optimizer:
    type: Adam
    args:
      lr: 0.001
  lr_scheduler:
    type: WarmupPolyLR
    args:
      warmup_epoch: 3
  post_processing:
    type: SegDetectorRepresenter
    args:
      thresh: 0.3
      box_thresh: 0.55
      max_candidates: 1000
      unclip_ratio: 1.5
  metric:
    type: QuadMetric
    args:
      is_output_polygon: false
dataset:
  train_dataset:
    data_name: ICDAR2015Dataset
    data_path: ['/data/ocdnet_vit/train']
    args:
      pre_processes:
        - type: IaaAugment
          args:
            - {'type':Fliplr, 'args':{'p':0.5}}
            - {'type': Affine, 'args':{'rotate':[-45,45]}}
            - {'type':Sometimes,'args':{'p':0.2, 'then_list':{'type': GaussianBlur, 'args':{'sigma':[1.5,2.5]}}}}
            - {'type':Resize,'args':{'size':[0.5,3]}}
        - type: EastRandomCropData
          args:
            size: [640,640]
            max_tries: 50
            keep_ratio: true
        - type: MakeBorderMap
          args:
            shrink_ratio: 0.4
            thresh_min: 0.3
            thresh_max: 0.7
        - type: MakeShrinkMap
          args:
            shrink_ratio: 0.4
            min_text_size: 8

      img_mode: BGR
      filter_keys: [img_path,img_name,text_polys,texts,ignore_tags,shape]
      ignore_tags: ['*', '###']
    loader:
      batch_size: 1
      pin_memory: true
      num_workers: 12
  validate_dataset:
    data_name: ICDAR2015Dataset
    data_path: ['/data/ocdnet_vit/test']
    args:
      pre_processes:
        - type: Resize2D
          args:
            short_size:
              - 1280
              - 736
            resize_text_polys: true
      img_mode: BGR
      filter_keys: []
      ignore_tags: ['*', '###']
    loader:
      batch_size: 1
      pin_memory: false
      num_workers: 1

The top level description of the spec file is provided in the table below.

Parameter	Data Type	Default	Description	Supported Values
`num_gpus`	unsigned int	1	The number of GPUs	>0

Model

The model parameter provides the list of parameters for the model.

Parameter	Data Type	Default	Description	Supported Values
`load_pruned_graph`	bool	`false`	A flag specifying whether to load the pruned graph. Set to True if train/evaluate/export/inference is being performed against a pruned model.	true/false
`pruned_graph_path`	string	–	The path to the pruned graph model (if `load_pruned_graph` is True)	unix path
`pretrained_model_path`	string	–	The path to the pretrained model	unix path
`backbone`	string	deformable_resnet18	The backbone of the model	deformable_resnet18 deformable_resnet50 fan_tiny_8_p4_hybrid
`enlarge_feature_map_size`	bool	`false`	A flag specifying whether to enlarge the output feature map size of the FAN-tiny backbone. This flag has no effect when using a `deformable_resnet` backbone.	true/false
`activation_checkpoint`	bool	`false`	A flag specifying whether to use activation checkpoints to save GPU memory. This flag has no effect when using a `deformable_resnet` backbone.	true/false

Train

The train parameter provides the parameters for training.

Parameter	Data Type	Default	Description	Supported Values
`results_dir`	string	–	The directory for saving training result	unix path
`num_epochs`	unsigned int	50	The total number of epochs to run the experiment	>0
`checkpoint_interval`	unsigned int	1	The interval at which to save the checkpoint file	>0
`validation_interval`	unsigned int	1	The interval of validation	>0
`optimizer`	dict config	–	The configuration for the optimizer	–
`lr_scheduler`	dict config	–	The configuration for the lr_scheduler	–
`post_processing`	dict config	–	The configuration for post_processing.	–
`metric`	dict config	–	The configuration for metric computing. QuadMetric is supported. If `is_output_polygon` is True, a polygon will be generated. If it is False, a BBox will be generated.	–
`is_dry_run`	bool	`false`	If this flag is True, only one batch will run. This flag is only recommended for debugging purposes.	true/false
`precision`	string	fp32	The precision that the model will be trained on. If this value is set to ‘fp16’, AMP training will be enabled	fp32/fp16
`model_ema`	bool	`false`	A flag to enable model EMA. The default value is False. If the value is True, model EMA will be enabled during training	true/false
`model_ema_decay`	float	0.999	The decay of model EMA. The default value is 0.999. This value is only used when `model_ema` is set to True.	(0, 1]
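
As an illustration of what model_ema_decay controls, the sketch below shows the standard exponential moving average update on plain Python lists. It assumes the textbook formulation (ema = decay * ema + (1 - decay) * weight); the exact update used inside TAO is not described here and may differ.

```python
def ema_update(ema_weights, model_weights, decay=0.999):
    """One EMA step: each averaged weight moves a fraction (1 - decay) toward the current weight."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, model_weights)]


ema = [0.0, 1.0]
ema = ema_update(ema, [1.0, 1.0], decay=0.9)
print(ema)
```

A decay close to 1 makes the averaged weights change slowly, which is why values such as 0.999 are typical.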

optimizer

optimizer:
  type: Adam
  args:
    lr: 0.001

Parameter	Data Type	Default	Description	Supported Values
`type`	string	Adam	The optimizer type	Adam
`lr`	float	–	The initial learning rate	>=0.0

lr_scheduler

lr_scheduler:
  type: WarmupPolyLR
  args:
    warmup_epoch: 3

Parameter	Data Type	Default	Description	Supported Values
`type`	string	WarmupPolyLR	Decays the learning rate via a polynomial function. The learning rate increases to the initial value during the warmup stage and is then reduced from the initial value to zero during the rest of training.	WarmupPolyLR
`warmup_epoch`	unsigned int	3	The number of warmup epochs, during which the learning rate increases to the initial value (i.e. `optimizer.args.lr`). This value should not be the same as `num_epochs`.	>=0
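
The shape of such a schedule can be sketched as follows. This is an illustration only: the linear warmup, the 0-based epoch index, and power=0.9 are assumptions made for the example and are not taken from the TAO implementation.

```python
def warmup_poly_lr(epoch, base_lr=0.001, warmup_epoch=3, num_epochs=80, power=0.9):
    """Learning rate at a 0-based epoch: linear warmup, then polynomial decay to zero."""
    if epoch < warmup_epoch:
        # Warmup stage: ramp up linearly and reach base_lr at the last warmup epoch.
        return base_lr * (epoch + 1) / warmup_epoch
    # Decay stage: the denominator is zero if warmup_epoch equals num_epochs.
    progress = (epoch - warmup_epoch) / (num_epochs - warmup_epoch)
    return base_lr * (1.0 - progress) ** power


for epoch in (0, 2, 3, 40, 80):
    print(epoch, warmup_poly_lr(epoch))
```

In this sketch the decay stage divides by num_epochs - warmup_epoch, which shows one reason the two values should not be equal.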

post_processing

post_processing:
  type: SegDetectorRepresenter
  args:
    thresh: 0.3
    box_thresh: 0.55
    max_candidates: 1000
    unclip_ratio: 1.5

Parameter	Data Type	Default	Description	Supported Values
`type`	string	SegDetectorRepresenter	The name of the post_processing. The post_processing will generate BBox or polygon.	SegDetectorRepresenter
`thresh`	float	0.3	The threshold for binarization, which is used in generating an approximate binary map.	0.0 ~ 1.0
`box_thresh`	float	0.7	The BBox threshold. If the effective area is lower than this threshold, the prediction will be ignored, which means no text is detected.	0.0 ~ 1.0
`max_candidates`	unsigned int	1000	The maximum candidate output. Enlarge this parameter if characters are detected in one area but obviously not in the other area of the image.	> 1
`unclip_ratio`	float	1.5	The unclip ratio using the Vatti clipping algorithm in the probability map. The BBox will look larger if this ratio is set larger.	>0.0
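
To give a feel for how unclip_ratio affects box size, the sketch below computes an expansion distance as area * unclip_ratio / perimeter, the formula used by the DB (Differentiable Binarization) text detection method. It is assumed here, not confirmed by this page, that SegDetectorRepresenter follows the same rule; both function names are hypothetical.

```python
import math


def polygon_area_and_perimeter(points):
    """Area (shoelace formula) and perimeter of a simple polygon given as a list of (x, y) points."""
    doubled_area = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        doubled_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(doubled_area) / 2.0, perimeter


def unclip_distance(points, unclip_ratio=1.5):
    """Distance by which the detected polygon would be expanded outward."""
    area, perimeter = polygon_area_and_perimeter(points)
    return area * unclip_ratio / perimeter


box = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(unclip_distance(box, unclip_ratio=1.5))
```

Under this assumption, the expansion distance grows linearly with unclip_ratio, which matches the observation that boxes look larger when the ratio is increased.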

Dataset

The dataset is defined by two sections: train_dataset and validate_dataset

Parameter	Data Type	Default	Description	Supported Values
`train_dataset`	dict config	–	The configuration for the training dataset	–
`validate_dataset`	dict config	–	The configuration for the validation dataset	–

The parameters for train_dataset are provided below.

Parameter	Data Type	Default	Description	Supported Values
`data_name`	string	ICDAR2015Dataset	The dataset name. For “ICDAR2015Dataset”, the label file is expected to use `gt_` as a prefix. For “UberDataset”, the label file is expected to use `truth_` as a prefix.	ICDAR2015Dataset UberDataset
`data_path`	string list	–	The list of paths that contain images used for training: For example, `['path_1']` or `['path_1', 'path_2', ...]`	–
`pre_processes`	dict	–	The pre-processing configuration (see train_preprocess for more details)	–
`img_mode`	string	BGR	The image mode	BGR, RGB, GRAY
`filter_keys`	string list	`['img_path', 'img_name', 'text_polys', 'texts', 'ignore_tags', 'shape']`	The keys to ignore	–
`ignore_tags`	string list	`['*', '###']`	The labels that are not used to train	–
`batch_size`	unsigned int	–	The batch size. Set to a lower value if you encounter out-of-memory errors.	>0
`pin_memory`	bool	False	A flag specifying whether to enable pinned memory	true/false
`num_workers`	unsigned int	1	The number of threads used to load data	>=0

train_preprocess

pre_processes:
  - type: IaaAugment
    args:
      - {'type':Fliplr, 'args':{'p':0.5}}
      - {'type': Affine, 'args':{'rotate':[-45,45]}}
      - {'type':Sometimes,'args':{'p':0.2, 'then_list':{'type': GaussianBlur, 'args':{'sigma':[1.5,2.5]}}}}
      - {'type':Resize,'args':{'size':[0.5,3]}}
  - type: EastRandomCropData
    args:
      size: [640,640]
      max_tries: 50
      keep_ratio: true
  - type: MakeBorderMap
    args:
      shrink_ratio: 0.4
      thresh_min: 0.3
      thresh_max: 0.7
  - type: MakeShrinkMap
    args:
      shrink_ratio: 0.4
      min_text_size: 8

Parameter	Data Type	Default	Description	Supported Values
`IaaAugment`	dict list	`{'type':Fliplr, 'args':{'p':0.5}}` `{'type': Affine, 'args':{'rotate':[-10,10]}}` `{'type':Sometimes,'args':{'p':1.0, 'then_list':{'type': GaussianBlur, 'args':{'sigma':[1.5,2.5]}}}}` `{'type':Resize,'args':{'size':[0.5,3]}}`	Uses imgaug to perform augmentation. “Fliplr”, “Affine”, “Sometimes”, “GaussianBlur” and “Resize” are used by default. `p` defines the probability of each image being flipped. `rotate` defines the range, in degrees, from which a random rotation angle is drawn. `Sometimes` applies one or more augmenters to only a fraction `p` of all images. `then_list` defines the augmenter(s) to apply to that fraction of images. `GaussianBlur` blurs images using Gaussian kernels. `sigma` defines the standard deviation of the Gaussian kernel. `size` defines the range of scale factors used when resizing each image relative to its original size.	`p`: 0.0 ~ 1.0 `rotate`: -180 ~ 180 `sigma`: >=0.0 `size`: >0.0
`EastRandomCropData`	dict config	–	The random crop applied after augmentation. `size` defines the target crop size (width, height). The width and height should be multiples of 32. `max_tries` defines the maximum number of crop attempts, since a cropped area may be too small or cropping may fail. `keep_ratio` specifies whether to keep the aspect ratio.	`size`: [>0, >0] `max_tries`: >0 `keep_ratio`: true/false
`MakeBorderMap`	dict config	–	Defines the parameter when generating a threshold map. `shrink_ratio` is used to calculate the distance between expanding/shrinking polygons and the original text polygon. `thresh_min` and `thresh_max` will set the threshold range when generating the threshold map.	0.0 ~ 1.0
`MakeShrinkMap`	dict config	–	Defines the parameter when generating a probability map. `shrink_ratio` is used to generate shrunken polygons. `min_text_size` specifies that the text will be ignored if its height or width is lower than this parameter.	0.0 ~ 1.0

The parameters for validate_dataset are similar to those for train_dataset, except for the pre-processing, which is described in validation_preprocess below.

validation_preprocess

pre_processes:
  - type: Resize2D
    args:
      short_size:
        - 1280
        - 736
      resize_text_polys: true

Parameter	Data Type	Default	Description	Supported Values
`type`	string	Resize2D	Resize the images and labels before evaluation.	Resize2D
`short_size`	list	–	Resize the image to (width x height).	>0, >0, and multiples of 32.
`resize_text_polys`	bool	–	A flag specifying whether to resize the text coordinates	true/false
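
Because the resize dimensions must be multiples of 32, a quick check before editing the spec can save a failed run. The helpers below are a hypothetical sketch (the names are not part of TAO) that validate a width and height pair and round an arbitrary size up to the next multiple of 32.

```python
def is_valid_resize_shape(width, height, base=32):
    """True when both dimensions are positive multiples of `base`."""
    return width > 0 and height > 0 and width % base == 0 and height % base == 0


def round_up_to_multiple(value, base=32):
    """Smallest multiple of `base` that is greater than or equal to `value`."""
    return ((value + base - 1) // base) * base


print(is_valid_resize_shape(1280, 736))
print(round_up_to_multiple(720))
```

For example, a height of 720 is not a multiple of 32, and the next valid value is 736.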

Evaluate

The following is an example spec file for evaluating on the ICDAR2015 dataset.

model:
  load_pruned_graph: False
  pruned_graph_path: '/results/prune/pruned_0.1.pth'
  backbone: deformable_resnet18
evaluate:
  results_dir: /results/evaluate
  checkpoint: /results/train/model_best.pth
  gpu_id: 0
  post_processing:
    type: SegDetectorRepresenter
    args:
      box_thresh: 0.55
      max_candidates: 1000
      unclip_ratio: 1.5
  metric:
    type: QuadMetric
    args:
      is_output_polygon: false
dataset:
  validate_dataset:
    data_path: ['/data/ocdnet/test']
    args:
      pre_processes:
        - type: Resize2D
          args:
            short_size:
              - 1280
              - 736
            resize_text_polys: true
      img_mode: BGR
      filter_keys: []
      ignore_tags: ['*', '###']
    loader:
      batch_size: 1
      shuffle: false
      pin_memory: false
      num_workers: 4

Inference

The following is an example spec file for running inference:

model:
  load_pruned_graph: false
  pruned_graph_path: '/results/prune/pruned_0.1.pth'
  backbone: deformable_resnet18
inference:
  checkpoint: '/results/train/model_best.pth'
  input_folder: /data/ocdnet/test/img
  width: 1280
  height: 736
  img_mode: BGR
  polygon: false
  results_dir: /results/inference
  post_processing:
    type: SegDetectorRepresenter
    args:
      thresh: 0.3
      box_thresh: 0.55
      max_candidates: 1000
      unclip_ratio: 1.5

The inference parameter defines the hyperparameters of the inference process. Inference draws the detected bounding boxes or polygons on the images for visualization.

Parameter	Datatype	Default	Description	Supported Values
`checkpoint`	string	–	The path to the pth model	Unix path
`input_folder`	string	–	The path to the input folder for inference	Unix path
`width`	unsigned int	–	The input width	>=1
`height`	unsigned int	–	The input height	>=1
`img_mode`	string	–	The image mode	BGR/RGB/GRAY
`polygon`	bool	–	A True value specifies polygon output, while a False value specifies BBox output.	true, false

Training the Model

Use the following command to run OCDNet training:

tao model ocdnet train -e <experiment_spec_file>
                       -r <results_dir>
                       [model.pretrained_model_path=<path_to_pretrained_model_file>]
                       [train.resume_training_checkpoint_path=<path_to_resume_training_checkpoint>]
                       [num_gpus=<num_gpus>]

Required Arguments

-e, --experiment_spec_file: The path to the experiment spec file

Optional Arguments

-r, --results_dir: The path to the folder where the experiment outputs should be written
num_gpus: The number of GPUs to be used in the training in a multi-GPU scenario. The default value is 1.

Here’s an example of running train with a pretrained model:

tao model ocdnet train \
      -e $SPECS_DIR/train.yaml \
      -r $RESULTS_DIR/train \
      model.pretrained_model_path=$DATA_DIR/ocdnet_deformable_resnet18.pth

Here’s an example of resuming training:

tao model ocdnet train \
       -e $SPECS_DIR/train.yaml \
       -r $RESULTS_DIR/train \
       train.resume_training_checkpoint_path=$RESULTS_DIR/train/resume.pth

Here’s an example of running train with multiple GPUs:

tao model ocdnet train \
      -e $SPECS_DIR/train.yaml \
      -r $RESULTS_DIR/train \
      model.pretrained_model_path=$DATA_DIR/ocdnet_deformable_resnet18.pth \
      num_gpus=2

Note

By default, training uses the DDP (distributed data parallel) strategy. When training with multiple GPUs, the hmean reported during training matches the hmean reported by tao model ocdnet evaluate only if the number of evaluation images is a multiple of num_gpus * evaluate_batch_size.
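
The divisibility condition in this note is easy to check ahead of time. The following is a minimal sketch; hmean_matches_standalone_eval is a hypothetical helper name, and the image counts are made-up values.

```python
def hmean_matches_standalone_eval(num_images, num_gpus, evaluate_batch_size):
    """True when the evaluation images split evenly across num_gpus * evaluate_batch_size."""
    return num_images % (num_gpus * evaluate_batch_size) == 0


print(hmean_matches_standalone_eval(500, num_gpus=2, evaluate_batch_size=1))
print(hmean_matches_standalone_eval(501, num_gpus=2, evaluate_batch_size=1))
```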

Evaluating the Model

Use the following command to run OCDNet evaluation:

tao model ocdnet evaluate -e <experiment_spec_file>
                          [evaluate.checkpoint=<path_to_checkpoint>]

Required Arguments

-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment

Optional Arguments

-r, --results_dir: The path to a folder where the experiment outputs should be written
-h, --help: Show this help message and exit.

Here’s an example of using the OCDNet evaluation command:

tao model ocdnet evaluate \
        -e $SPECS_DIR/evaluate.yaml \
        evaluate.checkpoint=$RESULTS_DIR/train/model_best.pth
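
The hmean figure referred to throughout this page is taken here to be the harmonic mean of precision and recall, as is usual for ICDAR-style detection metrics. That reading is an assumption, since the page does not define it. A minimal sketch of the calculation:

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(hmean(0.8, 0.6))
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values, so a model cannot score well on hmean by maximizing only precision or only recall.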

Running Inference on the OCDNet Model

Use the following command to run OCDNet inference:

tao model ocdnet inference -e <experiment_spec_file>
                     -r <results_dir>

Required Arguments

-e, --experiment_spec_file: The experiment spec file to set up the inference experiment.

Optional Arguments

-r, --results_dir: The path to a folder where the experiment outputs should be written

Here’s an example of using the OCDNet inference command:

tao model ocdnet inference \
     -e $SPECS_DIR/inference.yaml \
     inference.checkpoint=$RESULTS_DIR/train/model_best.pth \
     inference.input_folder=$DATA_DIR/test/img \
     inference.results_dir=$RESULTS_DIR/infer

Note

Currently, inference expects label files to exist in the gt folder. If you do not have any label files, generate dummy labels in the gt folder. The following script can be used as a reference:

#!/bin/bash
# Write a dummy label file for every image in the img folder.
folder_path=/workspace/datasets/ICDAR2015/datasets/test
mkdir -p "${folder_path}/gt"
for filepath in "${folder_path}"/img/*; do
    filename=$(basename "${filepath}")
    # Each dummy file holds a single region tagged ###.
    echo "10,10,10,20,20,10,20,20,###" > "${folder_path}/gt/gt_${filename%.*}.txt"
done

Pruning and Retraining an OCDNet Model

Model pruning reduces model parameters to improve inference frames per second (FPS) while maintaining nearly the same hmean.

Pruning is applied to an already trained OCDNet model. Pruning generates a pruned graph model, which is a new model with fewer parameters. Once you have this pruned graph model, you need to retrain it on the same dataset to recover the hmean. During retraining, you need to enable loading of the pruned graph model and set the path to it.

The prune parameter defines the hyperparameters of the pruning process.

prune:
  checkpoint: /results/train/model_best.pth
  ch_sparsity: 0.2
  round_to: 32
  p: 2
  results_dir: /results/prune
  verbose: True

model:
  backbone: fan_tiny_8_p4_hybrid
  enlarge_feature_map_size: True
  fuse_qkv_proj: False

dataset:
  validate_dataset:
      data_path: ['/data/ocdnet_vit/test']
      args:
        pre_processes:
          - type: Resize2D
            args:
              short_size:
                - 640
                - 640
              resize_text_polys: true
        img_mode: BGR
        filter_keys: []
        ignore_tags: ['*', '###']
      loader:
        batch_size: 1
        shuffle: false
        pin_memory: false
        num_workers: 1

Parameter	Datatype	Default	Description	Supported Values
`checkpoint`	string		The path to PyTorch model to prune	unix path
`ch_sparsity`	float		The pruning threshold	0.0 ~ 1.0
`results_dir`	string		The path to the results directory	unix path
`round_to`	unsigned int		Round channels to the nearest multiple of round_to. E.g., round_to=8 means channels will be rounded to 8x.	>0
`p`	unsigned int		The norm degree to estimate the importance of channels. Default: 2	>0
`verbose`	bool		A flag specifying whether to print pruning information. Default: True	true/false
`fuse_qkv_proj`	bool		A flag specifying whether to fuse the QKV projection. Default: True. It only needs to be set to True when using the FAN-tiny backbone.	true/false
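
To illustrate how p and ch_sparsity interact, the sketch below ranks channels by the L_p norm of their weights and selects the lowest-scoring fraction for removal. This is a simplified assumption about magnitude-based channel pruning, written with plain lists; it ignores round_to and inter-layer dependencies, and the function names are hypothetical, not TAO APIs.

```python
def channel_importance(filters, p=2):
    """L_p norm of the weights of each output channel."""
    return [sum(abs(w) ** p for w in channel) ** (1.0 / p) for channel in filters]


def channels_to_prune(filters, ch_sparsity=0.2, p=2):
    """Indices of the least important channels, covering a ch_sparsity fraction of all channels."""
    scores = channel_importance(filters, p)
    num_pruned = int(len(filters) * ch_sparsity)
    ranked = sorted(range(len(filters)), key=lambda index: scores[index])
    return sorted(ranked[:num_pruned])


# Made-up weights: five channels with two weights each.
filters = [[3.0, 4.0], [0.1, 0.1], [1.0, 0.0], [5.0, 12.0], [0.0, 0.2]]
print(channels_to_prune(filters, ch_sparsity=0.4, p=2))
```

A larger ch_sparsity removes more channels, which shrinks the model further but typically requires more retraining to recover hmean.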

Use the following command to run pruning on the OCDNet model.

tao model ocdnet prune -e $SPECS_DIR/prune.yaml \
               prune.checkpoint=$RESULTS_DIR/train/model_best.pth \
               prune.ch_sparsity=0.1 \
               prune.results_dir=$RESULTS_DIR/prune

Required Arguments

-e, --experiment_spec_file: The experiment spec file to set up the pruning experiment.

Optional Arguments

prune.ch_sparsity: The pruning threshold, which should be a float number between 0.0 and 1.0. The default value is 0.1.

After pruning, the pruned model can be used for retraining (i.e. fine tuning). To start the retraining, you need to set the load_pruned_graph parameter to true and set the pruned_graph_path parameter to point to the model that is generated from pruning.

Note

When retraining, evaluating, performing inference on, or exporting a model that has a pruned structure, you need to set load_pruned_graph to true so that the newly pruned model structure is imported. See the examples for more details.

Here’s an example of running training with a pruned model:

tao model ocdnet train -e $SPECS_DIR/train.yaml \
                -r $RESULTS_DIR/retrain \
                model.load_pruned_graph=true \
                model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$ch_sparsity.pth

Here’s an example of resuming training against a pruned model:

tao model ocdnet train \
       -e $SPECS_DIR/train.yaml \
       -r $RESULTS_DIR/retrain \
       model.load_pruned_graph=true \
       model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$ch_sparsity.pth \
       train.resume_training_checkpoint_path=$RESULTS_DIR/retrain/resume.pth

Here’s an example of running evaluation against a pruned model:

tao model ocdnet evaluate \
       -e $SPECS_DIR/evaluate.yaml \
       -r $RESULTS_DIR/evaluate \
       model.load_pruned_graph=true \
       model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$ch_sparsity.pth \
       evaluate.checkpoint=$RESULTS_DIR/train/model_best.pth

Here’s an example of running inference against a pruned model:

tao model ocdnet inference \
       -e $SPECS_DIR/inference.yaml \
       model.load_pruned_graph=true \
       model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$ch_sparsity.pth \
       inference.checkpoint=$RESULTS_DIR/train/model_best.pth \
       inference.input_folder=$DATA_DIR/test/img \
       inference.results_dir=$RESULTS_DIR/infer

Here’s an example of running export against a pruned model:

tao model ocdnet export \
       -e $SPECS_DIR/export.yaml \
       model.load_pruned_graph=true \
       model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$ch_sparsity.pth \
       export.checkpoint=$RESULTS_DIR/train/model_best.pth \
       export.onnx_file=$RESULTS_DIR/export/model_best.onnx

Exporting the Model

The export parameter defines the hyperparameters of the export process.

model:
  load_pruned_graph: False
  pruned_graph_path: '/results/prune/pruned_0.1.pth'
  backbone: deformable_resnet18
export:
  results_dir: /results/export
  checkpoint: '/results/train/model_best.pth'
  onnx_file: '/results/export/model_best.onnx'
  width: 1280
  height: 736
dataset:
  validate_dataset:
    data_path: ['/data/ocdnet/test']

Parameter	Datatype	Default	Description	Supported Values
`checkpoint`	string		The path to PyTorch model to export	Unix path
`onnx_file`	string		The path to onnx file	Unix path
`opset_version`	unsigned int	11	The opset version of the exported onnx	>0
`width`	unsigned int	1280	The input width	>0
`height`	unsigned int	736	The input height	>0

Use the following command to export the model:

tao model ocdnet export -e $SPECS_DIR/export.yaml
                  export.checkpoint=<path_to_pth_file>
                  export.onnx_file=<path_to_onnx_file>

Required Arguments

-e, --experiment_spec_file: The experiment spec file to set up export
export.checkpoint: The path to the PyTorch (.pth) model to export
export.onnx_file: The path to save the exported ONNX file to

Here’s an example for using the OCDNet export command:

tao model ocdnet export \
          -e $SPECS_DIR/export.yaml \
          export.checkpoint=$RESULTS_DIR/train/model_best.pth \
          export.onnx_file=$RESULTS_DIR/export/model_best.onnx

TensorRT Engine Generation, Validation, and INT8 Calibration

For deployment, please refer to the TAO Deploy documentation.

Note

If you are not running the OCDNet TensorRT engine with TAO Deploy, check whether the installed TensorRT plugin library already provides the ModulatedDeformableConvPlugin symbol. On an x86 platform, run:

nm -gDC /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so | grep ModulatedDeformableConvPlugin

On a Jetson platform, run:

nm -gDC /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so | grep ModulatedDeformableConvPlugin

If the command produces no output, you need to compile the TensorRT OSS plugin and replace the default plugin library, because OCDNet requires the modulatedDeformConvPlugin.

Get the TensorRT repository:

git clone -b release/8.6 https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git submodule update --init --recursive

Compile the TensorRT libnvinfer_plugin.so file:

mkdir build && cd build
# On X86 platform
cmake ..
# On Jetson platform
cmake .. -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/
make nvinfer_plugin -j12

The libnvinfer_plugin.so.8.6.x is generated under the build folder. Note that x depends on the actual minor version.

Replace the default plugin library. Note that the exact plugin name will depend on the TensorRT version installed in your system.

# On X86 platform, for example, if the default plugin is /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.5.2, then
cp libnvinfer_plugin.so.8.6.x /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.5.2
# On Jetson platform, for example, if the default plugin is /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.5.2, then
cp libnvinfer_plugin.so.8.6.x /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.5.2

Deploying to DeepStream

Refer to the nvOCDR page for more information about deploying an OCDNet model to DeepStream. You can run nvOCDR with the DeepStream sample or with Triton Inference Server. In particular, nvOCDR Triton supports inference on high-resolution images: it resizes the image while keeping its aspect ratio, tiles the resized image into small patches, runs OCDNet on each patch, and then merges the results. This helps improve hmean when a model was trained at a lower resolution but runs inference on higher-resolution images. For images that are not high resolution, you can also set resize_keep_aspect_ratio:true, which helps improve hmean because the images are resized without distortion.
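
The resize-and-tile idea can be sketched as follows. This is only an illustration of the general technique: the actual resize target, patch size, overlap, and merge logic used by nvOCDR are not described on this page, so every parameter and function name below is an assumption.

```python
def resize_keep_aspect(width, height, target_width):
    """New (width, height) after scaling to target_width without changing the aspect ratio."""
    scale = target_width / width
    return target_width, max(1, round(height * scale))


def tile_origins(length, tile, overlap=0):
    """Start offsets of tiles of size `tile` that together cover the range [0, length)."""
    if length <= tile:
        return [0]
    step = tile - overlap
    origins = list(range(0, length - tile, step))
    # Place the last tile flush with the far edge so nothing is left uncovered.
    origins.append(length - tile)
    return origins


new_width, new_height = resize_keep_aspect(4000, 3000, target_width=1280)
patches = [(x, y) for y in tile_origins(new_height, 640) for x in tile_origins(new_width, 640)]
print(new_width, new_height, patches)
```

Each (x, y) pair is the top-left corner of one patch. Detections from a patch would be shifted by that offset before the per-patch results are merged.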