NVIDIA TAO Toolkit v5.3.0

CenterPose

CenterPose is a category-level object pose estimation model included in the TAO Toolkit. It supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:


tao model centerpose <sub_task> <args_per_subtask>

where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

CenterPose expects directories of images and annotated JSON files for training or validation. See the CenterPose Data Format page for more information about the input data format.
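
For reference, a layout that follows this convention looks like the following sketch, where each image and its annotation JSON share the same base file name (the file names below are hypothetical):

/path/to/category/train/
    IMG_0001.jpg
    IMG_0001.json
    IMG_0002.jpg
    IMG_0002.json
/path/to/category/val/
    IMG_0101.jpg
    IMG_0101.json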

The training experiment spec file for CenterPose includes model, train, and dataset parameters. Here is an example spec file for training a CenterPose model with a fan_small backbone on the bike category of the Google Objectron dataset.


dataset:
  train_data: /path/to/category/train/
  val_data: /path/to/category/val/
  num_classes: 1
  batch_size: 64
  workers: 4
  category: bike
  num_symmetry: 1
  max_objs: 10
train:
  num_gpus: 1
  validation_interval: 20
  checkpoint_interval: 20
  num_epochs: 140
  clip_grad_val: 100.0
  randomseed: 317
  resume_training_checkpoint_path: null
  precision: "fp32"
  optim:
    lr: 6e-05
    lr_steps: [90, 120]
model:
  down_ratio: 4
  use_pretrained: True
  backbone:
    model_type: fan_small
    pretrained_backbone_path: /path/to/your-fan-small-pretrained-model

| Parameter | Data Type | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| model | dict config | | The configuration of the model architecture | |
| train | dict config | | The configuration of the training task | |
| dataset | dict config | | The configuration of the dataset | |
| evaluate | dict config | | The configuration of the evaluation task | |
| inference | dict config | | The configuration of the inference task | |
| export | dict config | | The configuration of the ONNX export task | |
| gen_trt_engine | dict config | | The configuration of the TensorRT engine generation task. Only used in tao deploy. | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files (see the sketch after this table) | |
| results_dir | string | None | The directory where experiment results are saved | |
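
The scalar options encryption_key and results_dir sit at the top level of the same spec file. A minimal sketch, with both values as placeholders:

encryption_key: <your_encryption_key>
results_dir: /path/to/experiment/results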

model

The model parameter provides options to change the CenterPose architecture.


model:
  down_ratio: 4
  use_pretrained: False
  backbone:
    model_type: fan_small
    pretrained_backbone_path: /path/to/your-fan-small-pretrained-model

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| down_ratio | int | 4 | The downscale ratio of the network feature map | 4 |
| use_pretrained | bool | False | A flag specifying whether to initialize the backbone with pretrained weights | True, False |
| backbone | dict config | | The config for the backbone model type and the path to the pretrained weights | |

backbone

The backbone parameter provides options to change the CenterPose backbone architecture.


backbone:
  model_type: fan_small
  pretrained_backbone_path: /path/to/your-fan-small-pretrained-model

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| pretrained_backbone_path | string | None | The optional path to the pretrained backbone file. Set this path when using a FAN backbone. The DLA34 backbone downloads its pretrained weights automatically, so set it to null (see the example after this table). | string path |
| model_type | string | DLA34 | The backbone name of the model. DLA34 and FAN are supported. | DLA34, fan_small, fan_base, fan_large |
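
For example, because DLA34 downloads its pretrained weights automatically, a DLA34 backbone can be configured without a local weights file. A minimal sketch:

backbone:
  model_type: DLA34
  pretrained_backbone_path: null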

train

The train parameter defines the hyperparameters of the training process.


train:
  num_gpus: 1
  validation_interval: 20
  checkpoint_interval: 20
  num_epochs: 140
  clip_grad_val: 100.0
  randomseed: 317
  resume_training_checkpoint_path: null
  precision: "fp32"
  optim:
    lr: 6e-05
    lr_steps: [90, 120]

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |
| validation_interval | unsigned int | 20 | The epoch interval at which validation is run. If your custom dataset lacks correct calibration information, you can skip the validation pipeline by setting this to 140 (equal to num_epochs). | >0 |
| checkpoint_interval | unsigned int | 20 | The epoch interval at which checkpoints are saved | >0 |
| num_epochs | unsigned int | 140 | The total number of epochs to run the experiment | >0 |
| clip_grad_val | float | 100.0 | Clips the gradients of an iterable of parameters at the specified value | >=0 |
| randomseed | int | 317 | Setting the random seed to the same value produces identical results across runs | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| precision | string | fp32 | Specifying "fp16" enables mixed-precision training, which can save GPU memory but might affect accuracy | fp32, fp16 |
| optim | dict config | | The config for the optimizer, including the learning rate and learning rate scheduler | |

optim

The optim parameter defines the config for the optimizer in training, including the learning rate and learning rate steps.


optim:
  lr: 6e-05
  lr_steps: [90, 120]

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| lr | float | 6e-05 | The initial learning rate for training the model, excluding the backbone | >0.0 |
| lr_steps | int list | [90, 120] | The epochs at which the scheduler decreases the learning rate | int list |

dataset

The dataset parameter defines the dataset source, training batch size, and dataset settings.


dataset:
  train_data: /path/to/category/train/
  val_data: /path/to/category/val/
  num_classes: 1
  batch_size: 64
  workers: 4
  category: bike
  num_symmetry: 1
  max_objs: 10

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| train_data | string | | The path to the training data: a directory that contains the training images and their corresponding JSON files. Each image and its JSON file share the same base file name and are stored in the same folder. | |
| val_data | string | | The path to the validation data: a directory that contains the validation images and their corresponding JSON files, using the same naming convention as train_data. | |
| test_data | string | | The path to the test data: a directory that contains the test images and their corresponding JSON files, using the same naming convention as train_data. | |
| inference_data | string | | The path to the inference data: a directory that contains the inference images. No JSON files are needed for the inference pipeline. | |
| num_classes | unsigned int | 1 | The number of categories in the training data. Because CenterPose is a category-level pose estimation method, only 1 class is supported. | 1 |
| batch_size | unsigned int | 4 | The batch size for training and validation | >0 |
| workers | unsigned int | 8 | The number of parallel workers processing data | >0 |
| category | string | | The category name of the training dataset. Different categories may use different training strategies; see num_symmetry for details. | |
| num_symmetry | unsigned int | 1 | The number of symmetric rotations, i.e. how many times the 3D bounding box is rotated around the y-axis; each rotated bounding box is treated as a ground truth during training. For example, a bottle is a symmetric object and num_symmetry can be set to 12 (30 degrees per rotation). Set num_symmetry to 1 when the object is non-symmetric (see the snippet after this table). | >0 |
| max_objs | unsigned int | 10 | The maximum number of objects in a single image that are used for training | >0 |
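
Following the bottle example from the num_symmetry description, a dataset block for a symmetric category might look like the following sketch (paths are placeholders):

dataset:
  train_data: /path/to/bottle/train/
  val_data: /path/to/bottle/val/
  num_classes: 1
  batch_size: 64
  workers: 4
  category: bottle
  num_symmetry: 12   # 30-degree rotations of the 3D box around the y-axis
  max_objs: 10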

To train a CenterPose model, use this command:


tao model centerpose train [-h] -e <experiment_spec> [-r <results_dir>] [-k <key>]

Required Arguments

  • -e, --experiment_spec: The experiment specification file to set up the training experiment

Optional Arguments

  • -r, --results_dir: The path to the folder where the experiment outputs should be written. If this argument is not specified, the results_dir from the spec file is used.

  • -k, --key: A user-specific encoding key to save or load a .tlt model. If this argument is not specified, the model checkpoint isn’t encrypted.

  • --gpus: The number of GPUs used to run training.

  • -h, --help: Show this help message and exit.

Sample Usage

Here’s an example of the train command:


tao model centerpose train -e /path/to/spec.yaml
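
You can also write results to an explicit folder and use the optional --gpus argument described above; for example, to train on two GPUs:

tao model centerpose train -e /path/to/spec.yaml -r /path/to/results/ --gpus 2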


Optimizing Resource for Training CenterPose

Training CenterPose on a standard dataset such as Objectron requires significant GPU (for example, V100/A100) and CPU memory. The following strategies can help you launch training with limited resources.

Optimize GPU Memory

There are various ways to optimize GPU memory usage. One option is to reduce dataset.batch_size; note that this can cause your training to take longer than usual.

Typically, the following options provide a better balance between memory savings, speed, and accuracy:

  • Set train.precision to fp16 to enable automatic mixed precision training. This can reduce your GPU memory usage and speed up training, but it might affect accuracy.

  • Try a more lightweight backbone, such as DLA34 (a combined example follows this list).
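
The following sketch combines these options in one spec; only the affected fields are shown, and the batch size of 32 is an illustrative value that you should adjust to your GPU memory budget:

train:
  precision: "fp16"       # automatic mixed precision
dataset:
  batch_size: 32          # smaller batches use less GPU memory but train slower
model:
  backbone:
    model_type: DLA34     # lighter than the FAN backbones
    pretrained_backbone_path: null  # DLA34 downloads pretrained weights automatically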

evaluate

The evaluate parameter defines the hyperparameters of the evaluation process.


evaluate:
  num_gpus: 1
  checkpoint: /path/to/model.pth
  opencv: False
  eval_num_symmetry: 1
  results_dir: /path/to/saving/directory

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| num_gpus | unsigned int | 1 | The number of GPUs to use | >0 |
| checkpoint | string | | The path to the PyTorch model to evaluate | |
| opencv | bool | False | If set to False, the returned 3D keypoints are in the OpenGL camera coordinate system; if set to True, they are in the OpenCV camera coordinate system. In the Objectron dataset, the default 3D keypoints are in the OpenGL camera coordinate system. | True, False |
| eval_num_symmetry | unsigned int | 1 | For symmetric object categories (e.g. bottle), the estimated bounding box is rotated along the symmetry axis N times (N = 100) and the prediction is evaluated with respect to each rotated instance; the reported number is the instance that maximizes 3D IoU. For non-symmetric object categories, this is set to 1 (the default). | >0 |
| results_dir | string | | The path to the saved evaluation report. Make sure the calibration information is correct before running the evaluation. | |
| trt_engine | string | | The path to the TensorRT engine to evaluate. Should only be used with tao deploy. | |

To run evaluation with a CenterPose model, use this command:


tao model centerpose evaluate [-h] -e <experiment_spec> [-r <results_dir>] [-k <key>] evaluate.checkpoint=<model to be evaluated>


Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up the evaluation experiment

Optional Arguments

  • -k, --key: A user-specific encoding key to save or load a .tlt model. If this value is not specified, a .pth model must be used.

  • -r, --results_dir: The directory where the evaluation result is stored.

  • evaluate.checkpoint: The .tlt or .pth model to be evaluated.

Sample Usage

The following is an example of using the evaluate command:


tao model centerpose evaluate -e /path/to/spec.yaml -r /path/to/results/ evaluate.checkpoint=/path/to/model.pth


inference

The inference parameter defines the hyperparameters of the inference process.


inference:
  checkpoint: /path/to/model.pth
  visualization_threshold: 0.3
  principle_point_x: 300.7
  principle_point_y: 392.8
  focal_length_x: 615.0
  focal_length_y: 615.0
  skew: 0.0
  use_pnp: True
  save_json: True
  save_visualization: True
  opencv: True

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| checkpoint | string | | The path to the PyTorch model to run inference with | |
| visualization_threshold | float | 0.3 | The confidence threshold used to filter predictions | >=0 |
| principle_point_x | float | 300.7 | The principal point x of the intrinsic matrix. Use the camera calibration matrix that matches your data (see the sketch after this table). | >0 |
| principle_point_y | float | 392.8 | The principal point y of the intrinsic matrix. Use the camera calibration matrix that matches your data. | >0 |
| focal_length_x | float | 615.0 | The focal length x of the intrinsic matrix. Use the camera calibration matrix that matches your data. | >0 |
| focal_length_y | float | 615.0 | The focal length y of the intrinsic matrix. Use the camera calibration matrix that matches your data. | >0 |
| skew | float | 0.0 | The skew of the intrinsic matrix. Use the camera calibration matrix that matches your data. | >=0 |
| use_pnp | bool | True | Whether to use the PnP algorithm to establish 2D-3D correspondences for solving the 6-DoF pose | True, False |
| save_json | bool | True | Save all results, including 2D keypoints, 3D keypoints, location, quaternion, and relative scale, to a local JSON file | True, False |
| save_visualization | bool | True | Save the visualization results to local .jpg files, including the projected 2D bounding box with the point order, relative scale, and object pose. The +y axis points up (aligned with gravity, green line), +x follows the right-hand rule (red line), and +z is the front face (blue line). | True, False |
| opencv | bool | False | If set to False, the returned 3D keypoints are in the OpenGL camera coordinate system; if set to True, they are in the OpenCV camera coordinate system. In the Objectron dataset, the default 3D keypoints are in the OpenGL camera coordinate system. | True, False |
| trt_engine | string | | The path to the TensorRT engine to run inference with. Should only be used with tao deploy. | |
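
For reference, these five values are the entries of a standard pinhole camera intrinsic matrix. The sketch below shows the conventional arrangement; verify it against your own calibration:

K = | focal_length_x   skew             principle_point_x |
    | 0                focal_length_y   principle_point_y |
    | 0                0                1                 |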

The inference tool for CenterPose models can be used to visualize the 3D bounding boxes on the 2D image plane, the order of the points, and the relative object dimensions. It also generates a frame-by-frame JSON file that records the results for each image.


tao model centerpose inference [-h] -e <experiment spec file> [-r <results_dir>] [-k <key>] inference.checkpoint=<model to be inferenced>


Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up the inference experiment

Optional Arguments

  • -k, --key: A user-specific encoding key to save or load a .tlt model. If this value is not specified, a .pth model must be used.

  • -r, --results_dir: The directory where the inference result is stored.

  • inference.checkpoint: The .tlt or .pth model to inference.

Sample Usage

The following is an example of using the inference command:


tao model centerpose inference -e /path/to/spec.yaml -r /path/to/results/ inference.checkpoint=/path/to/model.pth


export

The export parameter defines the hyperparameters of the export process.


export:
  gpu_id: 0
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  input_channel: 3
  input_width: 512
  input_height: 512
  opset_version: 16
  do_constant_folding: True

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| gpu_id | unsigned int | 0 | The GPU ID used to convert the .pth model to the ONNX model | >=0 |
| checkpoint | string | | The path to the PyTorch model to export | |
| onnx_file | string | | The path to the exported .onnx file | |
| input_channel | unsigned int | 3 | The input channel size. Only the value 3 is supported. | 3 |
| input_width | unsigned int | 512 | The input width | >0 |
| input_height | unsigned int | 512 | The input height | >0 |
| opset_version | unsigned int | 16 | The opset version of the exported ONNX model | >0 |
| do_constant_folding | bool | True | Whether to execute constant folding. Set this to True if the TensorRT version is lower than 8.6. | True, False |
To export a CenterPose model to ONNX, use this command:

tao model centerpose export [-h] -e <experiment spec file> [-r <results_dir>] [-k <key>] export.checkpoint=<model to export> export.onnx_file=<onnx path>


Required Arguments

  • -e, --experiment_spec: The path to an experiment spec file

Optional Arguments

  • -k, --key: A user-specific encoding key to save or load a .tlt model. If this value is not specified, a .pth model must be used.

  • -r, --results_dir: The directory where the export result is stored.

  • export.checkpoint: The .tlt or .pth model to export.

  • export.onnx_file: The path where the .etlt or .onnx model is saved.

Sample Usage

The following is an example of using the export command:


tao model centerpose export -e /path/to/spec.yaml export.checkpoint=/path/to/model.pth export.onnx_file=/path/to/model.onnx

