CenterPose

CenterPose is a category-level object pose estimation model included in TAO. It supports the following tasks:

train
evaluate
inference
export

Data Input for CenterPose

CenterPose expects directories of images and annotated JSON files for training or validation. See the CenterPose Data Format page for more information about the input data format.
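
As the dataset parameter table notes, each image and its JSON annotation share the same file name in the same folder. The following is a minimal sketch of a pre-flight check for unpaired files. It assumes `.jpg`, `.jpeg`, or `.png` images and `.json` annotations; the extensions and the helper name `pair_images_with_annotations` are hypothetical and not part of TAO.

```python
from pathlib import Path

# Assumed image extensions; adjust them to match your dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def pair_images_with_annotations(data_dir):
    """Return (pairs, unpaired) for a CenterPose-style data directory.

    pairs:    list of (image_path, json_path) that share the same file stem
    unpaired: sorted list of image or JSON files that have no counterpart
    """
    root = Path(data_dir)
    files = [p for p in root.iterdir() if p.is_file()]
    images = {p.stem: p for p in files if p.suffix.lower() in IMAGE_EXTS}
    annotations = {p.stem: p for p in files if p.suffix.lower() == ".json"}

    # Stems present on both sides form valid image/annotation pairs.
    shared = sorted(images.keys() & annotations.keys())
    pairs = [(images[stem], annotations[stem]) for stem in shared]

    # Anything left over on either side is missing its counterpart.
    unpaired = sorted(
        [images[s] for s in images.keys() - annotations.keys()]
        + [annotations[s] for s in annotations.keys() - images.keys()]
    )
    return pairs, unpaired
```

For example, `pairs, unpaired = pair_images_with_annotations("/path/to/category/train/")` returns an empty `unpaired` list when every image has a matching JSON file.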

Creating an Experiment Specification File

Parameter	Data Type	Default	Description	Supported Values
`model`	dict config	–	The configuration of the model architecture
`dataset`	dict config	–	The configuration of the dataset
`train`	dict config	–	The configuration of the training task
`evaluate`	dict config	–	The configuration of the evaluation task
`inference`	dict config	–	The configuration of the inference task
`encryption_key`	string	None	The encryption key to encrypt and decrypt model files
`results_dir`	string	/results	The directory where experiment results are saved
`export`	dict config	–	The configuration of the ONNX export task
`gen_trt_engine`	dict config	–	The configuration of the TensorRT engine generation task

model

The model parameter provides options to change the CenterPose architecture.

model:
  down_ratio: 4
  use_pretrained: False
  backbone:
    model_type: fan_small
    pretrained_backbone_path: /path/to/your-fan-small-pretrained-model

Parameter	Datatype	Default	Description	Supported Values
`down_ratio`	int	4	The downsampling ratio of the network feature map relative to the input image.	4
`use_pretrained`	bool	False	A flag specifying whether to initialize the backbone with the pretrained weights.	True, False
`backbone`	dict config		The configuration for the backbone model type and the path of the pretrained weights.
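
To make `down_ratio` concrete: the network produces its outputs on a feature map whose spatial size is the input size divided by `down_ratio`. The minimal sketch below uses the 512 x 512 input size from the export example. The helper name `output_feature_map_size` is hypothetical, and the divisibility check is an assumption added for illustration.

```python
def output_feature_map_size(input_height, input_width, down_ratio=4):
    """Spatial size (height, width) of the output feature map for a given input."""
    # Assumption for this sketch: the input size should divide evenly by down_ratio.
    if input_height % down_ratio or input_width % down_ratio:
        raise ValueError("input size should be divisible by down_ratio")
    return input_height // down_ratio, input_width // down_ratio


print(output_feature_map_size(512, 512))  # (128, 128)
```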

backbone

The backbone parameter provides options to change the CenterPose backbone architecture.

backbone:
  model_type: fan_small
  pretrained_backbone_path: /path/to/your-fan-small-pretrained-model

Parameter	Datatype	Default	Description	Supported Values
`pretrained_backbone_path`	string	None	The optional path to the pretrained backbone weights. Set this path when using a FAN backbone. The DLA34 backbone downloads its pretrained weights automatically, so set this to null.	A file path
`model_type`	string	DLA34	The backbone name of the model. DLA34 and the FAN variants are supported.	DLA34, fan_small, fan_base, fan_large
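
The backbone rules in the table above can be expressed as a small pre-flight check: FAN backbones need `pretrained_backbone_path`, while DLA34 downloads its weights and expects null. This is a hypothetical helper written for illustration, not part of TAO. It is based only on the table's description and assumes the `backbone` section has already been parsed into a plain Python dict.

```python
# Supported values copied from the backbone table.
SUPPORTED_BACKBONES = {"DLA34", "fan_small", "fan_base", "fan_large"}


def check_backbone_config(backbone):
    """Return a list of human-readable problems found in a backbone dict.

    Hypothetical helper; it mirrors the rules described in the backbone table.
    """
    problems = []
    model_type = backbone.get("model_type", "DLA34")  # table default
    path = backbone.get("pretrained_backbone_path")

    if model_type not in SUPPORTED_BACKBONES:
        problems.append(f"unsupported model_type: {model_type!r}")
    elif model_type.startswith("fan") and not path:
        # FAN weights are not downloaded automatically, so a path is expected.
        problems.append("FAN backbones need pretrained_backbone_path")
    elif model_type == "DLA34" and path is not None:
        # DLA34 downloads its weights itself; the path should be null.
        problems.append("set pretrained_backbone_path to null for DLA34")
    return problems
```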

train

The train parameter defines the hyperparameters of the training process.

train:
  num_gpus: 1
  gpu_ids: [0]
  checkpoint_interval: 5
  validation_interval: 5
  num_epochs: 10
  clip_grad_val: 100.0
  seed: 1234
  resume_training_checkpoint_path: null
  precision: "fp32"
  optim:
    lr: 6e-05
    lr_steps: [90, 120]

Parameter	Datatype	Default	Description	Supported Values
`num_gpus`	unsigned int	1	The number of GPUs to use for distributed training	>0
`gpu_ids`	List[int]	[0]	The indices of the GPUs to use for distributed training
`seed`	unsigned int	1234	The random seed for random, numpy, and torch	>0
`num_epochs`	unsigned int	10	The total number of epochs to run the experiment	>0
`checkpoint_interval`	unsigned int	1	The epoch interval at which the checkpoints are saved	>0
`validation_interval`	unsigned int	1	The epoch interval at which the validation is run	>0
`resume_training_checkpoint_path`	string		The intermediate PyTorch Lightning checkpoint to resume training from
`results_dir`	string	/results/train	The directory to save training results
`clip_grad_val`	float	100.0	The value at which the gradients of the model parameters are clipped	>=0
`precision`	string	fp32	Setting this to “fp16” enables mixed-precision training, which can reduce GPU memory usage.	fp32, fp16
`optim`	dict config		The configuration for the optimizer, including the learning rate and learning rate scheduler
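
With the example values above (`num_epochs: 10`, `checkpoint_interval: 5`, `validation_interval: 5`), checkpoints are saved and validation runs every fifth epoch. The sketch below lists those epochs. It assumes 1-based epoch numbering and that an event fires when the epoch number is a multiple of the interval. TAO's exact trigger logic may differ (for example, it may also save on the final epoch), and the helper name is hypothetical.

```python
def interval_epochs(num_epochs, interval):
    """1-based epochs at which an every-`interval`-epochs event fires.

    Assumes an event fires when the epoch number is a multiple of the
    interval; TAO's actual trigger logic may differ.
    """
    if interval <= 0:
        raise ValueError("interval must be > 0")
    return [epoch for epoch in range(1, num_epochs + 1) if epoch % interval == 0]


# Values from the example spec: num_epochs: 10, checkpoint_interval: 5
print(interval_epochs(10, 5))   # [5, 10]
# Under this assumption, an interval larger than num_epochs never fires.
print(interval_epochs(10, 20))  # []
```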

optim

The optim parameter defines the config for the optimizer in training, including the learning rate and learning rate steps.

optim:
  lr: 6e-05
  lr_steps: [90, 120]

Parameter	Datatype	Default	Description	Supported Values
`lr`	float	6e-05	The initial learning rate for training the model, excluding the backbone	>0.0
`lr_steps`	int list	[90, 120]	The steps to decrease the learning rate for the scheduler	int list
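
`lr_steps` describes a step schedule: the learning rate stays at `lr` until the first step, then drops at each listed step. The minimal sketch below assumes the steps are counted in epochs. It also assumes each drop multiplies the rate by 0.1, which is the default `gamma` of PyTorch's `MultiStepLR`; the decay factor TAO uses is not stated on this page. `step_lr` and `gamma` are hypothetical names.

```python
from bisect import bisect_right


def step_lr(epoch, base_lr=6e-05, lr_steps=(90, 120), gamma=0.1):
    """Learning rate at `epoch` (0-based) for a multi-step decay schedule.

    `gamma`, the decay factor applied at each step, is an assumed value,
    not a TAO parameter.
    """
    # Number of steps already reached at this epoch.
    num_decays = bisect_right(sorted(lr_steps), epoch)
    return base_lr * gamma ** num_decays


for epoch in (0, 9, 90, 120):
    print(epoch, step_lr(epoch))
```

Under these assumptions, the example spec (`num_epochs: 10` with `lr_steps: [90, 120]`) keeps the rate at `6e-05` for every epoch, so the decay steps never take effect. To have the rate drop during a short run, choose `lr_steps` values smaller than `num_epochs`.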

dataset

The dataset parameter defines the dataset source, training batch size, and dataset settings.

dataset:
  train_data: /path/to/category/train/
  val_data: /path/to/category/val/
  num_classes: 1
  batch_size: 64
  workers: 4
  category: bike
  num_symmetry: 1
  max_objs: 10

Parameter	Datatype	Default	Description	Supported Values
`train_data`	string		The path to the training data: a directory that contains the training images and their JSON annotation files. Each image and its JSON file share the same file name in the same folder.
`val_data`	string		The path to the validation data: a directory that contains the validation images and their JSON annotation files. Each image and its JSON file share the same file name in the same folder.
`test_data`	string		The path to the test data: a directory that contains the test images and their JSON annotation files. Each image and its JSON file share the same file name in the same folder.
`inference_data`	string		The path to the inference data: a directory that contains the inference images. JSON files are not needed for the inference pipeline.
`num_classes`	unsigned int	1	The number of categories in the training data. Because CenterPose is a category-level pose estimation method, only 1 class is supported.	1
`batch_size`	unsigned int	4	The batch size for training and validation	>0
`workers`	unsigned int	8	The number of parallel workers processing data	>0
`category`	string		The category name of the training dataset. Different categories may require different training strategies; see `num_symmetry` for details.
`num_symmetry`	unsigned int	1	The number of symmetric rotations, that is, how many times the 3D bounding box is rotated about the y-axis. Each rotated bounding box is treated as a ground truth during training. For example, a bottle is a symmetric object, so `num_symmetry` can be set to 12 (a 30-degree step for each rotation). Set `num_symmetry` to 1 for non-symmetric objects.	>0
`max_objs`	unsigned int	10	The maximum number of objects in a single image used for training.	>0
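
The `num_symmetry` description amounts to rotating the 3D bounding box about the y-axis in equal steps of 360 / `num_symmetry` degrees, so `num_symmetry: 12` for a bottle gives 30-degree steps. The following sketch generates those rotated copies of a set of 3D box keypoints. It illustrates the idea only and is not TAO's implementation; the function name is hypothetical.

```python
import math


def symmetric_rotations(keypoints, num_symmetry):
    """Rotate 3D keypoints about the y-axis in `num_symmetry` equal steps.

    keypoints: list of (x, y, z) tuples in the object frame.
    Returns `num_symmetry` lists; the first is the unrotated input.
    """
    if num_symmetry < 1:
        raise ValueError("num_symmetry must be >= 1")
    step = 2 * math.pi / num_symmetry  # e.g. 30 degrees for num_symmetry = 12
    rotated_sets = []
    for k in range(num_symmetry):
        c, s = math.cos(k * step), math.sin(k * step)
        # Rotation about y: x' = c*x + s*z, y' = y, z' = -s*x + c*z
        rotated_sets.append(
            [(c * x + s * z, y, -s * x + c * z) for x, y, z in keypoints]
        )
    return rotated_sets
```

With `num_symmetry: 1` (non-symmetric objects), the function returns only the original box, which matches treating a single box as the ground truth.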

Training the Model

Optimizing Resources for Training CenterPose

Training CenterPose on a standard dataset such as Objectron requires GPUs (for example, V100 or A100) and sufficient CPU memory. The following are some strategies you can use to launch training with limited resources.

Optimize GPU Memory

There are various ways to optimize GPU memory usage. One trick is to reduce `dataset.batch_size`, although this can cause training to take longer than usual.

Typically, the following options give a more balanced trade-off between memory use and performance:

Set `train.precision` to fp16 to enable automatic mixed-precision training. This can reduce GPU memory usage and speed up training, but it might affect accuracy.
Try using a more lightweight backbone such as DLA34.

Evaluating the Model

evaluate

The evaluate parameter defines the hyperparameters of the evaluation process.

evaluate:
  checkpoint: /path/to/model.pth
  opencv: False
  eval_num_symmetry: 1
  results_dir: /path/to/saving/directory

Parameter	Datatype	Default	Description	Supported Values
`checkpoint`	string		Path to PyTorch model to evaluate
`results_dir`	string	/results/evaluate	The directory to save evaluation results
`num_gpus`	unsigned int	1	The number of GPUs to use for distributed evaluation	>0
`gpu_ids`	List[int]	[0]	The indices of the GPUs to use for distributed evaluation
`opencv`	bool	False	If False, the returned 3D keypoints are in the OpenGL camera coordinate system; if True, they are in the OpenCV camera coordinate system. In the Objectron dataset, the 3D keypoints are in the OpenGL camera coordinate system by default.	True, False
`eval_num_symmetry`	unsigned int	1	For symmetric object categories (for example, bottle), the estimated bounding box is rotated about the symmetry axis N times (for example, N = 100), and the prediction is evaluated against each rotated instance. The reported result is the instance that maximizes 3D IoU. For non-symmetric object categories, use the default value of 1.	>0
`trt_engine`	string		Path to TensorRT model for evaluation or inference
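
The `opencv` flag only changes the camera coordinate convention of the returned 3D keypoints. In the OpenGL convention, +y points up and the camera looks along -z. In the OpenCV convention, +y points down and the camera looks along +z. A point therefore converts between the two conventions by negating its y and z components. A minimal sketch follows; the helper name is hypothetical.

```python
def opengl_to_opencv(points):
    """Convert 3D camera-frame points from the OpenGL to the OpenCV convention.

    OpenGL: +x right, +y up, camera looks along -z.
    OpenCV: +x right, +y down, camera looks along +z.
    Negating y and z is its own inverse, so the same function converts back.
    """
    return [(x, -y, -z) for x, y, z in points]
```

For example, a point 1.5 units in front of the camera has z = -1.5 in OpenGL coordinates and z = 1.5 in OpenCV coordinates.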

Running Inference with a CenterPose Model

inference

The inference parameter defines the hyperparameters of the inference process.

inference:
  checkpoint: /path/to/model.pth
  visualization_threshold: 0.3
  principle_point_x: 300.7
  principle_point_y: 392.8
  focal_length_x: 615.0
  focal_length_y: 615.0
  skew: 0.0
  use_pnp: True
  save_json: True
  save_visualization: True
  opencv: True

Parameter	Datatype	Default	Description	Supported Values
`checkpoint`	string		Path to the PyTorch model to use for inference
`results_dir`	string	/results/inference	The directory to save inference results
`num_gpus`	unsigned int	1	The number of GPUs to use for distributed inference	>0
`gpu_ids`	List[int]	[0]	The indices of the GPUs to use for distributed inference
`visualization_threshold`	float	0.3	Confidence threshold to filter predictions	>=0
`principle_point_x`	float	300.7	The x coordinate of the principal point in the camera intrinsic matrix. Use the calibration values of the camera that captured your data.	>0
`principle_point_y`	float	392.8	The y coordinate of the principal point in the camera intrinsic matrix. Use the calibration values of the camera that captured your data.	>0
`focal_length_x`	float	615.0	The focal length along x in the camera intrinsic matrix. Use the calibration values of the camera that captured your data.	>0
`focal_length_y`	float	615.0	The focal length along y in the camera intrinsic matrix. Use the calibration values of the camera that captured your data.	>0
`skew`	float	0.0	The skew of the camera intrinsic matrix. Use the calibration values of the camera that captured your data.	>=0
`use_pnp`	bool	True	Whether to use the Perspective-n-Point (PnP) algorithm, which solves for the 6-DoF pose from 2D-3D keypoint correspondences	True, False
`save_json`	bool	True	Whether to save all results to a local JSON file, including 2D keypoints, 3D keypoints, location, quaternion, and relative scale	True, False
`save_visualization`	bool	True	Whether to save the visualization results to a local .jpg file, including the projected 2D bounding box with the point order, the relative scale, and the object pose. In the visualization, +y points up (aligned with gravity, green line), +x follows the right-hand rule (red line), and +z points out of the front face (blue line).	True, False
`opencv`	bool	False	If False, the returned 3D keypoints are in the OpenGL camera coordinate system; if True, they are in the OpenCV camera coordinate system. In the Objectron dataset, the 3D keypoints are in the OpenGL camera coordinate system by default.	True, False
`trt_engine`	string		Path to TensorRT model for evaluation or inference
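
The five camera parameters above define the standard pinhole intrinsic matrix. The sketch below builds that matrix from the example values and projects a 3D point, given in OpenCV camera coordinates (+z pointing forward), onto the image. When `opencv` is False, the keypoints are in OpenGL coordinates and would need converting first. The function names are hypothetical, and the code illustrates the standard pinhole model rather than TAO's internal code.

```python
def intrinsic_matrix(focal_length_x=615.0, focal_length_y=615.0,
                     principle_point_x=300.7, principle_point_y=392.8, skew=0.0):
    """3x3 pinhole intrinsic matrix K as nested lists (defaults from the spec example)."""
    return [
        [focal_length_x, skew, principle_point_x],
        [0.0, focal_length_y, principle_point_y],
        [0.0, 0.0, 1.0],
    ]


def project_point(K, point):
    """Project a 3D point (X, Y, Z) in OpenCV camera coordinates to pixels (u, v)."""
    if point[2] <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    # Homogeneous image coordinates: K @ [X, Y, Z]
    u_h, v_h, w = (sum(K[r][c] * point[c] for c in range(3)) for r in range(3))
    return u_h / w, v_h / w


K = intrinsic_matrix()
print(project_point(K, (0.0, 0.0, 1.0)))  # (300.7, 392.8)
```

A point on the optical axis lands on the principal point, which makes this a quick sanity check for a calibration.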

The CenterPose inference tool visualizes the 3D bounding boxes projected onto the 2D image plane, along with the order of the points and the relative dimensions of each object. It also generates a per-frame JSON file that records the results for each image.
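
The saved pose includes a quaternion. The following sketch assumes a unit quaternion in (w, x, y, z) order; check the actual field order in your JSON output, because this page does not specify it. Under that assumption, the columns of the corresponding rotation matrix give the directions of the object's +x (red), +y (green), and +z (blue) axes drawn in the visualization. The helper names are hypothetical.

```python
import math


def quaternion_to_rotation_matrix(w, x, y, z):
    """3x3 rotation matrix (nested lists) for a quaternion in assumed (w, x, y, z) order."""
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm  # normalize defensively
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def object_axes(R):
    """Object +x, +y, +z axis directions in camera coordinates (the columns of R)."""
    return {axis: tuple(R[row][col] for row in range(3)) for col, axis in enumerate("xyz")}
```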

Exporting the Model

export

The export parameter defines the hyperparameters of the export process.

export:
  gpu_id: 0
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  input_channel: 3
  input_width: 512
  input_height: 512
  opset_version: 16
  do_constant_folding: True

Parameter	Datatype	Default	Description	Supported Values
`gpu_id`	unsigned int	0	The GPU ID to use for converting the .pth model to an ONNX model	>=0
`checkpoint`	string		The path to the PyTorch model to export
`onnx_file`	string		The path to save the exported `.onnx` file
`input_channel`	unsigned int	3	The input channel size. Only the value 3 is supported.	3
`input_width`	unsigned int	512	The input width	>0
`input_height`	unsigned int	512	The input height	>0
`opset_version`	unsigned int	16	The opset version of the exported ONNX	>0
`do_constant_folding`	bool	True	Whether to execute constant folding. Set this to True if your TensorRT version is lower than 8.6.	True, False
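
Two export settings follow directly from the table above. First, the exported model takes a 3-channel input of size `input_height` x `input_width`. Second, `do_constant_folding` should be True when the TensorRT version is lower than 8.6. The hypothetical helpers below derive both. They assume an NCHW layout with a leading batch dimension of 1, and a TensorRT version string with at least a major and a minor component (for example, "8.5.3").

```python
def export_input_shape(input_channel=3, input_height=512, input_width=512, batch_size=1):
    """Assumed NCHW shape of the model input for export (batch size is an assumption)."""
    if input_channel != 3:
        raise ValueError("only input_channel=3 is supported")
    return (batch_size, input_channel, input_height, input_width)


def constant_folding_required(tensorrt_version):
    """True if do_constant_folding must be True, i.e. TensorRT is older than 8.6."""
    major, minor = (int(part) for part in tensorrt_version.split(".")[:2])
    return (major, minor) < (8, 6)


print(export_input_shape())              # (1, 3, 512, 512)
print(constant_folding_required("8.5.3"))  # True
```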