Date: 2026-03-03

CenterPose#

CenterPose is a category-level object pose estimation model included in TAO. It supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Launcher using the following convention on the command line:

tao model centerpose <sub_task> <args_per_subtask>

where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

Data Input for CenterPose#

CenterPose expects directories of images and annotated JSON files for training or validation. See the CenterPose Data Format page for more information about the input data format.

Creating an Experiment Spec File#

BASE_EXPERIMENT_ID=$(tao centerpose list-base-experiments | jq -r '.[0].id')
SPECS=$(tao centerpose get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

The training experiment spec file for CenterPose includes model, train, and dataset parameters. Here is an example spec file for training a CenterPose model with a fan_small backbone on the bike category of the Google Objectron dataset.

dataset:
  train_data: /path/to/category/train/
  val_data: /path/to/category/val/
  num_classes: 1
  batch_size: 64
  workers: 4
  category: bike
  num_symmetry: 1
  max_objs: 10

train:
  num_gpus: 1
  gpu_ids: [0]
  checkpoint_interval: 5
  validation_interval: 5
  num_epochs: 10
  clip_grad_val: 100.0
  seed: 1234
  resume_training_checkpoint_path: null
  precision: "fp32"
  optim:
    lr: 6e-05
    lr_steps: [90, 120]

model:
  down_ratio: 4
  use_pretrained: True
  backbone:
    model_type: fan_small
    pretrained_backbone_path: /path/to/your-fan-small-pretrained-model

| Parameter | Data Type | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| model | dict config | | The configuration of the model architecture | |
| dataset | dict config | | The configuration of the dataset | |
| train | dict config | | The configuration of the training task | |
| evaluate | dict config | | The configuration of the evaluation task | |
| inference | dict config | | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| export | dict config | | The configuration of the ONNX export task | |
| gen_trt_engine | dict config | | The configuration of the TensorRT generation task. Only used in tao deploy | |

model#

The model parameter provides options to change the CenterPose architecture.

model:
  down_ratio: 4
  use_pretrained: False
  backbone:
    model_type: fan_small
    pretrained_backbone_path: /path/to/your-fan-small-pretrained-model

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| down_ratio | int | 4 | The downscale ratio of the network feature map. | 4 |
| use_pretrained | bool | False | A flag specifying whether to initialize the backbone with the pretrained weights. | True, False |
| backbone | dict config | | The config for the backbone model type and the path of the pretrained weights. | |
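As a rough illustration of what down_ratio means, the sketch below computes the spatial size of the output feature map for a given input resolution. It assumes the feature map is simply the input size divided by down_ratio. The function name feature_map_size is hypothetical, and the 512 x 512 input is borrowed from the export defaults documented later on this page.

```python
def feature_map_size(input_height: int, input_width: int, down_ratio: int = 4) -> tuple[int, int]:
    """Return (height, width) of the feature map, assuming plain integer downscaling."""
    if down_ratio <= 0:
        raise ValueError("down_ratio must be positive")
    return input_height // down_ratio, input_width // down_ratio


if __name__ == "__main__":
    # 512 x 512 input with down_ratio 4 gives a 128 x 128 feature map.
    print(feature_map_size(512, 512))
```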

backbone#

The backbone parameter provides options to change the CenterPose backbone architecture.

backbone:
  model_type: fan_small
  pretrained_backbone_path: /path/to/your-fan-small-pretrained-model

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| pretrained_backbone_path | string | None | The optional path to the pretrained backbone file. Set this path when using a FAN backbone. The DLA34 backbone can download its pretrained weights automatically, so set the path to null in that case. | string to the path |
| model_type | string | DLA34 | The backbone name of the model. DLA34 and FAN are supported. | DLA34, fan_small, fan_base, fan_large |

train#

The train parameter defines the hyperparameters of the training process.

train:
  num_gpus: 1
  gpu_ids: [0]
  checkpoint_interval: 5
  validation_interval: 5
  num_epochs: 10
  clip_grad_val: 100.0
  seed: 1234
  resume_training_checkpoint_path: null
  precision: "fp32"
  optim:
    lr: 6e-05
    lr_steps: [90, 120]

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, numpy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| clip_grad_val | float | 100.0 | Clips the gradients of an iterable of parameters at the specified value | >=0 |
| precision | string | fp32 | Specifying "fp16" enables mixed-precision training. Training with fp16 can help save GPU memory. | fp32, fp16 |
| optim | dict config | | The config for the optimizer, including the learning rate and the learning rate scheduler | |

optim#

The optim parameter defines the config for the optimizer in training, including the learning rate and learning rate steps.

optim:
  lr: 6e-05
  lr_steps: [90, 120]

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| lr | float | 6e-05 | The initial learning rate for training the model, excluding the backbone | >0.0 |
| lr_steps | int list | [90, 120] | The steps to decrease the learning rate for the scheduler | int list |
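To make the interaction between lr and lr_steps concrete, here is a minimal sketch of a multi-step schedule in plain Python. It assumes the steps are epoch indices and that the learning rate is multiplied by a fixed factor gamma at each step. The value gamma = 0.1 and the function name lr_at_epoch are assumptions for illustration only; this section does not document the decay factor.

```python
from bisect import bisect_right


def lr_at_epoch(epoch: int, base_lr: float = 6e-05, lr_steps=(90, 120), gamma: float = 0.1) -> float:
    """Learning rate at `epoch` for a multi-step schedule.

    gamma (the decay factor applied at each step) is an assumed value;
    it is not documented in this section.
    """
    # Count how many steps have been reached by this epoch.
    drops = bisect_right(sorted(lr_steps), epoch)
    return base_lr * gamma ** drops


if __name__ == "__main__":
    for epoch in (0, 9, 89, 90, 120):
        print(epoch, lr_at_epoch(epoch))
```

Under these assumptions, a run with num_epochs: 10 and lr_steps: [90, 120] never reaches the first step, so the learning rate would stay at its initial value for the whole run.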

dataset#

The dataset parameter defines the dataset source, training batch size, and dataset settings.

dataset:
  train_data: /path/to/category/train/
  val_data: /path/to/category/val/
  num_classes: 1
  batch_size: 64
  workers: 4
  category: bike
  num_symmetry: 1
  max_objs: 10

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| train_data | string | | The path to the training data: the directory that contains the training images and their related JSON files. Each image and its JSON file share the same file name and are in the same folder. | |
| val_data | string | | The path to the validation data: the directory that contains the validation images and their related JSON files. Each image and its JSON file share the same file name and are in the same folder. | |
| test_data | string | | The path to the test data: the directory that contains the testing images and their related JSON files. Each image and its JSON file share the same file name and are in the same folder. | |
| inference_data | string | | The path to the inference data: the directory that contains the inference images. The inference pipeline does not need JSON files. | |
| num_classes | unsigned int | 1 | The number of categories in the training data. Because CenterPose is a category-level pose estimation method, it supports only 1 class. | 1 |
| batch_size | unsigned int | 4 | The batch size for training and validation | >0 |
| workers | unsigned int | 8 | The number of parallel workers processing data | >0 |
| category | string | | The category name of the training dataset. Different categories may have different training strategies. See num_symmetry for more details. | |
| num_symmetry | unsigned int | 1 | The number of symmetric rotations, that is, the number of times the 3D bounding box is rotated about the y-axis. Each rotated bounding box is treated as a ground truth for training. For example, a bottle is a symmetric object, so num_symmetry can be set to 12 (30 degrees per rotation). Set num_symmetry to 1 when the object is non-symmetric. | >0 |
| max_objs | unsigned int | 10 | The maximum number of objects in a single image that are used for training. | >0 |
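The relationship between num_symmetry and the rotation angle can be sketched as follows. This is an illustration only: it assumes the rotations are evenly spaced over a full turn about the y-axis, which matches the "12 rotations, 30 degrees each" example above. The helper names symmetry_angles and rotate_about_y are hypothetical and are not part of TAO.

```python
import math


def symmetry_angles(num_symmetry: int) -> list[float]:
    """Rotation angles in degrees, evenly spaced over a full turn."""
    if num_symmetry <= 0:
        raise ValueError("num_symmetry must be positive")
    step = 360.0 / num_symmetry
    return [i * step for i in range(num_symmetry)]


def rotate_about_y(point, angle_deg: float):
    """Rotate a 3D point (x, y, z) about the y-axis by angle_deg (right-handed)."""
    x, y, z = point
    t = math.radians(angle_deg)
    return (
        math.cos(t) * x + math.sin(t) * z,
        y,
        -math.sin(t) * x + math.cos(t) * z,
    )


if __name__ == "__main__":
    angles = symmetry_angles(12)
    print(angles[:3])  # first three of the 12 angles, 30 degrees apart
    corner = (1.0, 0.5, 0.0)  # one hypothetical bounding-box corner
    for angle in angles[:3]:
        print(angle, rotate_about_y(corner, angle))
```

With num_symmetry set to 1, the only angle is 0 degrees, so the original bounding box is the single ground truth.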

Training the Model#

Use the following command to run CenterPose training:

TRAIN_JOB_ID=$(tao centerpose create-job \
  --kind experiment \
  --name "centerpose_train" \
  --action train \
  --workspace-id $WORKSPACE_ID \
  --specs "$TRAIN_SPECS" \
  --train-datasets '["'$DATASET_ID'"]' \
  --eval-dataset "$DATASET_ID" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

tao model centerpose train [-h] -e <experiment_spec_file>
                   [results_dir=<global_results_dir>]
                   [model.<model_option>=<model_option_value>]
                   [dataset.<dataset_option>=<dataset_option_value>]
                   [train.<train_option>=<train_option_value>]
                   [train.gpu_ids=<gpu indices>]
                   [train.num_gpus=<number of gpus>]

Required Arguments

The only required argument is the path to the experiment spec:

  • -e, --experiment_spec: The experiment specification file to set up the training experiment

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.
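Conceptually, each key=value argument on the command line addresses one nested field of the spec file by its dotted path. The sketch below mimics that mapping on a plain dictionary. It is not how the TAO launcher implements overrides; apply_override is a hypothetical helper, and the value parsing (Python literals with a string fallback) is an assumption made to keep the example short.

```python
import ast


def apply_override(spec: dict, override: str) -> dict:
    """Apply one 'a.b.c=value' override to a nested dict in place and return it."""
    dotted_key, _, raw_value = override.partition("=")
    try:
        value = ast.literal_eval(raw_value)  # numbers, lists, booleans, quoted strings
    except (ValueError, SyntaxError):
        value = raw_value  # anything else is kept as a plain string
    *parents, leaf = dotted_key.split(".")
    node = spec
    for name in parents:
        node = node.setdefault(name, {})  # create intermediate sections as needed
    node[leaf] = value
    return spec


if __name__ == "__main__":
    spec = {"train": {"num_gpus": 1, "gpu_ids": [0]}, "results_dir": "/results"}
    apply_override(spec, "train.num_gpus=2")
    apply_override(spec, "train.gpu_ids=[0, 1]")
    apply_override(spec, "model.backbone.model_type=fan_small")
    print(spec)
```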

Note

For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed, but are inconsistent, for example num_gpus = 1, gpu_ids = [0, 1], then they are modified to follow the setting that implies more GPUs; in the same example num_gpus is modified from 1 to 2.
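The reconciliation rule described above can be sketched in a few lines of Python. The case where gpu_ids implies more GPUs follows the example in the note. The opposite case (num_gpus larger than the number of listed IDs) is filled in here with the first num_gpus device indices, which is an assumption; reconcile_gpus is a hypothetical name, not a TAO function.

```python
def reconcile_gpus(num_gpus: int = 1, gpu_ids=None):
    """Return (num_gpus, gpu_ids) made consistent by following whichever implies more GPUs."""
    gpu_ids = [0] if gpu_ids is None else list(gpu_ids)
    if num_gpus > len(gpu_ids):
        # Assumption: use the first num_gpus device indices.
        gpu_ids = list(range(num_gpus))
    else:
        # gpu_ids lists at least as many devices, so num_gpus follows it.
        num_gpus = len(gpu_ids)
    return num_gpus, gpu_ids


if __name__ == "__main__":
    print(reconcile_gpus(1, [0, 1]))  # the example from the note: num_gpus becomes 2
    print(reconcile_gpus())           # defaults are already consistent
```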

In some cases, multi-GPU training may result in a segmentation fault. You can circumvent this by setting the environment variable OMP_NUM_THREADS to 1. Depending on your mode of execution, you may use the following methods to set this variable:

  • CLI Launcher:

    You may set the environment variable by adding the following entry to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 in the section Running the launcher.

    {
        "Envs": [
            {
                "variable": "OMP_NUM_THREADS",
                "value": "1"
            }
        ]
    }

  • Docker:

    You may set environment variables in Docker by setting the -e flag in the Docker command line.

    docker run -it --rm --gpus all \
    -e OMP_NUM_THREADS=1 \
    -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e

Checkpointing and Resuming Training

At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth. Checkpoints are saved in train.results_dir, like this:

$ ls /results/train

'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'

The latest checkpoint will also be saved as centerpose_model_latest.pth. Training will automatically resume from centerpose_model_latest.pth if it exists in train.results_dir. This will be superseded by train.resume_training_checkpoint_path if it is provided.

The major implication of this logic is that, if you wish to trigger fresh training from scratch, either

  • Specify a new, empty results directory (Recommended), or

  • Remove the latest checkpoint from the results directory
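The resume precedence described above can be summarized in a short sketch. It only mirrors the documented order (explicit path first, then the latest checkpoint in the results directory, otherwise a fresh run). The function name pick_resume_checkpoint is hypothetical and this is not the TAO implementation.

```python
import os
import tempfile


def pick_resume_checkpoint(results_dir, resume_training_checkpoint_path=None):
    """Return the checkpoint path training would resume from, or None for a fresh run."""
    if resume_training_checkpoint_path:
        # An explicit path supersedes the automatically detected latest checkpoint.
        return resume_training_checkpoint_path
    latest = os.path.join(results_dir, "centerpose_model_latest.pth")
    return latest if os.path.isfile(latest) else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as results_dir:
        # Empty directory: nothing to resume from, so training starts from scratch.
        print(pick_resume_checkpoint(results_dir))
        # Simulate a previous run that left a latest checkpoint behind.
        open(os.path.join(results_dir, "centerpose_model_latest.pth"), "wb").close()
        print(pick_resume_checkpoint(results_dir))
        # An explicit path wins over the latest checkpoint.
        print(pick_resume_checkpoint(results_dir, "/path/to/model_epoch_004.pth"))
```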

Optimizing Resources for Training CenterPose#

Training CenterPose on a standard dataset, such as Objectron, requires GPUs (for example, V100/A100) and a significant amount of CPU memory. The following are some of the strategies you can use to launch training with only limited resources.

Optimize GPU Memory#

There are various ways to optimize GPU memory usage. One trick is to reduce dataset.batch_size, which can cause your training to take longer than usual.

Typically, the following options result in a more balanced performance optimization:

  • Set train.precision to fp16 to enable automatic mixed precision training. This can reduce your GPU memory usage and speed up training, but it might affect accuracy.

  • Try using more lightweight backbones like DLA34.

Evaluating the Model#

evaluate#

The evaluate parameter defines the hyperparameters of the evaluation process.

evaluate:
  checkpoint: /path/to/model.pth
  opencv: False
  eval_num_symmetry: 1
  results_dir: /path/to/saving/directory

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| checkpoint | string | | Path to the PyTorch model to evaluate | |
| results_dir | string | /results/evaluate | The directory to save evaluation results | |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed evaluation | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed evaluation | |
| opencv | bool | False | If opencv is set to False, the returned 3D keypoints are in the OpenGL camera coordinate system. If opencv is set to True, they are in the OpenCV camera coordinate system. In the Objectron dataset, the default 3D keypoints are in the OpenGL camera coordinate system. | True, False |
| eval_num_symmetry | unsigned int | 1 | For symmetric object categories (for example, bottle), the estimated bounding box is rotated along the symmetry axis N times (N = 100) and the prediction is evaluated with respect to each rotated instance. For non-symmetric object categories, the value is set to 1 by default. The reported number is from the instance that maximizes 3D IoU. | >0 |
| trt_engine | string | | Path to the TensorRT model to evaluate. Should only be used with tao deploy | |
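The difference between the two camera conventions selected by opencv can be illustrated with the usual relation between them: both have +x pointing right, but OpenGL has +y up with the camera looking along -z, while OpenCV has +y down with the camera looking along +z. The sketch below applies that conventional axis flip to a single point. Whether TAO applies exactly this transform internally is an assumption, and opengl_to_opencv is a hypothetical helper name.

```python
def opengl_to_opencv(point):
    """Convert a 3D point from the OpenGL camera frame to the OpenCV camera frame.

    OpenGL: +x right, +y up, camera looks along -z.
    OpenCV: +x right, +y down, camera looks along +z.
    """
    x, y, z = point
    return (x, -y, -z)


if __name__ == "__main__":
    keypoint_gl = (0.1, 0.2, -1.5)  # a point in front of an OpenGL camera (negative z)
    print(opengl_to_opencv(keypoint_gl))  # positive z in the OpenCV frame
```

The same flip converts in the other direction, since applying it twice returns the original point.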

To run evaluation with a CenterPose model, use this command:

EVAL_JOB_ID=$(tao centerpose create-job \
  --kind experiment \
  --name "centerpose_evaluate" \
  --action evaluate \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $TRAIN_JOB_ID \
  --eval-dataset "$DATASET_ID" \
  --specs "$EVALUATE_SPECS" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

tao model centerpose evaluate [-h] -e <experiment_spec>
                   evaluate.checkpoint=<model to be evaluated>
                   [evaluate.<evaluate_option>=<evaluate_option_value>]
                   [evaluate.gpu_ids=<gpu indices>]
                   [evaluate.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

  • -e, --experiment_spec: The experiment spec file to set up the evaluation experiment

  • evaluate.checkpoint: The .pth model to be evaluated.

Optional Arguments

The following arguments are optional to run the command.

Running Inference with a CenterPose Model#

inference#

The inference parameter defines the hyperparameters of the inference process.

inference:
  checkpoint: /path/to/model.pth
  visualization_threshold: 0.3
  principle_point_x: 300.7
  principle_point_y: 392.8
  focal_length_x: 615.0
  focal_length_y: 615.0
  skew: 0.0
  use_pnp: True
  save_json: True
  save_visualization: True
  opencv: True

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| checkpoint | string | | Path to the PyTorch model to run inference with | |
| results_dir | string | /results/inference | The directory to save inference results | |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed inference | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed inference | |
| visualization_threshold | float | 0.3 | Confidence threshold to filter predictions | >=0 |
| principle_point_x | float | 300.7 | The x-coordinate of the principal point in the intrinsic matrix. Use the correct camera calibration matrix for your data. | >0 |
| principle_point_y | float | 392.8 | The y-coordinate of the principal point in the intrinsic matrix. Use the correct camera calibration matrix for your data. | >0 |
| focal_length_x | float | 615.0 | The focal length along x in the intrinsic matrix. Use the correct camera calibration matrix for your data. | >0 |
| focal_length_y | float | 615.0 | The focal length along y in the intrinsic matrix. Use the correct camera calibration matrix for your data. | >0 |
| skew | float | 0.0 | The skew of the intrinsic matrix. Use the correct camera calibration matrix for your data. | >=0 |
| use_pnp | bool | True | A flag specifying whether to use the PnP algorithm, which establishes 2D-3D correspondences, to solve the 6-DoF pose | True, False |
| save_json | bool | True | Save all the results to a local JSON file, including 2D keypoints, 3D keypoints, location, quaternion, and relative scale | True, False |
| save_visualization | bool | True | Save the visualization results to a local .jpg file, including the projected 2D bounding box along with the point order, relative scale, and object pose. The +y axis is up (aligned with gravity, green line); the +x axis follows the right-hand rule (red line); the +z axis is the front face (blue line). | True, False |
| opencv | bool | False | If opencv is set to False, the returned 3D keypoints are in the OpenGL camera coordinate system. If opencv is set to True, they are in the OpenCV camera coordinate system. In the Objectron dataset, the default 3D keypoints are in the OpenGL camera coordinate system. | True, False |
| trt_engine | string | | Path to the TensorRT model to run inference with. Should only be used with tao deploy | |
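The five camera parameters above are the entries of a standard pinhole intrinsic matrix. As a minimal sketch, the code below assembles that matrix from the default values and projects one 3D point to pixel coordinates. It assumes the point is expressed in the OpenCV camera frame (z pointing forward). The helper names intrinsic_matrix and project are hypothetical and only illustrate how the values fit together; they are not TAO functions.

```python
def intrinsic_matrix(fx=615.0, fy=615.0, cx=300.7, cy=392.8, skew=0.0):
    """Build the 3x3 pinhole intrinsic matrix K as nested lists."""
    return [
        [fx, skew, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ]


def project(point, K):
    """Project a 3D point in the OpenCV camera frame (z > 0) to pixel coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    u = (K[0][0] * x + K[0][1] * y) / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return u, v


if __name__ == "__main__":
    K = intrinsic_matrix()
    # A point on the optical axis lands on the principal point.
    print(project((0.0, 0.0, 1.0), K))
    # A point 0.1 units to the right at depth 1 shifts by fx * 0.1 pixels in u.
    print(project((0.1, 0.0, 1.0), K))
```

If the intrinsics do not match the camera that captured the images, the projected keypoints and the recovered pose are shifted or scaled accordingly, which is why the table asks for the correct calibration matrix.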

The inference tool for CenterPose models can be used to visualize 3D bounding boxes projected onto the 2D image plane, the order of the points, and the relative dimensions of the object. It also generates a frame-by-frame JSON file that records the results for each image.

INFERENCE_JOB_ID=$(tao centerpose create-job \
  --kind experiment \
  --name "centerpose_inference" \
  --action inference \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $TRAIN_JOB_ID \
  --inference-dataset "$DATASET_ID" \
  --specs "$INFERENCE_SPECS" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

tao model centerpose inference [-h] -e <experiment spec file>
                   inference.checkpoint=<model to be inferenced>
                   [inference.<inference_option>=<inference_option_value>]
                   [inference.gpu_ids=<gpu indices>]
                   [inference.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required to run the command.

  • -e, --experiment_spec: The experiment spec file to set up the inference experiment

  • inference.checkpoint: The .pth model to run inference with.

Optional Arguments

The following arguments are optional to run the command.

Exporting the Model#

export#

The export parameter defines the hyperparameters of the export process.

export:
  gpu_id: 0
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  input_channel: 3
  input_width: 512
  input_height: 512
  opset_version: 16
  do_constant_folding: True

| Parameter | Datatype | Default | Description | Supported Values |
| --- | --- | --- | --- | --- |
| gpu_id | unsigned int | 0 | The GPU ID used for converting the .pth model to an ONNX model | >=0 |
| checkpoint | string | | The path to the PyTorch model to export | |
| onnx_file | string | | The path to the .onnx file | |
| input_channel | unsigned int | 3 | The input channel size. Only the value 3 is supported. | 3 |
| input_width | unsigned int | 512 | The input width | >0 |
| input_height | unsigned int | 512 | The input height | >0 |
| opset_version | unsigned int | 16 | The opset version of the exported ONNX | >0 |
| do_constant_folding | bool | True | Whether to execute constant folding. Set it to True if the TensorRT version is lower than 8.6. | True, False |

 
EXPORT_JOB_ID=$(tao centerpose create-job \
  --kind experiment \
  --name "centerpose_export" \
  --action export \
  --workspace-id $WORKSPACE_ID \
  --parent-job-id $TRAIN_JOB_ID \
  --specs "$EXPORT_SPECS" \
  --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
  --encryption-key "nvidia_tlt" | jq -r '.id')

tao model centerpose export [-h] -e <experiment spec file>
                   export.checkpoint=<model to export>
                   export.onnx_file=<onnx path>
                   [export.<export_option>=<export_option_value>]

Required Arguments

The following arguments are required to run the command.

  • -e, --experiment_spec: The path to an experiment spec file

  • export.checkpoint: The .pth model to export.

  • export.onnx_file: The path where the .etlt or .onnx model is saved.

Optional Arguments

The following arguments are optional to run the command.

TensorRT Engine Generation, Validation, and int8 Calibration#

For deployment, refer to the TAO Deploy documentation for CenterPose.