Use the following command to run CenterPose training:

tao model centerpose train [-h] -e <experiment_spec_file> [results_dir=<global_results_dir>] [model.<model_option>=<model_option_value>] [dataset.<dataset_option>=<dataset_option_value>] [train.<train_option>=<train_option_value>] [train.gpu_ids=<gpu indices>] [train.num_gpus=<number of gpus>]

The only required argument is the path to the experiment spec:

-e, --experiment_spec: The experiment specification file used to set up the training experiment

You can set optional arguments to override the option values in the experiment spec file.
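As an illustration of how overrides are appended to the command, here is a minimal Python sketch that assembles the argument list from a dictionary of options. The helper name `build_train_command` and the spec path `/specs/train.yaml` are hypothetical; only the `tao model centerpose train -e ...` form and the `key=value` override style come from the synopsis above.

```python
import shlex


def build_train_command(spec_file, overrides=None):
    """Assemble the argv list for `tao model centerpose train`.

    `overrides` maps dotted option names to values; each pair is
    appended as `key=value`, following the synopsis above.
    """
    argv = ["tao", "model", "centerpose", "train", "-e", spec_file]
    for key, value in (overrides or {}).items():
        argv.append(f"{key}={value}")
    return argv


# Hypothetical spec path and override values, for illustration only.
argv = build_train_command(
    "/specs/train.yaml",
    {"results_dir": "/results", "train.num_gpus": 2, "dataset.batch_size": 8},
)
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine where the `tao` launcher is installed.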

Note: For training, evaluation, and inference, each task exposes two variables, num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 and gpu_ids = [0, 1]), they are adjusted to follow the setting with more GPUs; in this example, num_gpus is changed from 1 to 2.
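The reconciliation rule in the note can be sketched in Python as follows. This is not the toolkit's actual implementation: the function name `reconcile_gpus` is hypothetical, and the behavior when num_gpus is the larger setting (filling gpu_ids with indices 0 to num_gpus - 1) is an assumption, since the note only gives the opposite case.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Return a consistent (num_gpus, gpu_ids) pair.

    Follows the rule from the note: when the two settings disagree,
    the one that asks for more GPUs wins.
    """
    gpu_ids = [0] if gpu_ids is None else list(gpu_ids)
    if num_gpus != len(gpu_ids):
        num_gpus = max(num_gpus, len(gpu_ids))
        if len(gpu_ids) < num_gpus:
            # Assumption (not stated in the note): use indices 0..num_gpus-1.
            gpu_ids = list(range(num_gpus))
    return num_gpus, gpu_ids


print(reconcile_gpus(1, [0, 1]))  # (2, [0, 1]), the example from the note
print(reconcile_gpus(4, [0]))     # (4, [0, 1, 2, 3]) under the stated assumption
```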

A PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved at every train.checkpoint_interval. Checkpoints are written to train.results_dir, like so:

$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'

The latest checkpoint is also saved as centerpose_model_latest.pth. Training automatically resumes from centerpose_model_latest.pth if it exists in train.results_dir. If train.resume_training_checkpoint_path is provided, it takes precedence over the latest checkpoint.

The major implication of this logic is that, to trigger fresh training from scratch, you must do one of the following:

Specify a new, empty results directory (recommended), or

Remove the latest checkpoint from the results directory.
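The resume precedence described above can be sketched in Python as follows. The function name `select_resume_checkpoint` is hypothetical and this is not the toolkit's own code; it only mirrors the documented order: an explicit train.resume_training_checkpoint_path first, then centerpose_model_latest.pth in the results directory, otherwise a fresh start.

```python
import tempfile
from pathlib import Path

LATEST_NAME = "centerpose_model_latest.pth"


def select_resume_checkpoint(results_dir, resume_training_checkpoint_path=None):
    """Return the checkpoint to resume from, or None for a fresh run."""
    if resume_training_checkpoint_path:
        # An explicitly provided path supersedes the latest checkpoint.
        return Path(resume_training_checkpoint_path)
    latest = Path(results_dir) / LATEST_NAME
    return latest if latest.is_file() else None


with tempfile.TemporaryDirectory() as results_dir:
    # New, empty results directory: nothing to resume from.
    print(select_resume_checkpoint(results_dir))
    # Once the latest checkpoint exists, it is picked up automatically.
    (Path(results_dir) / LATEST_NAME).touch()
    print(select_resume_checkpoint(results_dir))
```

This is why a new, empty results directory is the simplest way to guarantee a fresh run: with no latest checkpoint present and no explicit resume path, there is nothing to resume from.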

Training CenterPose on a standard dataset, such as Objectron, requires substantial GPU resources (for example, V100/A100) and CPU memory. The following are some strategies you can use to launch training with limited resources.

There are various ways to optimize GPU memory usage. One option is to reduce dataset.batch_size, although this can make training take longer than usual.
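To see why a smaller batch size can lengthen training, the rough Python sketch below counts iterations per epoch. The dataset size of 10,000 samples is hypothetical, and the sketch assumes the final partial batch is kept; actual wall-clock time also depends on how long each iteration takes.

```python
import math


def iterations_per_epoch(num_samples, batch_size):
    """Number of optimizer steps needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


# Hypothetical dataset size, for illustration only.
for batch_size in (32, 16, 8):
    print(batch_size, iterations_per_epoch(10_000, batch_size))
```

Halving the batch size roughly doubles the number of iterations per epoch, while each iteration needs less GPU memory.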

Typically, the following options result in a more balanced performance optimization: