Date: 2026-03-03

PoseClassificationNet takes a sequence of skeletons (body poses) as network input and predicts the actions of one or more persons in those frames. The model supported in the current version is based on the spatial-temporal graph convolutional network (ST-GCN), which is the most commonly used baseline for skeleton-based action recognition due to its simplicity and computational efficiency. Unlike pixel-based action recognition, ST-GCN is able to exploit the local pattern and correlation from a spatial-temporal graph of human skeletons. This model can be used to train graph convolutional networks (GCNs) for other purposes through transfer learning. Newer architectures with state-of-the-art performance will be released in the future. TAO provides the network backbone for 3D poses.

Note Throughout this documentation are references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections. For instructions on creating a dataset using the remote client, refer to the Creating a dataset section in the Remote Client documentation. For instructions on creating an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.

The spec format is YAML for TAO Launcher, and JSON for FTMS Client.

File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher, not for FTMS Client.

Preparing the Dataset#

PoseClassificationNet requires a sequence of skeletons (body poses) for input. The coordinates need to be normalized. For example, 3D joints are produced relative to the root keypoint (i.e., the pelvis) and normalized by the focal length (1200.0 for 1080P).

The entrypoint for dataset conversion generates an array of spatio-temporal sequences based on the output JSON metadata from the deepstream-bodypose-3d app. The input data for training or inference are formatted as a NumPy array in five dimensions (N, C, T, V, M):

N : The number of sequences

C : The number of input channels, which is set to 3 in the NGC model

T : The maximum sequence length in frames, which is 300 (10 seconds for 30 FPS) in the NGC model

V : The number of joint points, set to 34 for the NVIDIA format

M : The number of persons. The pre-trained model assumes a single person, but it can also support multiple people

The output of model inference is an array of N elements that gives the predicted action class for each sequence.

The labels used for training or evaluation are stored as a pickle file that consists of a list of two lists, each with N elements. The first list contains N strings of sample names. The second list contains the labeled action class ID of each sequence. The following is an example:

    [["xl6vmD0XBS0.json", "OkLnSMGCWSw.json", "IBopZFDKfYk.json", "HpoFylcrYT4.json", "mlAtn_zi0bY.json", ...],
     [235, 388, 326, 306, 105, ...]]

The graph to model skeletons is defined by two configuration parameters:

graph_layout (string): Must be one of the following candidates:

    nvidia consists of 34 joints. For more information, refer to AR SDK Programming Guide.
    openpose consists of 18 joints. For more information, refer to OpenPose.
    human3.6m consists of 17 joints. For more information, refer to Human3.6M.
    ntu-rgb+d consists of 25 joints. For more information, refer to NTU RGB+D.
    ntu_edge consists of 24 joints. For more information, refer to NTU RGB+D.
    coco consists of 17 joints. For more information, refer to COCO.

graph_strategy (string): Must be one of the following candidates (for more information, refer to the “Partition Strategies” section in this paper):

    uniform : Uniform Labeling
    distance : Distance Partitioning
    spatial : Spatial Configuration
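As a minimal sketch of the data and label formats described above, the following Python snippet builds a placeholder (N, C, T, V, M) array and a matching two-list label file, then reloads both. The sample names, class IDs, the float32 dtype, and the temporary output directory are illustrative assumptions, not values taken from the NVIDIA dataset.

```python
import os
import pickle
import tempfile

import numpy as np

# Illustrative sizes: C, T, V, M follow the NGC model description; N is arbitrary.
N, C, T, V, M = 4, 3, 300, 34, 1

# Placeholder pose data: zeros stand in for normalized joint coordinates.
# The float32 dtype is an assumption made for this sketch.
data = np.zeros((N, C, T, V, M), dtype=np.float32)

# Labels: a list of two lists with N elements each.
sample_names = [f"sample_{i}.json" for i in range(N)]  # hypothetical names
class_ids = [0, 1, 2, 3]                               # hypothetical class IDs
labels = [sample_names, class_ids]

out_dir = tempfile.mkdtemp()
data_path = os.path.join(out_dir, "train_data.npy")
label_path = os.path.join(out_dir, "train_label.pkl")

np.save(data_path, data)
with open(label_path, "wb") as f:
    pickle.dump(labels, f)

# Reload both files and check that they agree with each other.
loaded = np.load(data_path)
with open(label_path, "rb") as f:
    loaded_names, loaded_ids = pickle.load(f)

assert loaded.shape == (N, C, T, V, M)
assert len(loaded_names) == len(loaded_ids) == loaded.shape[0]
```

A consistency check like the final two assertions is a cheap way to catch a data file and label file that have drifted out of sync before starting a training run.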

Note All-in-one scripts are provided for processing Kinetics and self-annotated NVIDIA datasets. The preprocessed data and labels of the NVIDIA dataset can be accessed here.
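The normalization mentioned at the start of this section (joints relative to the root keypoint, scaled by the focal length) might look roughly like the sketch below. This is an interpretation, not the conversion script's actual code: the function name `normalize_pose`, the (T, V, 3) input layout, the choice of index 0 for the pelvis, and the plain division by the focal length are all assumptions.

```python
import numpy as np


def normalize_pose(joints, root_index=0, focal_length=1200.0):
    """Center 3D joints on the root keypoint and scale by the focal length.

    joints: array of shape (T, V, 3), one 3D point per joint per frame.
    root_index: index of the root (pelvis) joint; 0 is an assumption.
    focal_length: 1200.0 is the value quoted for 1080P input.
    """
    joints = np.asarray(joints, dtype=np.float32)
    root = joints[:, root_index:root_index + 1, :]  # shape (T, 1, 3)
    return (joints - root) / focal_length


# Synthetic poses: 5 frames, 34 joints, 3 coordinates.
frames = np.random.default_rng(0).normal(size=(5, 34, 3)) * 100.0
normalized = normalize_pose(frames)
```

After this step the root joint sits at the origin in every frame, so the remaining coordinates describe the pose independently of where the person stands in the scene.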

Creating an Experiment Spec File#

The spec file for PoseClassificationNet includes model, dataset, and train parameters. Here is an example spec for training a 3D-pose-based model on the NVIDIA dataset. It contains six classes: “sitting_down”, “getting_up”, “sitting”, “standing”, “walking”, “jumping”:

TAO Client (v2 API)

Use the following command to get an experiment spec file for PoseClassificationNet:

    BASE_EXPERIMENT_ID=$(tao pose_classification list-base-experiments | jq -r '.[0].id')
    SPECS=$(tao pose_classification get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')

TAO Launcher

    model:
      model_type: ST-GCN
      pretrained_model_path: "/path/to/pretrained_model.pth"
      input_channels: 3
      dropout: 0.5
      graph_layout: "nvidia"
      graph_strategy: "spatial"
      edge_importance_weighting: True
    dataset:
      train_dataset:
        data_path: "/path/to/train_data.npy"
        label_path: "/path/to/train_label.pkl"
      val_dataset:
        data_path: "/path/to/val_data.npy"
        label_path: "/path/to/val_label.pkl"
      num_classes: 6
      label_map:
        sitting_down: 0
        getting_up: 1
        sitting: 2
        standing: 3
        walking: 4
        jumping: 5
      batch_size: 16
      num_workers: 1
    train:
      optim:
        lr: 0.1
        momentum: 0.9
        nesterov: True
        weight_decay: 0.0001
        lr_scheduler: "MultiStep"
        lr_steps:
          - 10
          - 60
        lr_decay: 0.1
      num_epochs: 10
      checkpoint_interval: 5
      validation_interval: 5
      seed: 1234

The top-level parameters are listed below as parameter (data type, default): description.

model (dict config): The configuration of the model architecture

dataset (dict config): The configuration of the dataset

train (dict config): The configuration of the training task

evaluate (dict config): The configuration of the evaluation task

inference (dict config): The configuration of the inference task

encryption_key (string, default: None): The encryption key to encrypt and decrypt model files

results_dir (string, default: /results): The directory where experiment results are saved

export (dict config): The configuration of the ONNX export task

gen_trt_engine (dict config): The configuration of the TensorRT generation task. Only used in TAO deploy

model#

The model parameter provides options to change the PoseClassificationNet architecture.

    model:
      model_type: ST-GCN
      pretrained_model_path: "/path/to/pretrained_model.pth"
      input_channels: 3
      dropout: 0.5
      graph_layout: "nvidia"
      graph_strategy: "spatial"
      edge_importance_weighting: True

model_type (string, default: ST-GCN): The type of model, which can only be ST-GCN for now. Newer architectures will be supported in the future. Supported values: ST-GCN

pretrained_model_path (string): The path to the pre-trained model

input_channels (unsigned int, default: 3): The number of input channels (dimension of body poses). Supported values: >0

dropout (float, default: 0.5): The probability to drop hidden units. Supported values: 0.0 ~ 1.0

graph_layout (string, default: nvidia): The layout of the graph for modeling skeletons. Supported values: nvidia/openpose/human3.6m/ntu-rgb+d/ntu_edge/coco

graph_strategy (string, default: spatial): The strategy of the graph for modeling skeletons. Supported values: uniform/distance/spatial

edge_importance_weighting (bool, default: True): Specifies whether to enable edge importance weighting. Supported values: True/False

dataset#

The dataset parameter defines the dataset source, training batch size, and augmentation.
    dataset:
      train_dataset:
        data_path: "/path/to/train_data.npy"
        label_path: "/path/to/train_label.pkl"
      val_dataset:
        data_path: "/path/to/val_data.npy"
        label_path: "/path/to/val_label.pkl"
      num_classes: 6
      label_map:
        sitting_down: 0
        getting_up: 1
        sitting: 2
        standing: 3
        walking: 4
        jumping: 5
      batch_size: 16
      num_workers: 1

train_dataset (dict): The data_path to the data in a NumPy array and label_path to the labels in a pickle file for training

val_dataset (dict): The data_path to the data in a NumPy array and label_path to the labels in a pickle file for validation

num_classes (unsigned int, default: 6): The number of action classes. Supported values: >0

label_map (dict): A dict that maps the class names to indices

random_choose (bool, default: False): Specifies whether to randomly choose a portion of the input sequence. Supported values: True/False

random_move (bool, default: False): Specifies whether to randomly move the input sequence. Supported values: True/False

window_size (unsigned int, default: -1): The length of the output sequence. A value of -1 specifies the original length.

batch_size (unsigned int, default: 64): The batch size for training and validation. Supported values: >0

num_workers (unsigned int, default: 1): The number of parallel workers processing data. Supported values: >0

Note The input layout is NCTVM, where N is the batch size, C is the number of input channels, T is the sequence length, V is the number of keypoints, and M is the number of people.

train#

The train parameter defines the hyperparameters of the training process.
    train:
      optim:
        lr: 0.1
        momentum: 0.9
        nesterov: True
        weight_decay: 0.0001
        lr_scheduler: "MultiStep"
        lr_steps:
          - 10
          - 60
        lr_decay: 0.1
      num_epochs: 10
      checkpoint_interval: 5
      validation_interval: 5
      seed: 1234

num_gpus (unsigned int, default: 1): The number of GPUs to use for distributed training. Supported values: >0

gpu_ids (List[int], default: [0]): The indices of the GPUs to use for distributed training

seed (unsigned int, default: 1234): The random seed for random, NumPy, and torch. Supported values: >0

num_epochs (unsigned int, default: 10): The total number of epochs to run the experiment. Supported values: >0

checkpoint_interval (unsigned int, default: 1): The epoch interval at which the checkpoints are saved. Supported values: >0

validation_interval (unsigned int, default: 1): The epoch interval at which the validation is run. Supported values: >0

resume_training_checkpoint_path (string): The intermediate PyTorch Lightning checkpoint to resume training from

results_dir (string, default: /results/train): The directory to save training results

optim (dict config): The configuration for the SGD optimizer, including the learning rate, learning scheduler, weight decay, etc.

grad_clip (float, default: 0.0): The amount to clip the gradient by the L2 norm. A value of 0.0 specifies no clipping. Supported values: >=0

optim#

The optim parameter defines the config for the SGD optimizer in training, including the learning rate, learning scheduler, and weight decay.

    optim:
      lr: 0.1
      momentum: 0.9
      nesterov: True
      weight_decay: 0.0001
      lr_scheduler: "MultiStep"
      lr_steps:
        - 10
        - 60
      lr_decay: 0.1

lr (float, default: 0.1): The initial learning rate for the training. Supported values: >0.0

momentum (float, default: 0.9): The momentum for the SGD optimizer. Supported values: >0.0

nesterov (bool, default: True): Specifies whether to enable Nesterov momentum. Supported values: True/False

weight_decay (float, default: 1e-4): The weight decay coefficient. Supported values: >0.0

lr_scheduler (string, default: MultiStep): The learning scheduler. Supported values: MultiStep/AutoReduce. Two schedulers are provided:

    MultiStep : Decrease the lr by lr_decay at the steps set in lr_steps.
    AutoReduce : Decrease the lr by lr_decay while lr_monitor doesn’t decline more than 0.1% of the previous value.

lr_monitor (string, default: val_loss): The monitor value for the AutoReduce scheduler. Supported values: val_loss/train_loss

patience (unsigned int, default: 1): The number of epochs with no improvement, after which the learning rate will be reduced. Supported values: >0

min_lr (float, default: 1e-4): The minimum learning rate in the training. Supported values: >0.0

lr_steps (int list, default: [10, 60]): The steps to decrease the learning rate for the MultiStep scheduler. Supported values: int list

lr_decay (float, default: 0.1): The decreasing factor for the learning rate scheduler. Supported values: >0.0
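To make the MultiStep behavior concrete, here is a small sketch that computes the learning rate at a given epoch from lr, lr_steps, and lr_decay. It is not TAO's scheduler code. The function name `multistep_lr` is hypothetical, and the sketch assumes that lr_steps are epoch indices and that the decay takes effect once the epoch index reaches a step.

```python
def multistep_lr(epoch, base_lr=0.1, lr_steps=(10, 60), lr_decay=0.1):
    """Learning rate at a given epoch under a MultiStep-style schedule.

    Assumption: the rate is multiplied by lr_decay once for every entry
    in lr_steps that the epoch index has reached.
    """
    drops = sum(1 for step in lr_steps if epoch >= step)
    return base_lr * (lr_decay ** drops)


# Print the rate around each milestone of the default schedule.
for epoch in (0, 9, 10, 59, 60):
    print(epoch, multistep_lr(epoch))
```

With the defaults above, the rate stays at the initial value until the first milestone and then shrinks by a factor of lr_decay at each one. Note that with num_epochs set to 10, as in the example spec, a milestone at 60 would never be reached.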

Training the Model#

Use the following command to run PoseClassificationNet training:

TAO Client (v2 API)

    TRAIN_JOB_ID=$(tao pose_classification create-job \
      --kind experiment \
      --name "pose_classification_train" \
      --action train \
      --workspace-id $WORKSPACE_ID \
      --specs "$TRAIN_SPECS" \
      --train-datasets '["'$DATASET_ID'"]' \
      --eval-dataset "$DATASET_ID" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model pose_classification train [-h] -e <experiment_spec>
                                        [results_dir=<global_results_dir>]
                                        [model.<model_option>=<model_option_value>]
                                        [dataset.<dataset_option>=<dataset_option_value>]
                                        [train.<train_option>=<train_option_value>]
                                        [train.gpu_ids=<gpu indices>]
                                        [train.num_gpus=<number of gpus>]

Required Arguments

The only required argument is the path to the experiment spec:

-e, --experiment_spec : The experiment specification file to set up the training experiment

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

-h, --help : Show this help message and exit.

model.<model_option> : The model options.

dataset.<dataset_option> : The dataset options.

train.<train_option> : The train options.

train.optim.<optim_option> : The optimizer options

Note For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 and gpu_ids = [0, 1]), they are modified to follow the setting that implies more GPUs; in the same example, num_gpus is modified from 1 to 2.

In some cases, multi-GPU training may result in a segmentation fault. You can circumvent this by setting the environment variable OMP_NUM_THREADS to 1. Depending on your mode of execution, you may use the following methods to set this variable:

CLI Launcher: You may set the environment variable by adding the following fields to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 of the Running the launcher section.

    {
        "Envs": [
            {
                "variable": "OMP_NUM_THREADS",
                "value": "1"
            }
        ]
    }

Docker: You may set environment variables in Docker by setting the -e flag in the Docker command line.

    docker run -it --rm --gpus all \
      -e OMP_NUM_THREADS=1 \
      -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e

Checkpointing and Resuming Training

At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth. Checkpoints are saved in train.results_dir, like this:

    $ ls /results/train
    'model_epoch_000.pth'
    'model_epoch_001.pth'
    'model_epoch_002.pth'
    'model_epoch_003.pth'
    'model_epoch_004.pth'

The latest checkpoint is saved as pc_model_latest.pth. Training automatically resumes from pc_model_latest.pth if it exists in train.results_dir. This is superseded by train.resume_training_checkpoint_path, if it is provided.

The major implication of this logic is that, if you wish to trigger fresh training from scratch, you must do one of the following:

Specify a new, empty results directory (Recommended)

Remove the latest checkpoint from the results directory

Note Pre-trained models are not designed to be re-trained with input data of varying dimensions. ST-GCN is a lightweight network, and leveraging a pre-trained model typically doesn’t significantly affect the final accuracy.
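The resume rules described above can be summarized in a few lines of Python. This is a sketch of the documented precedence only, not TAO's implementation; the helper name `pick_resume_checkpoint` and its arguments are hypothetical.

```python
import os
import tempfile


def pick_resume_checkpoint(results_dir, resume_training_checkpoint_path=None,
                           latest_name="pc_model_latest.pth"):
    """Return the checkpoint to resume from, or None for fresh training.

    Precedence, as described in the text: an explicit
    resume_training_checkpoint_path wins; otherwise the latest checkpoint
    in results_dir is used if it exists.
    """
    if resume_training_checkpoint_path:
        return resume_training_checkpoint_path
    latest = os.path.join(results_dir, latest_name)
    return latest if os.path.isfile(latest) else None


# A new, empty results directory leads to training from scratch.
empty_dir = tempfile.mkdtemp()
print(pick_resume_checkpoint(empty_dir))  # None
```

This is why pointing results_dir at a new, empty directory is the recommended way to start over: no latest checkpoint can be found there, so nothing is resumed.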

Evaluating the Model#

The evaluation metric of PoseClassificationNet is the accuracy of action recognition.

Use the following command to run PoseClassificationNet evaluation:

TAO Client (v2 API)

    EVAL_JOB_ID=$(tao pose_classification create-job \
      --kind experiment \
      --name "pose_classification_evaluate" \
      --action evaluate \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $TRAIN_JOB_ID \
      --eval-dataset "$DATASET_ID" \
      --specs "$EVALUATE_SPECS" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model pose_classification evaluate [-h] -e <experiment_spec_file>
                                           evaluate.checkpoint=<model to be evaluated>
                                           evaluate.test_dataset.data_path=<path to test data>
                                           evaluate.test_dataset.label_path=<path to test labels>
                                           [evaluate.<evaluate_option>=<evaluate_option_value>]
                                           [evaluate.gpu_ids=<gpu indices>]
                                           [evaluate.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

-e, --experiment_spec : The experiment spec file to set up the evaluation experiment.

evaluate.checkpoint : The .pth model to be evaluated.

evaluate.test_dataset.data_path : The path to the test data.

evaluate.test_dataset.label_path : The path to the test labels.

Optional Arguments

The following arguments are optional to run the command.

evaluate.gpu_ids : The GPU indices to run evaluation. Defaults to [0].

evaluate.num_gpus : The number of GPUs to run evaluation. Defaults to 1.

evaluate.results_dir : The directory to save the evaluation results. Defaults to /results/evaluate.

Multi-GPU evaluation is currently not supported for Pose Classification.

Running Inference on the Model#

Use the following command to run inference on PoseClassificationNet.

TAO Client (v2 API)

    INFERENCE_JOB_ID=$(tao pose_classification create-job \
      --kind experiment \
      --name "pose_classification_inference" \
      --action inference \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $TRAIN_JOB_ID \
      --inference-dataset "$DATASET_ID" \
      --specs "$INFERENCE_SPECS" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model pose_classification inference [-h] -e <experiment_spec>
                                            inference.checkpoint=<inference model>
                                            inference.output_file=<path to output file>
                                            inference.test_dataset.data_path=<path to inference data>
                                            [inference.<infer_option>=<infer_option_value>]
                                            [inference.gpu_ids=<gpu indices>]
                                            [inference.num_gpus=<number of gpus>]

Required Arguments

The following arguments are required.

-e, --experiment_spec : The experiment spec file to set up the inference experiment.

inference.checkpoint : The .pth model to run inference with.

inference.output_file : The path to the output text file.

inference.test_dataset.data_path : The path to the test data.

Optional Arguments

The following arguments are optional to run the command.

inference.gpu_ids : The GPU indices to run inference. Defaults to [0].

inference.num_gpus : The number of GPUs to run inference. Defaults to 1.

inference.results_dir : The directory to save the inference results. Defaults to /results/inference.

The output is a text file in which each line gives the predicted action class for one input sequence.

Multi-GPU inference is currently not supported for Pose Classification.

The expected output for the NVIDIA test data would be as follows:

    sit
    sit
    sit_down
    ...
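Because the output file holds one predicted class per line, a quick tally of the predictions takes only a few lines of Python. The sketch below is illustrative; the helper name `count_predictions` is hypothetical, and the sample lines simply repeat the expected output shown above rather than a real inference run.

```python
from collections import Counter


def count_predictions(lines):
    """Count predicted classes from an iterable of text lines.

    Blank lines are skipped; surrounding whitespace is stripped.
    """
    labels = (line.strip() for line in lines)
    return Counter(label for label in labels if label)


# In practice, pass an open file handle for inference.output_file instead.
sample = ["sit\n", "sit\n", "sit_down\n"]
counts = count_predictions(sample)
print(counts.most_common())  # [('sit', 2), ('sit_down', 1)]
```

A tally like this is a simple sanity check: a model that predicts a single class for every sequence usually points to a data or label problem.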

Exporting the Model#

Use the following command to export PoseClassificationNet to .onnx format for deployment:

TAO Client (v2 API)

    EXPORT_JOB_ID=$(tao pose_classification create-job \
      --kind experiment \
      --name "pose_classification_export" \
      --action export \
      --workspace-id $WORKSPACE_ID \
      --parent-job-id $TRAIN_JOB_ID \
      --specs "$EXPORT_SPECS" \
      --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
      --encryption-key "nvidia_tlt" | jq -r '.id')

TAO Launcher

    tao model pose_classification export -e <experiment_spec>
                                         export.checkpoint=<tlt checkpoint to be exported>
                                         export.onnx_file=<path to exported file>
                                         [export.gpu_id=<gpu index>]

Required Arguments

The following arguments are required.

-e, --experiment_spec : The path to an experiment spec file.

export.checkpoint : The .pth model to export.

export.onnx_file : The path where the .etlt or .onnx model is saved.

Optional Arguments

The following arguments are optional to run the command.

export.gpu_id : The index of the GPU used to run export. If the machine has multiple GPUs, you can specify the GPU index used to run export. Note that export can only run on a single GPU.

Converting the Pose Data#

Use the following command to convert the output JSON metadata from the deepstream-bodypose-3d app and generate spatio-temporal sequences of body poses for inference:

TAO Client (v2 API)

    DS_CONVERT_JOB_ID=$(tao pose_classification create-job \
      --kind dataset \
      --dataset-id $DATASET_ID \
      --action dataset_convert \
      --specs "$SCHEMA" | jq -r '.id')

TAO Launcher

    tao model pose_classification dataset_convert -e <experiment_spec>
                                                  dataset_convert.data=<path to deepstream-bodypose-3d output data>
                                                  [dataset_convert.pose_type=<pose type>]
                                                  [dataset_convert.num_joints=<number of joints>]
                                                  [dataset_convert.input_width=<input width>]
                                                  [dataset_convert.input_height=<input height>]
                                                  [dataset_convert.focal_length=<focal length>]
                                                  [dataset_convert.sequence_length_max=<maximum sequence length>]
                                                  [dataset_convert.sequence_length_min=<minimum sequence length>]
                                                  [dataset_convert.sequence_length=<sequence length for sampling>]
                                                  [dataset_convert.sequence_overlap=<sequence overlap for sampling>]

Required Arguments

The following arguments are required.

-e, --experiment_spec : The experiment spec file to set up dataset conversion

dataset_convert.data : The output JSON data from the deepstream-bodypose-3d app

Optional Arguments

The following arguments are optional to run the command.

dataset_convert.results_dir : The path to a folder where the experiment outputs should be written

dataset_convert.pose_type : The pose type, which can be chosen from 3dbp, 25dbp, and 2dbp

dataset_convert.num_joints : The number of joint points in the graph layout

dataset_convert.input_width : The width of input images in pixels for normalization

dataset_convert.input_height : The height of input images in pixels for normalization

dataset_convert.focal_length : The focal length of the camera for normalization

dataset_convert.sequence_length_max : The maximum sequence length for defining array shape

dataset_convert.sequence_length_min : The minimum sequence length for filtering short sequences

dataset_convert.sequence_length : The general sequence length for sampling

dataset_convert.sequence_overlap : The overlap between sequences for sampling

The expected output is a sampled array for each individual tracked ID, saved under the results directory.
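To illustrate how the sequence_length, sequence_overlap, and sequence_length_min options could interact, the sketch below splits one tracked ID's frames into overlapping windows. It is an interpretation rather than the converter's actual logic: the function name `sample_windows` is hypothetical, and treating sequence_overlap as a fraction of the window length in the range [0, 1) is an assumption.

```python
def sample_windows(num_frames, sequence_length, sequence_overlap,
                   sequence_length_min):
    """Return (start, end) frame ranges for one tracked ID.

    Assumptions made for this sketch:
      - tracks shorter than sequence_length_min are dropped,
      - tracks no longer than sequence_length are kept whole,
      - sequence_overlap is a fraction of the window length in [0, 1),
      - trailing frames that do not fill a full window are discarded.
    """
    if num_frames < sequence_length_min:
        return []
    if num_frames <= sequence_length:
        return [(0, num_frames)]
    stride = max(1, int(sequence_length * (1.0 - sequence_overlap)))
    windows = []
    start = 0
    while start + sequence_length <= num_frames:
        windows.append((start, start + sequence_length))
        start += stride
    return windows


# A 700-frame track, 300-frame windows, 50% overlap.
print(sample_windows(700, 300, 0.5, 10))
# [(0, 300), (150, 450), (300, 600)]
```

Each window would then be copied into an array whose time dimension is sized by sequence_length_max, which matches the T dimension of the (N, C, T, V, M) layout.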