PointPillars

PointPillars is a model for 3D object detection in point cloud data. Unlike an image, a point cloud is by nature a collection of sparse points in 3D space. Each point cloud sample (example) is called a scene and is stored here as a file with the .bin extension. A scene generally contains a variable number of points in 3D Euclidean space, so the data in a single scene has the shape (N, K). N is the number of points in the scene, a positive integer that varies from scene to scene. K is the number of features per point, and it should be 4. The features of each point can therefore be written as (x, y, z, r), where x, y, z, and r are the X coordinate, Y coordinate, Z coordinate, and reflectance (intensity), respectively. All four values are floating-point numbers. The reflectance r is a real number in the interval [0.0, 1.0] that represents the fraction of a LIDAR laser beam's intensity that is reflected back from that point in 3D space.
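A minimal Python sketch of reading and writing a scene is shown below. It assumes that each .bin file stores the N x 4 values as consecutive little-endian 32-bit floats with no header, which is the layout of KITTI Velodyne scans; the storage precision and byte order are not stated above, so verify them against your own data. The helper names write_scene and read_scene are hypothetical and not part of TAO Toolkit.

```python
import struct
from pathlib import Path

# Assumed layout: 4 little-endian float32 values (x, y, z, r) per point, no header.
RECORD = struct.Struct("<4f")


def write_scene(path, points):
    """Write a list of (x, y, z, r) tuples to a headerless float32 .bin file."""
    with open(path, "wb") as f:
        for p in points:
            f.write(RECORD.pack(*p))


def read_scene(path):
    """Read a .bin scene as a list of (x, y, z, r) tuples, i.e. an (N, 4) array."""
    data = Path(path).read_bytes()
    if len(data) % RECORD.size != 0:
        raise ValueError(f"{path}: {len(data)} bytes is not a multiple of {RECORD.size}")
    points = [RECORD.unpack_from(data, off) for off in range(0, len(data), RECORD.size)]
    for _, _, _, r in points:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"{path}: reflectance {r} is outside [0.0, 1.0]")
    return points
```

With this layout, N for a scene is simply the file size divided by 16 bytes.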

An object in 3D Euclidean space can be described by a 3D bounding box. Formally, a 3D bounding box is represented by (x, y, z, dx, dy, dz, yaw). The seven numbers in the tuple are the X, Y, and Z coordinates of the object center, the length (in the X direction), the width (in the Y direction), the height (in the Z direction), and the orientation (yaw) in 3D Euclidean space, respectively.

To handle the coordinates of points and objects, a coordinate system is required. In TAO Toolkit PointPillars, the coordinate system is defined as follows:

  • The origin of the coordinate system is the center of the LIDAR.

  • The X axis points to the front.

  • The Y axis points to the left.

  • The Z axis points up.

  • Yaw is the rotation in the horizontal plane (the X-Y plane), in the counter-clockwise direction. So the X axis corresponds to yaw = 0, the Y axis corresponds to yaw = pi / 2, and so on.

An illustration of the coordinate system is shown below.

                         up z    x front (yaw=0)
                            ^   ^
                            |  /
                            | /
(yaw=0.5*pi) left y <------ 0
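To make the yaw convention concrete, the hypothetical helper below computes the four bird's-eye-view (X-Y) corners of a box (x, y, z, dx, dy, dz, yaw) by rotating the box's local corners counter-clockwise by yaw about the box center. It ignores z and dz, and the corner order is an arbitrary choice for this sketch.

```python
import math


def bev_corners(x, y, dx, dy, yaw):
    """Return the 4 (X, Y) corners of a box footprint in the LIDAR frame.

    dx is the box length and dy its width; at yaw = 0 the length lies along
    the X axis (front). yaw rotates the box counter-clockwise about its center.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    local = [(dx / 2, dy / 2), (-dx / 2, dy / 2), (-dx / 2, -dy / 2), (dx / 2, -dy / 2)]
    # Standard 2D counter-clockwise rotation, then translation to the box center.
    return [(x + c * u - s * v, y + s * u + c * v) for u, v in local]
```

For example, a 4 m x 2 m box at the origin with yaw = pi / 2 has its long side along the Y axis, so its corners are approximately (-1, 2), (-1, -2), (1, -2), and (1, 2).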

Preparing the Dataset

The dataset for PointPillars contains point cloud data and the corresponding annotations of 3D objects. The point cloud data is a directory of point cloud files (with the .bin extension), and the annotations are a directory of text files in KITTI format.

The directory structure should be organized as shown below, where the directory for point cloud files must be named lidar and the directory for annotations must be named label. The file names in the two directories can be arbitrary, as long as each .bin file has exactly one corresponding .txt file with the same base name, and vice versa.

/lidar
  0.bin
  1.bin
  ...
/label
  0.txt
  1.txt
  ...

Finally, PointPillars requires the usual train/val split, so the training set and the validation set must each follow the structure described above. The overall structure should look like the following. The exact directory names train and val are not required, but they are preferred by convention.

/train
  /lidar
    0.bin
    1.bin
    ...
  /label
    0.txt
    1.txt
    ...
/val
  /lidar
    0.bin
    1.bin
    ...
  /label
    0.txt
    1.txt
    ...
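Because every .bin file needs exactly one matching .txt file, a quick consistency check before running dataset_convert can save time. The sketch below, with the hypothetical helper check_split, compares the base names in one split's lidar and label directories; it is not part of TAO Toolkit.

```python
from pathlib import Path


def check_split(split_dir):
    """Return (bins_without_label, labels_without_bin) for one split such as train/ or val/."""
    split = Path(split_dir)
    bins = {p.stem for p in (split / "lidar").glob("*.bin")}
    labels = {p.stem for p in (split / "label").glob("*.txt")}
    return sorted(bins - labels), sorted(labels - bins)


if __name__ == "__main__":
    import sys

    # Usage: python check_split.py /data/train /data/val
    for split_dir in sys.argv[1:]:
        missing_label, missing_bin = check_split(split_dir)
        print(f"{split_dir}: {len(missing_label)} .bin without .txt, "
              f"{len(missing_bin)} .txt without .bin")
```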

Each .bin file should comply with the format described above, and each .txt label file should comply with the KITTI format, with one exception. Although the structure of a PointPillars label is the same as KITTI, the last field of each object is interpreted differently: in KITTI the last field is Rotation_y (rotation around the Y axis in the camera coordinate system), while in PointPillars it is Rotation_z (rotation around the Z axis in the LIDAR coordinate system).

In the example below, the last values -1.59, -2.35, and -0.03 should be interpreted differently from standard KITTI.

car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59
cyclist 0.00 0 -2.46 665.45 160.00 717.93 217.99 1.72 0.47 1.65 2.45 1.35 22.10 -2.35
pedestrian 0.00 2 0.21 423.17 173.67 433.17 224.03 1.60 0.38 0.30 -5.87 1.63 23.11 -0.03

Note

The interpretation of a PointPillars label is slightly different from the standard KITTI format. In PointPillars the yaw is the rotation around the Z axis in the LIDAR coordinate system, as defined above, while in the standard KITTI interpretation it is the rotation around the Y axis in the camera coordinate system. As a result, a PointPillars dataset does not depend on camera information or camera calibration.
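The sketch below parses one label line. Field names and order follow the standard KITTI label layout (type, truncated, occluded, alpha, 2D bbox, dimensions as height/width/length, location, rotation); only the last field is renamed rotation_z to reflect the PointPillars interpretation described above. PointPillarsLabel and parse_label_line are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PointPillarsLabel:
    type: str
    truncated: float
    occluded: int
    alpha: float
    bbox: Tuple[float, float, float, float]  # 2D box: left, top, right, bottom
    dimensions: Tuple[float, float, float]   # height, width, length (KITTI order)
    location: Tuple[float, float, float]     # x, y, z
    rotation_z: float  # rotation around Z in the LIDAR frame (not KITTI's rotation_y)


def parse_label_line(line):
    """Parse one 15-field KITTI-style label line."""
    f = line.split()
    if len(f) != 15:
        raise ValueError(f"expected 15 fields, got {len(f)}: {line!r}")
    v = [float(x) for x in f[1:]]
    return PointPillarsLabel(
        type=f[0],
        truncated=v[0],
        occluded=int(v[1]),
        alpha=v[2],
        bbox=tuple(v[3:7]),
        dimensions=tuple(v[7:10]),
        location=tuple(v[10:13]),
        rotation_z=v[13],
    )
```

For the first example line, parse_label_line(...).rotation_z is -1.59, the yaw around the LIDAR Z axis.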

Once the dataset directory structure above is ready, copy the base names of the split directories into the DATA_CONFIG.DATA_SPLIT dict of the spec file. For example:

{
  'train': train,
  'test': val
}

Also, set the names of the pickle info files in the DATA_CONFIG.INFO_PATH parameter. For example:

{
  'train': ['infos_train.pkl'],
  'test': ['infos_val.pkl'],
}

Once these are set, generate the dataset statistics, that is, the pickle files above, with the dataset_convert command. The pickle files are used by the data augmentations during training.

Converting the Dataset

The pickle info files are generated from the original point cloud files and KITTI text label files with the following command:

tao pointpillars dataset_convert -e $SPECS_DIR/pointpillars.yaml

The -e option specifies the training spec file, described below.

Creating an Experiment Spec File

The spec file for PointPillars includes the CLASS_NAMES, DATA_CONFIG, MODEL, OPTIMIZATION, EVALUATION, and INFERENCE parameters. Below is an example spec file for training on the KITTI dataset.

CLASS_NAMES: ['Car', 'Pedestrian', 'Cyclist']
DATA_CONFIG:
    DATASET: 'GeneralPCDataset'
    DATA_PATH: '/path/to/pointpillar'
    DATA_SPLIT: {
        'train': train,
        'test': val
    }
    INFO_PATH: {
        'train': [infos_train.pkl],
        'test': [infos_val.pkl],
    }
    BALANCED_RESAMPLING: False
    POINT_FEATURE_ENCODING: {
        encoding_type: absolute_coordinates_encoding,
        used_feature_list: ['x', 'y', 'z', 'intensity'],
        src_feature_list: ['x', 'y', 'z', 'intensity'],
    }
    POINT_CLOUD_RANGE: [0, -39.68, -3, 69.12, 39.68, 1]
    DATA_AUGMENTOR:
        DISABLE_AUG_LIST: ['placeholder']
        AUG_CONFIG_LIST:
            - NAME: gt_sampling
              DB_INFO_PATH:
                  - dbinfos_train.pkl
              PREPARE: {
                filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5'],
              }
              SAMPLE_GROUPS: ['Car:15','Pedestrian:15', 'Cyclist:15']
              NUM_POINT_FEATURES: 4
              DATABASE_WITH_FAKELIDAR: False
              REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
              LIMIT_WHOLE_SCENE: False
            - NAME: random_world_flip
              ALONG_AXIS_LIST: ['x']
            - NAME: random_world_rotation
              WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]
            - NAME: random_world_scaling
              WORLD_SCALE_RANGE: [0.95, 1.05]
    DATA_PROCESSOR:
        - NAME: mask_points_and_boxes_outside_range
          REMOVE_OUTSIDE_BOXES: True
        - NAME: shuffle_points
          SHUFFLE_ENABLED: {
              'train': True,
              'test': False
          }
        - NAME: transform_points_to_voxels
          VOXEL_SIZE: [0.16, 0.16, 4]
          MAX_POINTS_PER_VOXEL: 32
          MAX_NUMBER_OF_VOXELS: {
              'train': 16000,
              'test': 10000
          }
    NUM_WORKERS: 4

MODEL:
    NAME: PointPillar
    VFE:
        NAME: PillarVFE
        WITH_DISTANCE: False
        USE_ABSLOTE_XYZ: True
        USE_NORM: True
        NUM_FILTERS: [64]
    MAP_TO_BEV:
        NAME: PointPillarScatter
        NUM_BEV_FEATURES: 64
    BACKBONE_2D:
        NAME: BaseBEVBackbone
        LAYER_NUMS: [3, 5, 5]
        LAYER_STRIDES: [2, 2, 2]
        NUM_FILTERS: [64, 128, 256]
        UPSAMPLE_STRIDES: [1, 2, 4]
        NUM_UPSAMPLE_FILTERS: [128, 128, 128]
    DENSE_HEAD:
        NAME: AnchorHeadSingle
        CLASS_AGNOSTIC: False
        USE_DIRECTION_CLASSIFIER: True
        DIR_OFFSET: 0.78539
        DIR_LIMIT_OFFSET: 0.0
        NUM_DIR_BINS: 2
        ANCHOR_GENERATOR_CONFIG: [
            {
                'class_name': 'Car',
                'anchor_sizes': [[3.9, 1.6, 1.56]],
                'anchor_rotations': [0, 1.57],
                'anchor_bottom_heights': [-1.78],
                'align_center': False,
                'feature_map_stride': 2,
                'matched_threshold': 0.6,
                'unmatched_threshold': 0.45
            },
            {
                'class_name': 'Pedestrian',
                'anchor_sizes': [[0.8, 0.6, 1.73]],
                'anchor_rotations': [0, 1.57],
                'anchor_bottom_heights': [-0.6],
                'align_center': False,
                'feature_map_stride': 2,
                'matched_threshold': 0.5,
                'unmatched_threshold': 0.35
            },
            {
                'class_name': 'Cyclist',
                'anchor_sizes': [[1.76, 0.6, 1.73]],
                'anchor_rotations': [0, 1.57],
                'anchor_bottom_heights': [-0.6],
                'align_center': False,
                'feature_map_stride': 2,
                'matched_threshold': 0.5,
                'unmatched_threshold': 0.35
            }
        ]
        TARGET_ASSIGNER_CONFIG:
            NAME: AxisAlignedTargetAssigner
            POS_FRACTION: -1.0
            SAMPLE_SIZE: 512
            NORM_BY_NUM_EXAMPLES: False
            MATCH_HEIGHT: False
            BOX_CODER: ResidualCoder
        LOSS_CONFIG:
            LOSS_WEIGHTS: {
                'cls_weight': 1.0,
                'loc_weight': 2.0,
                'dir_weight': 0.2,
                'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
            }
    POST_PROCESSING:
        RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
        SCORE_THRESH: 0.1
        OUTPUT_RAW_SCORE: False
        EVAL_METRIC: kitti
        NMS_CONFIG:
            MULTI_CLASSES_NMS: False
            NMS_TYPE: nms_gpu
            NMS_THRESH: 0.01
            NMS_PRE_MAXSIZE: 4096
            NMS_POST_MAXSIZE: 500
    SYNC_BN: False

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 4
    NUM_EPOCHS: 80
    OPTIMIZER: adam_onecycle
    LR: 0.003
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9
    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001
    LR_WARMUP: False
    WARMUP_EPOCH: 1
    GRAD_NORM_CLIP: 10
    RESUME_MODEL_PATH: null
    PRETRAINED_MODEL_PATH: null
    PRUNED_MODEL_PATH: null
    TCP_PORT: 18888
    RANDOM_SEED: null
    CKPT_INTERVAL: 1
    MAX_CKPT_SAVE_NUM: 30
    MERGE_ALL_ITERS_TO_ONE_EPOCH: False

EVALUATION:
    BATCH_SIZE: 1
    CKPT: "/path/to/checkpoint_epoch_80.tlt"

INFERENCE:
    MAX_POINTS_NUM: 25000
    BATCH_SIZE: 1
    CKPT: "/path/to/checkpoint_epoch_80.tlt"
    VIS_CONF_THRESH: 0.1

The top-level parameters of the spec file are described in the table below.

Parameter | Data Type | Default | Description
CLASS_NAMES | list of strings |  | The list of class names in the dataset
DATA_CONFIG | Collection |  | The configuration of the dataset
MODEL | Collection |  | The configuration of the PointPillars model
OPTIMIZATION | Collection |  | The configuration of the training process
INFERENCE | Collection |  | The configuration of the inference process
EVALUATION | Collection |  | The configuration of the evaluation process

Class Names

The CLASS_NAMES parameter provides the list of object class names in the dataset. It is simply a list of strings.

Dataset

The DATA_CONFIG parameter defines the dataset for training and validation/evaluation of the PointPillars model, described below.

Parameter | Data Type | Default | Description
DATASET | string | GeneralPCDataset | The name (type) of the dataset; currently only GeneralPCDataset is supported
DATA_PATH | string |  | The path to the dataset
DATA_SPLIT | dict |  | The dict that maps the train and test splits to the actual directory names
INFO_PATH | dict |  | The dict that maps the train and test splits to the actual pickle info file names
BALANCED_RESAMPLING | bool | False | Whether or not to enable balanced resampling in the data loader
POINT_FEATURE_ENCODING | Collection |  | The configuration for point feature encoding
POINT_CLOUD_RANGE | list of floats |  | The point cloud coordinate range in [xmin, ymin, zmin, xmax, ymax, zmax] format
DATA_AUGMENTOR | Collection |  | The configuration for data augmentation
DATA_PROCESSOR | Collection |  | The configuration for data processing
NUM_WORKERS | int | 1 | The number of workers used by the data loader

Point Feature Encoding

Point feature encoding defines how the features of each point are represented. This parameter is fixed for this version and has to be:

{
  encoding_type: absolute_coordinates_encoding,
  used_feature_list: ['x', 'y', 'z', 'intensity'],
  src_feature_list: ['x', 'y', 'z', 'intensity'],
}

Data Augmentations

Data augmentation pipelines are defined by the DATA_AUGMENTOR parameter, described in the table below.

Parameter | Data Type | Default | Description
DISABLE_AUG_LIST | List of strings | ["placeholder"] | The list of augmentations to be disabled
AUG_CONFIG_LIST | List of collections |  | The list of augmentations, whose names should be gt_sampling, random_world_flip, random_world_rotation, and random_world_scaling, in that order

The parameters for gt_sampling are described below.

Parameter | Data Type | Default | Description
NAME | string | gt_sampling | The name; has to be gt_sampling
DB_INFO_PATH | List of strings | dbinfos_train.pkl | The list of database info files used for sampling
PREPARE | dict |  | Preparation settings for the gt sampling database, such as filter_by_min_points
SAMPLE_GROUPS | List of strings |  | Per-class sample groups, one string per class
NUM_POINT_FEATURES | int | 4 | Number of features for each point
DATABASE_WITH_FAKELIDAR | bool | False | Whether the fake LIDAR is enabled
REMOVE_EXTRA_WIDTH | list of floats |  | Extra widths to remove per class
LIMIT_WHOLE_SCENE | bool | False | Whether or not to limit the whole scene
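The PREPARE and SAMPLE_GROUPS entries are written as 'ClassName:count' strings. The hypothetical helpers below parse them into dicts and report any class in CLASS_NAMES that lacks an entry, a sanity check that TAO Toolkit does not necessarily perform itself. Reading filter_by_min_points as "skip database objects with fewer than N LIDAR points" and SAMPLE_GROUPS as "the number of objects of that class to sample into a scene" follows the OpenPCDet conventions this spec appears to be based on; treat those semantics as assumptions.

```python
def parse_class_counts(entries):
    """Turn ['Car:15', 'Pedestrian:15'] into {'Car': 15, 'Pedestrian': 15}."""
    counts = {}
    for entry in entries:
        name, sep, num = entry.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"expected 'ClassName:count', got {entry!r}")
        counts[name.strip()] = int(num)
    return counts


def check_gt_sampling(class_names, prepare, sample_groups):
    """Parse gt_sampling settings and list classes missing from either of them."""
    min_points = parse_class_counts(prepare.get("filter_by_min_points", []))
    groups = parse_class_counts(sample_groups)
    missing = [c for c in class_names if c not in min_points or c not in groups]
    return min_points, groups, missing


# Values from the example spec.
min_points, groups, missing = check_gt_sampling(
    ["Car", "Pedestrian", "Cyclist"],
    {"filter_by_min_points": ["Car:5", "Pedestrian:5", "Cyclist:5"]},
    ["Car:15", "Pedestrian:15", "Cyclist:15"],
)
print(min_points, groups, missing)
```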

The parameters for random_world_flip are described below.

Parameter | Data Type | Default | Description
ALONG_AXIS_LIST | List of strings |  | The axes along which to flip the coordinates

The parameters for random_world_rotation are described below.

Parameter | Data Type | Default | Description
WORLD_ROT_ANGLE | List of floats |  | The [min, max] range of the random rotation angle, in radians

The parameters for random_world_scaling are described below.

Parameter | Data Type | Default | Description
WORLD_SCALE_RANGE | List of floats |  | The minimum and maximum scaling factors
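The geometric effect of the three world augmentations on points (x, y, z, r) and boxes (x, y, z, dx, dy, dz, yaw) can be sketched as follows, using the coordinate system defined earlier. The 50% flip probability, the order (flip, then rotation, then scaling, with gt_sampling omitted), and the helper names are assumptions for illustration; the rotation and scaling ranges are the values from the example spec. Yaw is not wrapped back into a fixed interval here.

```python
import math
import random


def flip_along_x(points, boxes):
    """Mirror the scene across the X axis: y -> -y and yaw -> -yaw."""
    pts = [(x, -y, z, r) for x, y, z, r in points]
    bxs = [(x, -y, z, dx, dy, dz, -yaw) for x, y, z, dx, dy, dz, yaw in boxes]
    return pts, bxs


def rotate_around_z(points, boxes, angle):
    """Rotate the scene counter-clockwise by `angle` radians about the Z axis."""
    c, s = math.cos(angle), math.sin(angle)
    pts = [(c * x - s * y, s * x + c * y, z, r) for x, y, z, r in points]
    bxs = [(c * x - s * y, s * x + c * y, z, dx, dy, dz, yaw + angle)
           for x, y, z, dx, dy, dz, yaw in boxes]
    return pts, bxs


def scale_scene(points, boxes, factor):
    """Scale coordinates and box sizes uniformly; yaw and reflectance are unchanged."""
    pts = [(x * factor, y * factor, z * factor, r) for x, y, z, r in points]
    bxs = [(x * factor, y * factor, z * factor, dx * factor, dy * factor, dz * factor, yaw)
           for x, y, z, dx, dy, dz, yaw in boxes]
    return pts, bxs


def augment(points, boxes, rng, rot_range=(-0.78539816, 0.78539816), scale_range=(0.95, 1.05)):
    """Random flip (assumed p = 0.5), then rotation, then scaling."""
    if rng.random() < 0.5:
        points, boxes = flip_along_x(points, boxes)
    points, boxes = rotate_around_z(points, boxes, rng.uniform(*rot_range))
    return scale_scene(points, boxes, rng.uniform(*scale_range))
```

Applying the same transform to points and boxes keeps every box aligned with the points it contains, which is why all three augmentations act on the whole scene.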

Data Processing

The dataset processing is defined by the DATA_PROCESSOR parameter.

Parameter | Data Type | Default | Description
DATA_PROCESSOR | List of collections |  | The list of data processing steps, which should include mask_points_and_boxes_outside_range, shuffle_points, and transform_points_to_voxels, in that order

The parameters for mask_points_and_boxes_outside_range are described below.

Parameter | Data Type | Default | Description
NAME | string | mask_points_and_boxes_outside_range | The name; has to be mask_points_and_boxes_outside_range
REMOVE_OUTSIDE_BOXES | bool | True | Whether or not to remove outside boxes

The parameters for shuffle_points are described below.

Parameter | Data Type | Default | Description
NAME | string | shuffle_points | The name; has to be shuffle_points
SHUFFLE_ENABLED | dict | {'train': True, 'test': False} | Dict to enable/disable shuffling for the train/val datasets

The parameters for transform_points_to_voxels are described below.

Parameter | Data Type | Default | Description
NAME | string | transform_points_to_voxels | The name; has to be transform_points_to_voxels
VOXEL_SIZE | List of floats |  | Voxel size in the format [dx, dy, dz]
MAX_POINTS_PER_VOXEL | int | 32 | Maximum number of points per voxel
MAX_NUMBER_OF_VOXELS | dict |  | Dict that provides the maximum number of voxels in training and test/validation mode
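The sketch below shows how POINT_CLOUD_RANGE and VOXEL_SIZE from the example spec determine the voxel grid. With these values the grid is 432 x 496 x 1: because dz = 4 equals the full Z extent from -3 to 1, every voxel spans the whole height of the range, which is what makes it a pillar. The lower-inclusive, upper-exclusive range check and the helper names are assumptions of this sketch, not a description of TAO's internal implementation.

```python
import math

POINT_CLOUD_RANGE = [0, -39.68, -3, 69.12, 39.68, 1]  # [xmin, ymin, zmin, xmax, ymax, zmax]
VOXEL_SIZE = [0.16, 0.16, 4]                           # [dx, dy, dz]


def in_range(point, pc_range=POINT_CLOUD_RANGE):
    """True if (x, y, z, ...) lies inside the range (lower bound inclusive, upper exclusive)."""
    return all(pc_range[i] <= point[i] < pc_range[i + 3] for i in range(3))


def grid_size(pc_range=POINT_CLOUD_RANGE, voxel=VOXEL_SIZE):
    """Number of voxels along X, Y, Z; round() absorbs float error such as 431.999..."""
    return [round((pc_range[i + 3] - pc_range[i]) / voxel[i]) for i in range(3)]


def voxel_index(point, pc_range=POINT_CLOUD_RANGE, voxel=VOXEL_SIZE):
    """Integer (ix, iy, iz) voxel coordinates of an in-range point."""
    return tuple(math.floor((point[i] - pc_range[i]) / voxel[i]) for i in range(3))
```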

Model Architecture

The PointPillars model architecture is defined by the MODEL parameter, detailed in the table below.

Parameter | Data Type | Default | Description
NAME | string | PointPillar | The name; has to be PointPillar
VFE | Collection |  | Definition of the voxel feature extractor
MAP_TO_BEV | Collection |  | Definition of the scatter module
BACKBONE_2D | Collection |  | Definition of the 2D backbone
DENSE_HEAD | Collection |  | Definition of the dense head
POST_PROCESSING | Collection |  | Definition of the post-processing
SYNC_BN | bool | False | Enable sync-BN or not

Voxel Feature extractor

The voxel feature extractor is configured by the parameter VFE, described below.

Parameter | Data Type | Default | Description
NAME | string | PillarVFE | The name; has to be PillarVFE
WITH_DISTANCE | bool | False | With distance or not
USE_ABSLOTE_XYZ | bool | True | Use absolute XYZ coordinates or not
USE_NORM | bool | True | Use normalization or not
NUM_FILTERS | List of ints | [64] | Number of filters

Scatter

The scattering process is configured by the parameter MAP_TO_BEV, described below.

Parameter | Data Type | Default | Description
NAME | string | PointPillarScatter | The name; has to be PointPillarScatter
NUM_BEV_FEATURES | int | 64 | Number of features for the bird's-eye view

2D backbone

The 2D backbone is configured by the parameter BACKBONE_2D, described below.

Parameter | Data Type | Default | Description
NAME | string | BaseBEVBackbone | The name; has to be BaseBEVBackbone
LAYER_NUMS | List of ints | [3, 5, 5] | The number of layers in each block
LAYER_STRIDES | List of ints | [2, 2, 2] | The stride of each block
NUM_FILTERS | List of ints | [64, 128, 256] | The number of filters in each block
UPSAMPLE_STRIDES | List of ints | [1, 2, 4] | The upsampling stride of each block
NUM_UPSAMPLE_FILTERS | List of ints | [128, 128, 128] | The number of upsampling filters of each block
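As a worked example, the hypothetical helper below traces the strides through BACKBONE_2D. It assumes, as in OpenPCDet's BaseBEVBackbone, that each block downsamples by its LAYER_STRIDES entry, each upsampling branch then upsamples by its UPSAMPLE_STRIDES entry, and the branch outputs are concatenated along the channel axis. With the default values every branch ends at stride 2, which matches 'feature_map_stride': 2 in the anchor generator config, and the concatenated map has 128 x 3 = 384 channels.

```python
LAYER_STRIDES = [2, 2, 2]
UPSAMPLE_STRIDES = [1, 2, 4]
NUM_UPSAMPLE_FILTERS = [128, 128, 128]


def backbone_output(grid_xy, layer_strides=LAYER_STRIDES, upsample_strides=UPSAMPLE_STRIDES,
                    upsample_filters=NUM_UPSAMPLE_FILTERS):
    """Return (output_stride, (size_x, size_y), channels) of the concatenated BEV features."""
    out_strides = []
    cumulative = 1
    for layer_stride, up_stride in zip(layer_strides, upsample_strides):
        cumulative *= layer_stride                  # stride after this downsampling block
        out_strides.append(cumulative / up_stride)  # stride after its upsampling branch
    if len(set(out_strides)) != 1:
        raise ValueError(f"branches disagree on output stride: {out_strides}")
    stride = int(out_strides[0])
    return stride, (grid_xy[0] // stride, grid_xy[1] // stride), sum(upsample_filters)
```

For the 432 x 496 BEV grid produced by the example POINT_CLOUD_RANGE and VOXEL_SIZE, backbone_output((432, 496)) returns (2, (216, 248), 384).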

Dense Head

The dense head is configured by the parameter DENSE_HEAD, described below.

Parameter | Data Type | Default | Description
NAME | string | AnchorHeadSingle | The name; has to be AnchorHeadSingle
CLASS_AGNOSTIC | bool | False | Class agnostic or not
USE_DIRECTION_CLASSIFIER | bool | True | Use direction classifier or not
DIR_OFFSET | float | 0.78539 | Direction offset
DIR_LIMIT_OFFSET | float | 0.0 | Direction limit offset
NUM_DIR_BINS | int | 2 | The number of direction bins
ANCHOR_GENERATOR_CONFIG | List of dicts |  | The per-class anchor generator config
TARGET_ASSIGNER_CONFIG | Collection |  | Config for the target assigner
LOSS_CONFIG | Collection |  | Config for the loss function
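The direction classifier helps the head tell a box's front from its back. The sketch below assumes an OpenPCDet-style direction target: the heading is shifted by DIR_OFFSET, wrapped into [0, 2*pi), and split into NUM_DIR_BINS equal bins. It illustrates how DIR_OFFSET and NUM_DIR_BINS interact and is not TAO's verified implementation; DIR_LIMIT_OFFSET is not modeled, and the helper names are hypothetical.

```python
import math


def limit_period(val, offset, period):
    """Wrap val into [-offset * period, (1 - offset) * period)."""
    return val - math.floor(val / period + offset) * period


def direction_bin(yaw, dir_offset=0.78539, num_bins=2):
    """Direction-classifier target for a box heading (yaw in radians)."""
    offset_rot = limit_period(yaw - dir_offset, 0.0, 2 * math.pi)  # now in [0, 2*pi)
    bin_idx = int(offset_rot // (2 * math.pi / num_bins))
    return min(bin_idx, num_bins - 1)  # guard against float edge cases at 2*pi
```

With the defaults, yaw = pi / 2 falls in bin 0 and yaw = -pi / 2 in bin 1, so two boxes facing opposite directions receive different direction labels.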

ANCHOR_GENERATOR_CONFIG is a list of dicts, one per class. Each dict follows the format below.

{
  'class_name': 'Car',
  'anchor_sizes': [[3.9, 1.6, 1.56]],
  'anchor_rotations': [0, 1.57],
  'anchor_bottom_heights': [-1.78],
  'align_center': False,
  'feature_map_stride': 2,
  'matched_threshold': 0.6,
  'unmatched_threshold': 0.45
}
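Each class in ANCHOR_GENERATOR_CONFIG places one anchor per (size, rotation) pair at every feature-map cell, so the example config yields 3 x 1 x 2 = 6 anchors per cell. The sketch below counts anchors and applies a simplified version of the matched/unmatched threshold rule: an anchor is positive when its IoU with a ground-truth box reaches matched_threshold, negative below unmatched_threshold, and ignored in between. Real assigners typically also force the best-matching anchor of each ground-truth box to be positive, which is omitted here; the helper names are hypothetical.

```python
ANCHOR_CONFIGS = [
    {"class_name": "Car", "anchor_sizes": [[3.9, 1.6, 1.56]], "anchor_rotations": [0, 1.57],
     "matched_threshold": 0.6, "unmatched_threshold": 0.45},
    {"class_name": "Pedestrian", "anchor_sizes": [[0.8, 0.6, 1.73]], "anchor_rotations": [0, 1.57],
     "matched_threshold": 0.5, "unmatched_threshold": 0.35},
    {"class_name": "Cyclist", "anchor_sizes": [[1.76, 0.6, 1.73]], "anchor_rotations": [0, 1.57],
     "matched_threshold": 0.5, "unmatched_threshold": 0.35},
]


def anchors_per_location(configs=ANCHOR_CONFIGS):
    """One anchor per (size, rotation) pair per class at every feature-map cell."""
    return sum(len(c["anchor_sizes"]) * len(c["anchor_rotations"]) for c in configs)


def total_anchors(feature_map_xy, configs=ANCHOR_CONFIGS):
    size_x, size_y = feature_map_xy
    return size_x * size_y * anchors_per_location(configs)


def anchor_label(iou, cfg):
    """1 = positive, 0 = negative, -1 = ignored (simplified threshold rule)."""
    if iou >= cfg["matched_threshold"]:
        return 1
    if iou < cfg["unmatched_threshold"]:
        return 0
    return -1
```

Assuming the 216 x 248 feature map of a 432 x 496 BEV grid at feature_map_stride 2, total_anchors((216, 248)) gives 321,408 anchors. Note that an IoU of 0.55 is ignored for a Car anchor but positive for a Pedestrian anchor, because the thresholds are per class.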

The parameters of TARGET_ASSIGNER_CONFIG are described below.

Parameter | Data Type | Default | Description
NAME | string | AxisAlignedTargetAssigner | The name; has to be AxisAlignedTargetAssigner
POS_FRACTION | float | -1.0 | Positive fraction
SAMPLE_SIZE | int | 512 | Sample size
NORM_BY_NUM_EXAMPLES | bool | False | Normalize by number of examples or not
MATCH_HEIGHT | bool | False | Match height or not
BOX_CODER | string | ResidualCoder | The name of the box coder

The parameters for LOSS_CONFIG are described below.

Parameter | Data Type | Default | Description
LOSS_WEIGHTS | dict |  | The dict that provides the loss weighting factors

The LOSS_WEIGHTS dict should be in the format below.

{
  'cls_weight': 1.0,
  'loc_weight': 2.0,
  'dir_weight': 0.2,
  'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
}
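A minimal sketch of how the LOSS_WEIGHTS entries might combine the loss terms: code_weights scales the localization loss of each of the seven box-code elements (x, y, z, dx, dy, dz, yaw) before summing, and cls_weight, loc_weight, and dir_weight scale the classification, localization, and direction terms. This weighted-sum form follows OpenPCDet's anchor head and is an assumption here; the per-term loss values are taken as given inputs, and weighted_loss is a hypothetical name.

```python
def weighted_loss(cls_loss, loc_residual_losses, dir_loss, weights):
    """Combine per-term losses using a LOSS_WEIGHTS-style dict.

    loc_residual_losses holds one loss value per box-code element, in the order
    (x, y, z, dx, dy, dz, yaw); code_weights scales each element before summing.
    """
    code_weights = weights["code_weights"]
    if len(code_weights) != len(loc_residual_losses):
        raise ValueError("code_weights needs one entry per box-code element")
    loc_loss = sum(w * term for w, term in zip(code_weights, loc_residual_losses))
    return (weights["cls_weight"] * cls_loss
            + weights["loc_weight"] * loc_loss
            + weights["dir_weight"] * dir_loss)
```

For example, with cls_loss = 0.5, seven localization terms of 0.1 each, and dir_loss = 0.3, the example weights give 0.5 + 2.0 x 0.7 + 0.2 x 0.3 = 1.96.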

Post Processing

The post-processing is defined in the parameter POST_PROCESSING, described below.

Parameter | Data Type | Default | Description
RECALL_THRESH_LIST | List of floats |  | The list of IoU thresholds at which recall is computed
SCORE_THRESH | float | 0.1 | The score threshold
OUTPUT_RAW_SCORE | bool | False | Output raw score or not
EVAL_METRIC | string | kitti | The evaluation metric; only kitti is supported
NMS_CONFIG | Collection |  | The NMS config

Non-Maximum Suppression (NMS) is configured by the NMS_CONFIG parameter, described below.

Parameter | Data Type | Default | Description
MULTI_CLASSES_NMS | bool | False | Multi-class NMS or not
NMS_TYPE | string | nms_gpu | The NMS type
NMS_THRESH | float | 0.01 | The NMS IoU threshold
NMS_PRE_MAXSIZE | int |  | Pre-NMS maximum number of boxes
NMS_POST_MAXSIZE | int |  | Post-NMS maximum number of boxes
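To illustrate how SCORE_THRESH, NMS_THRESH, NMS_PRE_MAXSIZE, and NMS_POST_MAXSIZE interact, here is a greedy NMS sketch. It is deliberately simplified: IoU is computed on axis-aligned BEV footprints (x, y, dx, dy) with yaw ignored, whereas OpenPCDet's nms_gpu, which this spec appears to follow, accounts for box rotation. Applying the score threshold before NMS is also an assumption. In this sketch, NMS_THRESH = 0.01 means almost any overlap suppresses the lower-scoring box.

```python
def bev_iou(a, b):
    """IoU of two axis-aligned BEV boxes given as (x, y, dx, dy); yaw is ignored."""
    ax0, ax1 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay0, ay1 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx0, bx1 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by0, by1 = b[1] - b[3] / 2, b[1] + b[3] / 2
    inter = max(0.0, min(ax1, bx1) - max(ax0, bx0)) * max(0.0, min(ay1, by1) - max(ay0, by0))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_thresh=0.1, nms_thresh=0.01, pre_maxsize=4096, post_maxsize=500):
    """Greedy NMS returning indices of kept boxes, highest score first."""
    # Drop low scores, sort by score, and keep at most pre_maxsize candidates.
    order = [i for i in sorted(range(len(scores)), key=lambda i: -scores[i])
             if scores[i] >= score_thresh][:pre_maxsize]
    keep = []
    for i in order:
        if all(bev_iou(boxes[i], boxes[k]) <= nms_thresh for k in keep):
            keep.append(i)
            if len(keep) == post_maxsize:
                break
    return keep
```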

Training Process

The OPTIMIZATION parameter defines the hyper-parameters of the training process.

Parameter | Data Type | Default | Description | Supported Values
BATCH_SIZE_PER_GPU | int | 4 | The batch size per GPU | >=1
NUM_EPOCHS | int | 80 | The number of epochs to train the model | >=1
OPTIMIZER | string | adam_onecycle | The optimizer name (type) | adam_onecycle
LR | float | 0.003 | The initial learning rate | >0.0
WEIGHT_DECAY | float | 0.01 | Weight decay | >0.0
MOMENTUM | float | 0.9 | Momentum for SGD optimizer | >0, <1
MOMS | List of floats | [0.95, 0.85] | Momentum values for the One Cycle learning rate scheduler | [0.95, 0.85]
PCT_START | float | 0.4 | The percentage of the cycle spent increasing the learning rate | 0.4
DIV_FACTOR | float | 10.0 | Division factor | 10.0
DECAY_STEP_LIST | list of ints | [35, 45] | The list of epoch numbers at which to decay the learning rate | list whose elements < NUM_EPOCHS
LR_DECAY | float | 0.1 | The decay of the learning rate | >0, <1
LR_CLIP | float | 0.0000001 | Minimum value of the learning rate | >0, <1
LR_WARMUP | bool | False | Enable learning rate warm-up or not | True/False
WARMUP_EPOCH | int | 1 | Number of epochs to warm up the learning rate | >=1
GRAD_NORM_CLIP | float | 10.0 | The limit to apply gradient norm clipping | >0
RESUME_MODEL_PATH | string |  | The path of the model to resume training from | Unix path
PRETRAINED_MODEL_PATH | string |  | The path to the pretrained model | Unix path
PRUNED_MODEL_PATH | string |  | The path to the pruned model for retraining | Unix path
TCP_PORT | int | 18888 | TCP port for multi-GPU training | 18888
RANDOM_SEED | int |  | Random seed | integer
CKPT_INTERVAL | int | 1 | Interval of epochs to save checkpoints | >=1
MAX_CKPT_SAVE_NUM | int | 1 | The maximum number of checkpoints to save | >=1
MERGE_ALL_ITERS_TO_ONE_EPOCH | bool | False | Merge all training steps into one epoch or not | False
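MOMS, PCT_START, and DIV_FACTOR shape the one-cycle schedule used by adam_onecycle. The sketch below assumes the fastai-style cosine one-cycle policy that OpenPCDet implements: the learning rate rises from LR / DIV_FACTOR to LR over the first PCT_START fraction of the steps while momentum falls from MOMS[0] to MOMS[1], and then the learning rate anneals toward a much smaller final value while momentum returns to MOMS[0]. The final value of LR / DIV_FACTOR / 1e4 and the helper names are assumptions, so treat this as an illustration rather than TAO's exact schedule.

```python
import math


def _cos_anneal(start, end, pct):
    """Cosine interpolation from start (pct = 0) to end (pct = 1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)


def one_cycle(step, total_steps, lr_max=0.003, div_factor=10.0, pct_start=0.4,
              moms=(0.95, 0.85)):
    """Return (learning_rate, momentum) at a given training step."""
    lr_low = lr_max / div_factor
    warm = pct_start * total_steps
    if step < warm:
        # Phase 1: learning rate up, momentum down.
        pct = step / warm
        return _cos_anneal(lr_low, lr_max, pct), _cos_anneal(moms[0], moms[1], pct)
    # Phase 2: learning rate down to an assumed final value, momentum back up.
    pct = (step - warm) / (total_steps - warm)
    return _cos_anneal(lr_max, lr_low / 1e4, pct), _cos_anneal(moms[1], moms[0], pct)
```

Here total_steps would be NUM_EPOCHS multiplied by the number of iterations per epoch; with the example values the learning rate peaks at 0.003 after 40% of training.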

Evaluation

The EVALUATION parameter defines the hyper-parameters of the evaluation process. The evaluation metric is mAP (3D and BEV).

Parameter | Data Type | Default/Suggested Value | Description | Supported Values
BATCH_SIZE | int | 1 | The batch size of evaluation | >=1
CKPT | string |  | The path to the model to run evaluation on | Unix path

Inference

The INFERENCE parameter defines the hyper-parameters of the inference process. Inference draws the detected bounding boxes and visualizes them on images.

Parameter | Data Type | Default/Suggested Value | Description | Supported Values
BATCH_SIZE | int | 1 | The batch size of inference | >=1
CKPT | string |  | The path to the model to run inference on | Unix path
MAX_POINTS_NUM | int |  | Maximum number of points in a point cloud file | >=1
VIS_CONF_THRESH | float | 0.1 | Visualization confidence threshold | >0, <1

Training the Model

Use the following command to run PointPillars training:

tao pointpillars train -e <experiment_spec_file> \
                       -r <results_dir> \
                       -k <key> \
                       [--gpus <num_gpus>] \
                       [-h, --help]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

  • -k, --key: The user-specific encoding key to save or load a .tlt model.

Optional Arguments

  • --gpus: The number of GPUs to be used in the training in a multi-GPU scenario (default: 1).

  • -h, --help: Show this help message and exit.

Here’s an example of using the PointPillars training command:

tao pointpillars train -e $DEFAULT_SPEC -r $RESULTS_DIR -k $YOUR_KEY

Evaluating the Model

The evaluation metric of PointPillars is mAP (BEV and 3D).

Use the following command to run PointPillars evaluation:

tao pointpillars evaluate -e <experiment_spec_file> \
                          -k <key> \
                          -r <results_dir> \
                          [--trt_engine <trt_engine_file>] \
                          [-h, --help]

Required Arguments

  • -e, --experiment_spec_file: Experiment spec file to set up the evaluation experiment. This should be the same as a training spec file.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

  • -k, --key: The user-specific encoding key to save or load a .tlt model.

Optional Arguments

  • --trt_engine: Path to the TensorRT engine file to load for evaluation.

  • -h, --help: Show this help message and exit.

Here’s an example of using the PointPillars evaluation command:

tao pointpillars evaluate -e $DEFAULT_SPEC -r $RESULTS_DIR -k $YOUR_KEY

Note

The evaluation metric in TAO PointPillars differs from the official KITTI metric for point cloud detection. The KITTI metric divides objects into easy, moderate, and hard categories and filters out objects smaller than a size threshold, which is only meaningful for the KITTI dataset. The TAO PointPillars metric does not classify objects into easy/moderate/hard categories and does not exclude any objects from the calculation, which makes it applicable to general datasets. Whatever the details of the computation, the final results are average precision (AP) and mean average precision (mAP). Because of these differences, the TAO PointPillars metric is not directly comparable with the official KITTI metric on the KITTI dataset, although the two should be roughly similar.

Running Inference on the PointPillars Model

Use the following command to run inference on PointPillars with .tlt model or TensorRT engine:

tao pointpillars inference -e <experiment_spec_file> \
                           -k <key> \
                           -r <results_dir> \
                           [--trt_engine <trt_engine_file>] \
                           [-h, --help]

Required Arguments

  • -e, --experiment_spec_file: Experiment spec file to set up the inference experiment. This should be the same as a training spec file.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

  • -k, --key: The user-specific encoding key to save or load a .tlt model.

Optional Arguments

  • --trt_engine: Path to the TensorRT engine file to load for inference.

  • -h, --help: Show this help message and exit.

Here’s an example of using the PointPillars inference command:

tao pointpillars inference -e $DEFAULT_SPEC -r $RESULTS_DIR -k $YOUR_KEY

Pruning and Retraining a PointPillars Model

TAO PointPillars models support model pruning. Pruning reduces the number of model parameters and can therefore improve inference frames per second (FPS) on NVIDIA GPUs while maintaining (almost) the same accuracy (mAP).

Pruning is applied to an already trained PointPillars model and outputs a new model with fewer parameters. After pruning, the model must be fine-tuned on the same dataset to recover the accuracy (mAP). Fine-tuning is simply running training again, starting from the pruned model.

Use the following command to run pruning on the PointPillars .tlt model.

tao pointpillars prune -e <experiment_spec_file> \
                       -r <results_dir> \
                       -k <key> \
                       -m <path_to_tlt_model_to_prune> \
                       -pth <pruning_threshold>

Required Arguments

  • -e, --experiment_spec_file: Experiment spec file to set up the pruning experiment. This should be the same as a training spec file.

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

  • -k, --key: The user-specific encoding key to save or load a .tlt model.

  • -m, --model: The path to the .tlt model to prune.

Optional Arguments

  • -pth, --pruning_thresh: The pruning threshold, a float between 0 and 1. Defaults to 0.1.

After pruning, the pruned model can be retrained (fine-tuned). To start retraining, set OPTIMIZATION.PRUNED_MODEL_PATH in the spec file to the path of the pruned model, and then run the training command described above.

Exporting the Model

Use the following command to export PointPillars to .etlt format for deployment:

tao pointpillars export -m <model> \
                        -k <key> \
                        -e <experiment_spec> \
                        [-o <output_file>] \
                        [--data_type {fp32,fp16}] \
                        [--workspace_size <workspace_size>] \
                        [--batch_size <batch_size>] \
                        [--save_engine <engine_file>] \
                        [-h, --help]

Required Arguments

  • -m, --model: The .tlt model to be exported.

  • -k, --key: The encoding key of the .tlt model.

  • -e, --experiment_spec: Experiment spec file to set up export. Can be the same as the training spec.

Optional Arguments

  • -o, --output_model: The path to save the exported model to. The default is ./<input_file>.etlt.

  • -h, --help: Show this help message and exit.

You can use the following optional arguments to generate and save a TensorRT engine to verify the export:

  • -b, --batch_size: The batch size of TensorRT engine. The default value is 1.

  • -w, --workspace_size: The workspace size of the TensorRT engine in MB. The default value is 1024, i.e., 1GB.

  • --save_engine: The path to the serialized TensorRT engine file. This file is useful for quickly testing your model accuracy with TensorRT on the host. Because a TensorRT engine is hardware-specific and cannot be generalized across GPUs, you cannot use this engine file for deployment unless the deployment GPU is identical to the training GPU.

  • -t, --data_type: The desired engine data type. The options are fp32 or fp16. The default value is fp32.

Here’s an example for using the PointPillars export command:

tao pointpillars export -m $TRAINED_TAO_MODEL -e $DEFAULT_SPEC -k $YOUR_KEY

Deploying the Model

The PointPillars models that you trained can be deployed on edge devices, such as a Jetson Xavier, Jetson Nano, or Tesla, or in the cloud with NVIDIA GPUs.

The DeepStream SDK does not currently support deployment of PointPillars models; they can only be deployed with a standalone TensorRT application. A TensorRT sample has been developed as a demo of how to deploy PointPillars models trained in TAO Toolkit.

Note

TensorRT cannot parse a PointPillars .etlt model directly. Use tao-converter to convert the .etlt model to an optimized TensorRT engine, and then integrate the engine into the TensorRT sample.

Using tao-converter

The tao-converter is a tool provided with TAO Toolkit to facilitate the deployment of TAO Toolkit trained models on TensorRT and/or DeepStream. For deployment platforms with an x86-based CPU and discrete GPUs, the tao-converter is distributed within the TAO docker, so it is recommended to use the docker to generate the engine. However, this requires using the same minor version of TensorRT as the one distributed with the docker, which is TensorRT 8.2. To use the engine with a different minor version of TensorRT, copy the converter from /opt/nvidia/tools/tao-converter to the target machine and follow the instructions for x86 to run it and generate a TensorRT engine.

For the aarch64 platform, the tao-converter is available to download in the dev zone.

Here is a sample command to generate PointPillars engine through tao-converter:

tao-converter <etlt_model> -k <key_to_etlt_model> -e <path_to_generated_trt_engine> -p points,<points_shapes> -p num_points,<num_points_shapes> -t fp16
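The -p shape arguments are easy to get wrong, so the hypothetical helper below assembles the command from spec values. It assumes that each -p value has the form <input_name>,<min_shape>,<opt_shape>,<max_shape>, that the points input is laid out as batch x MAX_POINTS_NUM x 4, and that num_points holds one point count per batch item; check these assumptions against the input bindings of your exported model. The file names and key in the example are placeholders.

```python
import shlex


def converter_command(etlt_model, key, engine_path, max_points=25000, batch_size=1,
                      num_features=4, data_type="fp16"):
    """Build a tao-converter command line with fixed (min = opt = max) input shapes."""
    points_shape = f"{batch_size}x{max_points}x{num_features}"  # assumed layout
    num_points_shape = f"{batch_size}"                          # assumed layout
    args = [
        "tao-converter", etlt_model,
        "-k", key,
        "-e", engine_path,
        "-p", f"points,{points_shape},{points_shape},{points_shape}",
        "-p", f"num_points,{num_points_shape},{num_points_shape},{num_points_shape}",
        "-t", data_type,
    ]
    return shlex.join(args)  # shell-quotes any argument that needs it


print(converter_command("pointpillars.etlt", "my_key", "pointpillars.engine"))
```

With MAX_POINTS_NUM = 25000 and batch size 1, the command contains -p points,1x25000x4,1x25000x4,1x25000x4 and -p num_points,1,1,1.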