NVIDIA TAO Toolkit v4.0.1

# PointPillars

PointPillars is a model for 3D object detection in point cloud data. Unlike images, point cloud data is by nature a collection of sparse points in 3D space. Each point cloud sample is called a scene and is stored here as a file with the .bin extension. A scene generally contains a variable number of points in 3D Euclidean space, so the data in a single scene has shape (N, K), where N is the number of points in the scene (generally a variable positive integer) and K is the number of features per point, which should be 4. The features of each point can be represented as (x, y, z, r), where x, y, z, and r are the X coordinate, Y coordinate, Z coordinate, and reflectance (intensity), respectively. All of these values are floating-point numbers. The reflectance r is a real number in the interval [0.0, 1.0] that represents the fraction of a laser beam's intensity that the LIDAR perceives as reflected back from a point in 3D space.
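
As a minimal sketch of the (N, 4) layout described above, the following reads one scene file with NumPy. It assumes the values are stored as a flat sequence of 32-bit floats (the usual KITTI convention, not stated in this section); the function name `load_point_cloud` is hypothetical.

```python
import numpy as np

NUM_POINT_FEATURES = 4  # x, y, z, r


def load_point_cloud(path):
    """Read one scene file into an (N, 4) array of (x, y, z, r) rows."""
    # Assumption: the file is a flat sequence of 32-bit floats.
    flat = np.fromfile(path, dtype=np.float32)
    if flat.size % NUM_POINT_FEATURES != 0:
        raise ValueError(
            f"{path}: {flat.size} values is not a multiple of {NUM_POINT_FEATURES}"
        )
    points = flat.reshape(-1, NUM_POINT_FEATURES)
    # Reflectance is expected to lie in [0.0, 1.0].
    reflectance = points[:, 3]
    if points.shape[0] and (reflectance.min() < 0.0 or reflectance.max() > 1.0):
        raise ValueError(f"{path}: reflectance outside [0.0, 1.0]")
    return points
```

N is simply `points.shape[0]` and varies from scene to scene.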

An object in 3D Euclidean space can be described by a 3D bounding box. Formally, a 3D bounding box can be represented as (x, y, z, dx, dy, dz, yaw). The seven numbers in the tuple represent the X, Y, and Z coordinates of the object center, the length (in the X direction), the width (in the Y direction), the height (in the Z direction), and the orientation in 3D Euclidean space, respectively.

To work with the coordinates of points and objects, a coordinate system is required. In TAO Toolkit PointPillars, the coordinate system is defined as follows:

• The origin of the coordinate system is the center of the LIDAR.

• The X axis points to the front.

• The Y axis points to the left.

• The Z axis points up.

• The yaw is the rotation in the horizontal plane (X-Y plane), in the counter-clockwise direction. The X axis corresponds to yaw = 0, the Y axis corresponds to yaw = pi / 2, and so on.

An illustration of the coordinate system is shown below.


                         up z    x front (yaw=0)
                            ^   ^
                            |  /
                            | /
(yaw=0.5*pi) left y <------ 0
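
To make the box and yaw conventions concrete, here is a small sketch that computes the eight corners of a box given as (x, y, z, dx, dy, dz, yaw). It assumes that dx, dy, and dz are measured in the box's own frame before the yaw rotation is applied, so dx lies along the X axis only when yaw = 0; the function name `box_corners` is hypothetical.

```python
import math


def box_corners(box):
    """Return the 8 corners (x, y, z) of a box (x, y, z, dx, dy, dz, yaw)."""
    x, y, z, dx, dy, dz, yaw = box
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for sz in (-1, 1):  # bottom face first, then top face
        for sx, sy in ((1, 1), (-1, 1), (-1, -1), (1, -1)):
            # Offset from the center in the box's own (unrotated) frame.
            ox, oy = sx * dx / 2.0, sy * dy / 2.0
            # Counter-clockwise rotation by yaw in the X-Y plane,
            # followed by translation to the box center.
            corners.append((x + c * ox - s * oy,
                            y + s * ox + c * oy,
                            z + sz * dz / 2.0))
    return corners
```

With yaw = 0 the length of the box runs along the X axis; with yaw = pi / 2 it runs along the Y axis, matching the definition above.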


## Preparing the Dataset

The dataset for PointPillars contains point cloud data and the corresponding annotations of 3D objects. The point cloud data is a directory of point cloud files (with the .bin extension), and the annotations are a directory of text files in KITTI format.

The directory structure should be organized as shown below. The directory for point cloud files must be named lidar, and the directory for annotations must be named label. The file names in the two directories can be arbitrary, as long as each .bin file has exactly one corresponding .txt file and vice versa.


/lidar
    0.bin
    1.bin
    ...
/label
    0.txt
    1.txt
    ...


Finally, a train/val split has to be maintained for PointPillars as usual, and both the training set and the validation set must follow the structure described above. The overall structure should look like the example below. The exact names train and val are not required, but they are preferred by convention.


/train
    /lidar
        0.bin
        1.bin
        ...
    /label
        0.txt
        1.txt
        ...
/val
    /lidar
        0.bin
        1.bin
        ...
    /label
        0.txt
        1.txt
        ...
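
The layout above can be sanity-checked before training. The sketch below reports point cloud files without a label and labels without a point cloud file for one split directory. It assumes files are paired by base name (as in 0.bin and 0.txt); the function name `check_split` is hypothetical and is not part of TAO Toolkit.

```python
from pathlib import Path


def check_split(split_dir):
    """Return (bins_without_label, labels_without_bin) for one split."""
    split_dir = Path(split_dir)
    lidar_dir = split_dir / "lidar"
    label_dir = split_dir / "label"
    for directory in (lidar_dir, label_dir):
        if not directory.is_dir():
            raise FileNotFoundError(f"missing directory: {directory}")
    # Assumption: a scene and its label share the same base name.
    scenes = {p.stem for p in lidar_dir.glob("*.bin")}
    labels = {p.stem for p in label_dir.glob("*.txt")}
    return sorted(scenes - labels), sorted(labels - scenes)
```

Calling it once for the training directory and once for the validation directory should return two empty lists each time when the dataset is consistent.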


Each .bin file should comply with the format described above. Each .txt label file should comply with the KITTI format, with one exception: although the structure is the same as KITTI, the last field for each object is interpreted differently. In KITTI the last field is Rotation_y (rotation around the Y axis in the camera coordinate system), while in PointPillars it is Rotation_z (rotation around the Z axis in the LIDAR coordinate system).

In the example below, the values -1.59, -2.35, and -0.03 should be interpreted differently from standard KITTI.


car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59
cyclist 0.00 0 -2.46 665.45 160.00 717.93 217.99 1.72 0.47 1.65 2.45 1.35 22.10 -2.35
pedestrian 0.00 2 0.21 423.17 173.67 433.17 224.03 1.60 0.38 0.30 -5.87 1.63 23.11 -0.03


Note

The interpretation of PointPillars labels differs slightly from the standard KITTI format. In PointPillars, the yaw is the rotation around the Z axis in the LIDAR coordinate system, as defined above, while in the standard KITTI interpretation the yaw is the rotation around the Y axis in the camera coordinate system. As a result, a PointPillars dataset does not depend on camera information or camera calibration.
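
As a sketch of how such a label line might be read, the parser below splits the 15 columns using the standard KITTI column order and names the last one `rotation_z`, following the interpretation above. The names `LabelObject` and `parse_label_line` are hypothetical. The meaning of the other columns is assumed to follow standard KITTI, since this section only redefines the last field.

```python
from typing import NamedTuple, Tuple


class LabelObject(NamedTuple):
    category: str
    truncated: float
    occluded: int
    alpha: float
    bbox: Tuple[float, ...]        # 4 values, 2D box columns
    dimensions: Tuple[float, ...]  # 3 values, standard KITTI order
    location: Tuple[float, ...]    # 3 values, standard KITTI order
    rotation_z: float              # yaw around the LIDAR Z axis


def parse_label_line(line: str) -> LabelObject:
    """Parse one object line from a label .txt file."""
    fields = line.split()
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    values = [float(v) for v in fields[1:]]
    return LabelObject(
        category=fields[0],
        truncated=values[0],
        occluded=int(values[1]),
        alpha=values[2],
        bbox=tuple(values[3:7]),
        dimensions=tuple(values[7:10]),
        location=tuple(values[10:13]),
        rotation_z=values[13],
    )
```

For the first example line, `parse_label_line(...).rotation_z` is -1.59, read as a counter-clockwise rotation in the X-Y plane of the LIDAR frame.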

Once the dataset directory structure above is ready, copy the base names of the split directories into the DATA_CONFIG.DATA_SPLIT dict in the spec file. For example:


{
'train': train,
'test': val
}


Also, set the names of the pickle info files in the DATA_CONFIG.INFO_PATH parameter. For example:


{
'train': ['infos_train.pkl'],
'test': ['infos_val.pkl'],
}


Once these are set, generate the dataset statistics with the dataset_convert command, which produces the pickle files named above. The pickle files are used for data augmentation during training.

### Converting the Dataset

The pickle info files need to be generated from the original point cloud files and the KITTI text label files. This is done from the command line with the dataset_convert command.




## Evaluating the Model

The evaluation metric for PointPillars is mAP (BEV and 3D).

Use the following command to run PointPillars evaluation:


tao pointpillars evaluate -e <experiment_spec_file> \
                          -k <key> \
                          -r <results_dir> \
                          [--trt_engine <trt_engine_file>] \
                          [-h, --help]


### Required Arguments

• -e, --experiment_spec_file: Experiment spec file to set up the evaluation experiment. This should be the same as a training spec file.

• -r, --results_dir: The path to a folder where the experiment outputs should be written.

• -k, --key: The user-specific encoding key to save or load a .tlt model.

### Optional Arguments

• --trt_engine: Path to the TensorRT engine file to load for evaluation.

• -h, --help: Show this help message and exit.

Here's an example of using the PointPillars evaluation command:
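
As a minimal sketch, the call can be assembled from Python using only the flags listed above. The helper name `build_evaluate_command`, the file paths, and the key shown here are hypothetical placeholders; the snippet only builds and prints the command line and does not run it.

```python
import shlex


def build_evaluate_command(spec_file, key, results_dir, trt_engine=None):
    """Build the argument list for `tao pointpillars evaluate`."""
    argv = [
        "tao", "pointpillars", "evaluate",
        "-e", spec_file,
        "-k", key,
        "-r", results_dir,
    ]
    # --trt_engine is optional; add it only when an engine path is given.
    if trt_engine is not None:
        argv += ["--trt_engine", trt_engine]
    return argv


# Hypothetical paths and key, for illustration only.
command = build_evaluate_command(
    "/path/to/spec.yaml", "my_key", "/path/to/results"
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` on a machine where the TAO launcher is installed.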




## Pruning and Retraining a PointPillars Model

TAO PointPillars models support model pruning. Pruning reduces the number of model parameters and can therefore improve inference throughput (frames per second, FPS) on NVIDIA GPUs while maintaining almost the same accuracy (mAP).

Pruning is applied to an already trained PointPillars model and outputs a new model with fewer parameters. Once you have the pruned model, you need to finetune it on the same dataset to recover the accuracy (mAP). Finetuning is simply running training again with the pruned model as the pretrained model.

Use the following command to run pruning on the PointPillars .tlt model.


tao pointpillars prune -e <experiment_spec_file> \
-r <results_dir> \
-k <key> \
-m <path_to_tlt_model_to_prune> \
-pth <pruning_threshold>


### Required Arguments

• -e, --experiment_spec_file: Experiment spec file to set up the pruning experiment. This should be the same as a training spec file.

• -r, --results_dir: The path to a folder where the experiment outputs should be written.

• -k, --key: The user-specific encoding key to save or load a .tlt model.

• -m, --model: The path to the .tlt model to prune.

### Optional Arguments

• -pth, --pruning_thresh: The pruning threshold, a float between 0 and 1. The default value is 0.1.
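
To give an intuition for how a threshold between 0 and 1 can select what to remove, the sketch below shows a generic magnitude-based criterion: filters whose L2 norm, normalized by the largest norm in the layer, falls at or below the threshold are dropped. This is an illustrative assumption only. The criterion actually used by TAO PointPillars is not described in this section, and the function name `keep_mask` is hypothetical.

```python
import numpy as np


def keep_mask(filter_weights, pruning_thresh=0.1):
    """Return a boolean mask of filters to keep.

    filter_weights has shape (num_filters, ...); one row per filter.
    """
    flat = filter_weights.reshape(filter_weights.shape[0], -1)
    norms = np.linalg.norm(flat, axis=1)
    max_norm = norms.max()
    if max_norm == 0.0:
        # All filters are zero; nothing is worth keeping.
        return np.zeros(norms.shape, dtype=bool)
    # Normalized norms lie in [0, 1], the same range as the threshold.
    return (norms / max_norm) > pruning_thresh
```

Under this illustrative criterion, a higher threshold removes more filters, which is why accuracy usually has to be recovered by retraining.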

After pruning, the pruned model can be used for retraining (finetuning). To start retraining, provide the path to the pruned model in the spec file as the OPTIMIZATION.PRUNED_MODEL_PATH parameter, and then run the training command again.

## Exporting the Model

Use the following command to export PointPillars to .etlt format for deployment:


tao pointpillars export -m <model> \
                        -k <key> \
                        -e <experiment_spec> \
                        [-o <output_file>] \
                        [--data_type {fp32,fp16}] \
                        [--workspace_size <workspace_size>] \
                        [--batch_size <batch_size>] \
                        [--save_engine <engine_file>] \
                        [-h, --help]


### Required Arguments

• -m, --model: The .tlt model to be exported.

• -k, --key: The encoding key of the .tlt model.

• -e, --experiment_spec: Experiment spec file to set up export. Can be the same as the training spec.

### Optional Arguments

• -o, --output_model: The path to save the exported model to. The default is ./<input_file>.etlt.

• -h, --help: Show this help message and exit.

You can use the following optional arguments to save the TRT engine that is generated to verify export:

• -b, --batch_size: The batch size of TensorRT engine. The default value is 1.

• -w, --workspace_size: The workspace size of the TensorRT engine in MB. The default value is 1024, i.e., 1GB.

• --save_engine: The path to the serialized TensorRT engine file. This is useful for quickly testing your model accuracy using TensorRT on the host. Note that the engine file is hardware specific and cannot be generalized across GPUs, so you cannot use it for deployment unless the deployment GPU is identical to the training GPU.

• -t, --data_type: The desired engine data type. The options are fp32 or fp16. The default value is fp32.

Here's an example of using the PointPillars export command:


tao pointpillars export -m $TRAINED_TAO_MODEL -e $DEFAULT_SPEC -k $YOUR_KEY


## Deploying the Model

The PointPillars models that you trained can be deployed on edge devices, such as a Jetson Xavier, Jetson Nano, or Tesla, or in the cloud with NVIDIA GPUs.

The DeepStream SDK does not currently support deployment of PointPillars models. Instead, PointPillars models can only be deployed via a standalone TensorRT application. A TensorRT sample has been developed as a demo to show how to deploy PointPillars models trained in TAO Toolkit.

Note

A PointPillars .etlt model cannot be parsed by TensorRT directly. Use tao-converter to convert the .etlt model to an optimized TensorRT engine, and then integrate the engine into the TensorRT sample.

### Using tao-converter

The tao-converter is a tool provided with the TAO Toolkit to facilitate the deployment of TAO Toolkit trained models on TensorRT and/or DeepStream. For deployment platforms with an x86-based CPU and discrete GPUs, the tao-converter is distributed within the TAO docker, so it is suggested that you use the docker to generate the engine. However, this requires that you use the same minor version of TensorRT as the one distributed with the docker. The TAO docker includes TensorRT version 8.2. To use the engine with a different minor version of TensorRT, copy the converter from /opt/nvidia/tools/tao-converter to the target machine and follow the instructions for x86 to run it and generate a TensorRT engine.

For the aarch64 platform, the tao-converter is available to download in the dev zone.

Here is a sample command to generate a PointPillars engine with tao-converter:


tao-converter <etlt_model> -k <key_to_etlt_model> -e <path_to_generated_trt_engine> -p points,<points_shapes> -p num_points,<num_points_shapes> -t fp16