PointPillars
PointPillars is a model for 3D object detection in point cloud data. Unlike images, point cloud data is by nature a collection of sparse points in 3D space. Each point cloud sample (example) is called a scene (stored here as a file with the .bin extension). Each scene generally contains a variable number of points in 3D Euclidean space. The shape of the data in a single scene is hence (N, K), where N is the number of points in the scene (generally a variable positive integer) and K is the number of features per point, which should be 4. The features of each point can therefore be represented as (x, y, z, r), where x, y, z, and r are the X coordinate, Y coordinate, Z coordinate, and reflectance (intensity), respectively. All of these values are floating-point numbers, and the reflectance r is a real number in the interval [0.0, 1.0] that represents the fraction of a laser beam's intensity reflected back to the LIDAR from some point in 3D space.
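For example, a single scene can be loaded into an (N, 4) NumPy array as sketched below, assuming the points are stored as contiguous float32 values in (x, y, z, r) order (the usual KITTI-style layout); the file name is illustrative.

import numpy as np

def load_scene(bin_path):
    # Assumes contiguous float32 values in (x, y, z, r) order, as described above.
    return np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)

points = load_scene("lidar/0.bin")
print(points.shape)     # (N, 4)
x, y, z, r = points[0]  # coordinates and reflectance of the first point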
An object in 3D Euclidean space can be described by a 3D bounding box. Formally, a 3D bounding box is represented by (x, y, z, dx, dy, dz, yaw). The 7 numbers in the tuple represent the X coordinate of the object center, the Y coordinate of the object center, the Z coordinate of the object center, the length (in the X direction), the width (in the Y direction), the height (in the Z direction), and the orientation in 3D Euclidean space, respectively.
To deal with the coordinates of points and objects, a coordinate system is required. In TAO PointPillars, the coordinate system is defined as follows:
The origin of the coordinate system is the center of the LIDAR sensor.
The X axis points to the front.
The Y axis points to the left.
The Z axis points up.
Yaw is the rotation in the horizontal (X-Y) plane, measured counter-clockwise, so the X axis corresponds to yaw = 0, the Y axis corresponds to yaw = pi / 2, and so on.
An illustration of the coordinate system is shown below.
up z    x front (yaw=0)
   ^   ^
   |  /
   | /
(yaw=0.5*pi) left y <------ 0
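As a quick check of this convention, the heading direction of an object can be derived from its yaw; the snippet below is an illustrative sketch and is not part of TAO.

import numpy as np

def heading_direction(yaw):
    # Unit vector of the heading in the X-Y plane:
    # yaw = 0 points along +X (front), yaw = pi / 2 points along +Y (left).
    return np.array([np.cos(yaw), np.sin(yaw)])

print(heading_direction(0.0))        # [1. 0.] -> facing front
print(heading_direction(np.pi / 2))  # approximately [0. 1.] -> facing left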
The dataset for PointPillars contains point cloud data and the corresponding annotations of 3D objects. The point cloud data is a directory of point cloud files (with the .bin extension) and the annotations are a directory of text files in KITTI format.
The directory structure should be organized as below, where the directory name for the point cloud files has to be lidar and the directory name for the annotations has to be label. The names of the files in the two directories can be arbitrary, as long as each .bin file has a unique corresponding .txt file, and vice versa.
/lidar
0.bin
1.bin
...
/label
0.txt
1.txt
...
Finally, a train/val split has to be maintained for PointPillars as usual, so both the training dataset and the validation dataset must follow the structure described above. The overall structure should therefore look like the layout below. The exact names train and val are not required but are preferred by convention.
/train
/lidar
0.bin
1.bin
...
/label
0.txt
1.txt
...
/val
/lidar
0.bin
1.bin
...
/label
0.txt
1.txt
...
Each .bin file should comply with the format described above. Each .txt label file should comply with the KITTI format, with one exception: although the structure is the same as KITTI, the last field of each object has a different interpretation. In KITTI the last field is Rotation_y (rotation around the Y axis in the camera coordinate system), while in PointPillars it is Rotation_z (rotation around the Z axis in the LIDAR coordinate system).
Below is an example; the values -1.59, -2.35, and -0.03 should be interpreted differently from standard KITTI.
car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59
cyclist 0.00 0 -2.46 665.45 160.00 717.93 217.99 1.72 0.47 1.65 2.45 1.35 22.10 -2.35
pedestrian 0.00 2 0.21 423.17 173.67 433.17 224.03 1.60 0.38 0.30 -5.87 1.63 23.11 -0.03
The interpretation of PointPillars labels thus differs slightly from the standard KITTI format: in PointPillars the yaw is the rotation around the Z axis in the LIDAR coordinate system, as defined above, while in standard KITTI the yaw is the rotation around the Y axis in the camera coordinate system. In this way, the PointPillars dataset does not depend on camera information or camera calibration.
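For illustration, a label line can be split into its KITTI fields as sketched below; the field order is the standard KITTI layout, and only the last value is reinterpreted as the rotation around the Z axis in LIDAR coordinates. The helper name is hypothetical.

def parse_label_line(line):
    # Standard KITTI field layout; in TAO PointPillars the last value is
    # interpreted as rotation around Z (LIDAR frame) instead of rotation around Y (camera frame).
    f = line.split()
    return {
        "class_name": f[0],
        "truncation": float(f[1]),
        "occlusion": int(float(f[2])),
        "alpha": float(f[3]),
        "bbox_2d": [float(v) for v in f[4:8]],      # left, top, right, bottom
        "dimensions": [float(v) for v in f[8:11]],  # height, width, length
        "location": [float(v) for v in f[11:14]],   # x, y, z
        "rotation_z": float(f[14]),                 # yaw in LIDAR coordinates
    }

print(parse_label_line(
    "car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"
))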
Once the above dataset directory structure is ready, set the base names of the split directories in the dataset.data_split dict of the spec file. For example,
{
'train': train,
'test': val
}
Also, set the names of the pickle info files in the dataset.info_path parameter. For example,
{
'train': ['infos_train.pkl'],
'test': ['infos_val.pkl'],
}
Once these are set, generate the statistics of the dataset via the dataset_convert command, which produces the pickle files above. The pickle files are used for data augmentation during the training process.
Converting The Dataset
The pickle info files are generated from the original point cloud files and the KITTI text label files with the following command:
tao model pointpillars dataset_convert -e $SPECS_DIR/pointpillars.yaml
The -e option provides the spec file for training; see below.
The spec file for PointPillars includes the dataset, model, train, evaluate, inference, export, and prune parameters. Below is an example spec file for training on the KITTI dataset.
dataset:
class_names: ['Car', 'Pedestrian', 'Cyclist']
type: 'GeneralPCDataset'
data_path: '/path/to/tao-experiments/data/pointpillars'
data_split: {
'train': train,
'test': val
}
info_path: {
'train': [infos_train.pkl],
'test': [infos_val.pkl],
}
balanced_resampling: False
point_feature_encoding: {
encoding_type: absolute_coordinates_encoding,
used_feature_list: ['x', 'y', 'z', 'intensity'],
src_feature_list: ['x', 'y', 'z', 'intensity'],
}
point_cloud_range: [0, -39.68, -3, 69.12, 39.68, 1]
data_augmentor:
disable_aug_list: ['placeholder']
aug_config_list:
- name: gt_sampling
db_info_path:
- dbinfos_train.pkl
preface: {
filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5'],
}
sample_groups: ['Car:15','Pedestrian:15', 'Cyclist:15']
num_point_features: 4
disable_with_fake_lidar: False
remove_extra_width: [0.0, 0.0, 0.0]
limit_whole_scene: False
- name: random_world_flip
along_axis_list: ['x']
- name: random_world_rotation
world_rot_angle: [-0.78539816, 0.78539816]
- name: random_world_scaling
world_scale_range: [0.95, 1.05]
data_processor:
- name: mask_points_and_boxes_outside_range
remove_outside_boxes: True
- name: shuffle_points
shuffle: {
'train': True,
'test': False
}
- name: transform_points_to_voxels
voxel_size: [0.16, 0.16, 4]
max_points_per_voxel: 32
max_number_of_voxels: {
'train': 16000,
'test': 10000
}
num_workers: 4
model:
name: PointPillar
pretrained_model_path: null
vfe:
name: PillarVFE
with_distance: False
use_absolue_xyz: True
use_norm: True
num_filters: [64]
map_to_bev:
name: PointPillarScatter
num_bev_features: 64
backbone_2d:
name: BaseBEVBackbone
layer_nums: [3, 5, 5]
layer_strides: [2, 2, 2]
num_filters: [64, 128, 256]
upsample_strides: [1, 2, 4]
num_upsample_filters: [128, 128, 128]
dense_head:
name: AnchorHeadSingle
class_agnostic: False
use_direction_classifier: True
dir_offset: 0.78539
dir_limit_offset: 0.0
num_dir_bins: 2
anchor_generator_config: [
{
'class_name': 'Car',
'anchor_sizes': [[3.9, 1.6, 1.56]],
'anchor_rotations': [0, 1.57],
'anchor_bottom_heights': [-1.78],
'align_center': False,
'feature_map_stride': 2,
'matched_threshold': 0.6,
'unmatched_threshold': 0.45
},
{
'class_name': 'Pedestrian',
'anchor_sizes': [[0.8, 0.6, 1.73]],
'anchor_rotations': [0, 1.57],
'anchor_bottom_heights': [-0.6],
'align_center': False,
'feature_map_stride': 2,
'matched_threshold': 0.5,
'unmatched_threshold': 0.35
},
{
'class_name': 'Cyclist',
'anchor_sizes': [[1.76, 0.6, 1.73]],
'anchor_rotations': [0, 1.57],
'anchor_bottom_heights': [-0.6],
'align_center': False,
'feature_map_stride': 2,
'matched_threshold': 0.5,
'unmatched_threshold': 0.35
}
]
target_assigner_config:
name: AxisAlignedTargetAssigner
pos_fraction: -1.0
sample_size: 512
norm_by_num_examples: False
match_height: False
box_coder: ResidualCoder
loss_config:
loss_weights: {
'cls_weight': 1.0,
'loc_weight': 2.0,
'dir_weight': 0.2,
'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
}
post_processing:
recall_thresh_list: [0.3, 0.5, 0.7]
score_thresh: 0.1
output_raw_score: False
eval_metric: kitti
nms_config:
multi_classes_nms: False
nms_type: nms_gpu
nms_thresh: 0.01
nms_pre_max_size: 4096
nms_post_max_size: 500
sync_bn: False
train:
batch_size: 4
num_epochs: 80
optimizer: adam_onecycle
lr: 0.003
weight_decay: 0.01
momentum: 0.9
moms: [0.95, 0.85]
pct_start: 0.4
div_factor: 10
decay_step_list: [35, 45]
lr_decay: 0.1
lr_clip: 0.0000001
lr_warmup: False
warmup_epoch: 1
grad_norm_clip: 10
resume_training_checkpoint_path: null
pruned_model_path: "/path/to/pointpillar_workspace/33/pruned_0.5.tlt"
tcp_port: 18888
random_seed: null
checkpoint_interval: 1
max_checkpoint_save_num: 30
merge_all_iters_to_one_epoch: False
evaluate:
batch_size: 1
checkpoint: "/path/to/pointpillar_workspace/33/ckpt/checkpoint_epoch_80.tlt"
inference:
max_points_num: 25000
batch_size: 1
checkpoint: "/path/to/pointpillar_workspace/33/ckpt/checkpoint_epoch_80.tlt"
viz_conf_thresh: 0.1
export:
gpu_id: 0
checkpoint: "/path/to/tao-experiments/ckpt/checkpoint_epoch_80.tlt"
onnx_file: "/path/to/tao-experiments/ckpt/checkpoint_epoch_80.tlt.onnx"
prune:
model: "/path/to/tlt-experiments/ckpt/checkpoint_epoch_80.tlt"
The top level description of the spec file is provided in the table below.
Parameter | Data Type | Default | Description
dataset | Collection | – | The configuration of the dataset
model | Collection | – | The configuration of the PointPillars model
train | Collection | – | The configuration of the training process
evaluate | Collection | – | The configuration of the evaluation process
inference | Collection | – | The configuration of the inference process
export | Collection | – | The configuration for exporting the model
prune | Collection | – | The configuration for pruning the model
Class Names
The class_names
parameter provides the list of object class names in the dataset. It is simply
a list of strings.
Dataset
The dataset
parameter defines the dataset for training and validation/evaluation of the
PointPillars model, described below.
Parameter | Data Type | Default | Description
class_names | list of strings | – | The list of class names in the dataset
data_path | string | – | The path to the dataset
data_split | dict | – | The dict that maps the train and test splits to the actual directory names
info_path | dict | – | The dict that maps the train and test splits to the actual pickle info file names
balanced_resampling | bool | False | Whether or not to enable balanced resampling in the data loader
point_feature_encoding | Collection | – | The configuration for point feature encoding
point_cloud_range | list of floats | – | The point cloud coordinate range in [xmin, ymin, zmin, xmax, ymax, zmax] format
data_augmentor | Collection | – | The configuration for data augmentation
data_processor | Collection | – | The configuration for data processing
num_workers | int | 1 | The number of workers used by the data loader
Point Feature Encoding
Point feature encoding defines how the features of each point are represented. This parameter is fixed for this version and has to be:
{
encoding_type: absolute_coordinates_encoding,
used_feature_list: ['x', 'y', 'z', 'intensity'],
src_feature_list: ['x', 'y', 'z', 'intensity'],
}
Data Augmentations
Data augmentation pipelines are defined by the parameter data_augmentor
. See table below.
Parameter | Data Type | Default | Description
disable_aug_list | List of strings | ["placeholder"] | The list of augmentations to be disabled
aug_config_list | List of collections | – | The list of augmentations, whose names should be gt_sampling, random_world_flip, random_world_rotation, random_world_scaling, in that order
The parameters for gt_sampling are described below.
Parameter | Data Type | Default | Description
name | string | gt_sampling | The name; has to be gt_sampling
db_info_path | List of strings | dbinfos_train.pkl | The list of db info files for sampling
preface | dict | – | Preface of the GT sampling
sample_groups | List of strings | – | The list of strings that provides the per-class sample groups
num_point_features | int | 4 | The number of features for each point
disable_with_fake_lidar | bool | False | Whether the fake LIDAR is enabled
remove_extra_width | list of floats | – | The extra widths to remove per class
limit_whole_scene | bool | False | Whether or not to limit the whole scene
The parameters for random_world_flip
are described below.
Parameter | Data Type | Default | Description
along_axis_list | List of strings | – | The axes along which to flip the coordinates
The parameters for random_world_rotation
are described below.
Parameter | Data Type | Default | Description
world_rot_angle | List of floats | – | The minimum and maximum rotation angles
The parameters for random_world_scaling
are described below.
Parameter | Data Type | Default | Description
world_scale_range | List of floats | – | The minimum and maximum scaling factors
Data Processing
The dataset processing is defined by the data_processor parameter.
Parameter | Data Type | Default | Description
data_processor | List of collections | – | The list of data processing steps, which should include mask_points_and_boxes_outside_range, shuffle_points, and transform_points_to_voxels, in that order
The parameters for mask_points_and_boxes_outside_range
are described below.
Parameter | Data Type | Default | Description
name | string | mask_points_and_boxes_outside_range | The name; has to be mask_points_and_boxes_outside_range
remove_outside_boxes | bool | True | Whether or not to remove outside boxes
The parameters for shuffle_points
are described below.
Parameter | Data Type | Default | Description
name | string | shuffle_points | The name; has to be shuffle_points
shuffle_enabled | dict | {'train': True, 'test': False} | Dict to enable/disable shuffling for the train/val datasets
The parameters for transform_points_to_voxels
are described below.
Parameter | Data Type | Default | Description
name | string | transform_points_to_voxels | The name; has to be transform_points_to_voxels
voxel_size | List of floats | – | The voxel size in the format [dx, dy, dz]
max_points_per_voxel | int | 32 | The maximum number of points per voxel
max_number_of_voxels | dict | – | Dict that provides the maximum number of voxels in training and test/validation modes
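As a worked example using the values from the sample spec above, the size of the resulting pillar grid follows directly from point_cloud_range and voxel_size; this is an illustrative calculation, not TAO code.

import numpy as np

point_cloud_range = np.array([0, -39.68, -3, 69.12, 39.68, 1])
voxel_size = np.array([0.16, 0.16, 4])

# Number of voxels (pillars) along x, y, and z:
grid_size = (point_cloud_range[3:] - point_cloud_range[:3]) / voxel_size
print(grid_size)  # [432. 496. 1.] -> a 432 x 496 pillar grid, one voxel tall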
Model Architecture
The PointPillars model architecture is defined by the model parameter, detailed in the table below.
Parameter | Data Type | Default | Description
name | string | PointPillar | The name; has to be PointPillar
vfe | Collection | – | Definition of the voxel feature extractor
map_to_bev | Collection | – | Definition of the scatter module
backbone_2d | Collection | – | Definition of the 2D backbone
dense_head | Collection | – | Definition of the dense head
post_processing | Collection | – | Post-processing configuration
sync_bn | bool | False | Whether or not to enable sync BN
Voxel Feature Extractor
The voxel feature extractor is configured by the parameter vfe
, described below.
Parameter | Data Type | Default | Description
name | string | PillarVFE | The name; has to be PillarVFE
with_distance | bool | False | Whether or not to use distance
use_absolue_xyz | bool | True | Whether or not to use absolute XYZ coordinates
use_norm | bool | True | Whether or not to use normalization
num_filters | List of ints | [64] | The numbers of filters
Scatter
The scattering process is configured by the parameter map_to_bev
, described below.
Parameter | Data Type | Default | Description
name | string | PointPillarScatter | The name; has to be PointPillarScatter
num_bev_features | int | 64 | The number of features for the bird's eye view
2D backbone
The 2D backbone is configured by the parameter backbone_2d
, described below.
Parameter | Data Type | Default | Description
name | string | BaseBEVBackbone | The name; has to be BaseBEVBackbone
layer_nums | List of ints | [3, 5, 5] | The numbers of layers
layer_strides | List of ints | [2, 2, 2] | The layer strides
num_filters | List of ints | [64, 128, 256] | The numbers of filters
upsample_strides | List of ints | [1, 2, 4] | The upsampling strides
num_upsample_filters | List of ints | [128, 128, 128] | The numbers of upsampling filters
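As a rough sanity check of these defaults, the output shape of the BEV backbone can be computed from the strides; the calculation below is illustrative and assumes the 432 x 496 pillar grid implied by the sample spec.

# Illustrative calculation of the BEV backbone output shape (not TAO code).
grid_w, grid_h = 432, 496
layer_strides = [2, 2, 2]
upsample_strides = [1, 2, 4]
num_upsample_filters = [128, 128, 128]

downsample = 1
branch_strides = []
for s, u in zip(layer_strides, upsample_strides):
    downsample *= s
    branch_strides.append(downsample // u)  # effective stride of each upsampled branch

print(branch_strides)                                            # [2, 2, 2] -> all branches align at stride 2
print(grid_w // branch_strides[0], grid_h // branch_strides[0])  # 216 248 feature map
print(sum(num_upsample_filters))                                 # 384 channels after concatenation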
Dense Head
The dense head is configured by the parameter dense_head
, described below.
Parameter | Data Type | Default | Description
name | string | AnchorHeadSingle | The name; has to be AnchorHeadSingle
class_agnostic | bool | False | Class agnostic or not
use_direction_classifier | bool | True | Use direction classifier or not
dir_offset | float | 0.78539 | Direction offset
dir_limit_offset | float | 0.0 | Direction limit offset
num_dir_bins | int | 2 | The number of direction bins
anchor_generator_config | List of dicts | – | The config for the per-class anchor generator
target_assigner_config | Collection | – | Config for the target assigner
loss_config | Collection | – | Config for the loss function
The anchor_generator_config parameter is a list of dicts, one per class. Each dict follows the same format, described below.
{
'class_name': 'Car',
'anchor_sizes': [[3.9, 1.6, 1.56]],
'anchor_rotations': [0, 1.57],
'anchor_bottom_heights': [-1.78],
'align_center': False,
'feature_map_stride': 2,
'matched_threshold': 0.6,
'unmatched_threshold': 0.45
}
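As an illustrative back-of-the-envelope check (not TAO code, and assuming the 432 x 496 pillar grid from the sample spec), the number of anchors generated per class follows from the feature map stride and the number of anchor rotations:

grid_w, grid_h = 432, 496
feature_map_stride = 2
num_rotations = 2  # anchor_rotations: [0, 1.57]
num_sizes = 1      # one anchor size per class

anchors_per_class = (grid_w // feature_map_stride) * (grid_h // feature_map_stride) * num_rotations * num_sizes
print(anchors_per_class)  # 216 * 248 * 2 = 107136 anchors per class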
The parameters of target_assigner_config
are described below.
Parameter | Data Type | Default | Description
name | string | AxisAlignedTargetAssigner | The name; has to be AxisAlignedTargetAssigner
pos_fraction | float | -1.0 | Positive fraction
sample_size | int | 512 | Sample size
norm_by_num_examples | bool | False | Normalize by number of examples or not
match_height | bool | False | Match height or not
box_coder | string | ResidualCoder | The name of the box coder
The parameters for loss_config
are described below.
Parameter | Data Type | Default | Description
loss_weights | dict | – | The dict that provides the loss weighting factors
The loss_weights
dict should be in the format below.
{
'cls_weight': 1.0,
'loc_weight': 2.0,
'dir_weight': 0.2,
'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
}
Post Processing
The post-processing is defined in the parameter post_processing
, described below.
Parameter | Data Type | Default | Description
recall_thresh_list | List of floats | – | The list of recall thresholds
score_thresh | float | 0.1 | The score threshold
output_raw_score | bool | False | Output raw score or not
eval_metric | string | kitti | The evaluation metric; only kitti is supported
nms_config | Collection | – | The NMS config
The Non-Maximum Suppression(NMS) is configured by the nms_config
parameter, described below.
Parameter | Data Type | Default | Description
multi_classes_nms | bool | False | Multi-class NMS or not
nms_type | string | nms_gpu | The NMS type
nms_thresh | float | 0.01 | The NMS IoU threshold
nms_pre_max_size | int | – | Pre-NMS maximum number of boxes
nms_post_max_size | int | – | Post-NMS maximum number of boxes
Training Process
The train
parameter defines the hyper-parameters of the training process.
Parameter | Datatype | Default | Description | Supported Values
batch_size_per_gpu | int | 4 | The batch size per GPU | >=1
num_epochs | int | 80 | The number of epochs to train the model | >=1
optimizer | string | adam_onecycle | The optimizer name (type) | adam_onecycle
lr | float | 0.003 | The initial learning rate | >0.0
weight_decay | float | 0.01 | Weight decay | >0.0
momentum | float | 0.9 | Momentum for the SGD optimizer | >0, <1
moms | List of floats | [0.95, 0.85] | Momentums for the one-cycle learning rate scheduler | [0.95, 0.85]
pct_start | float | 0.4 | The percentage of the cycle spent increasing the learning rate | 0.4
div_factor | float | 10.0 | Division factor | 10.0
decay_step_list | list of ints | [35, 45] | The list of epoch numbers at which to decay the learning rate | list whose elements are < num_epochs
lr_decay | float | 0.1 | The decay of the learning rate | >0, <1
lr_clip | float | 0.0000001 | The minimum value of the learning rate | >0, <1
lr_warmup | bool | False | Enable learning rate warm-up or not | True/False
warmup_epoch | int | 1 | The number of epochs to warm up the learning rate | >=1
grad_norm_clip | float | 10.0 | The limit applied for gradient norm clipping | >0
resume_model_path | string | – | The path to the model to resume training from | Unix path
pretrained_model_path | string | – | The path to the pretrained model | Unix path
pruned_model_path | string | – | The path to the pruned model for retraining | Unix path
tcp_port | int | 18888 | The TCP port for multi-GPU training | 18888
random_seed | int | – | Random seed | integer
checkpoint_interval | int | 1 | The interval of epochs at which to save checkpoints | >=1
max_checkpoint_save_num | int | 1 | The maximum number of checkpoints to save | >=1
merge_all_iters_to_one_epoch | bool | False | Merge all training steps into one epoch or not | False
Evaluation
The evaluate parameter defines the hyper-parameters of the evaluation process. The evaluation metric is mAP (3D and BEV).
Parameter | Datatype | Default/Suggested value | Description | Supported Values
batch_size | int | 1 | The batch size for evaluation | >=1
checkpoint | string | – | The path to the model to evaluate | Unix path
Inference
The inference parameter defines the hyper-parameters of the inference process. Inference draws bounding boxes and visualizes them on images.
Parameter | Datatype | Default/Suggested value | Description | Supported Values
batch_size | int | 1 | The batch size for inference | >=1
checkpoint | string | – | The path to the model to run inference on | Unix path
max_points_num | int | – | The maximum number of points in a point cloud file | >=1
viz_conf_thresh | float | 0.1 | The visualization confidence threshold | >0, <1
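Since max_points_num must be at least as large as the biggest scene passed to inference, one way to pick it is to scan the .bin files; the snippet below is a hedged sketch with illustrative paths, assuming the float32 (x, y, z, r) layout described earlier.

import glob
import numpy as np

max_points = 0
for path in glob.glob("/path/to/tao-experiments/data/pointpillars/val/lidar/*.bin"):
    n = np.fromfile(path, dtype=np.float32).reshape(-1, 4).shape[0]
    max_points = max(max_points, n)

print(max_points)  # choose inference.max_points_num >= this value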
Export
The export
parameter defines the hyper-parameters of the export process.
Parameter | Datatype | Default/Suggested value | Description | Supported Values
gpu_id | int | 0 | The index of the GPU to be used | >=0
checkpoint | string | – | The path to the model to export | Unix path
onnx_file | string | – | The output path for the exported model | Unix path
data_type | string | fp32 | The data type of the TensorRT engine | fp32, fp16
batch_size | int | 1 | The batch size to export with | >=1
workspace_size | int | 1024 | The workspace size in MB for building the TensorRT engine | >=1
save_engine | string | – | The path to save the TensorRT engine to | Unix path
Prune
The prune
parameter defines the hyper-parameters of the pruning process.
Parameter | Datatype | Default/Suggested value | Description | Supported Values
model | string | – | The path to the model to be pruned | Unix path
Use the following command to run PointPillars training:
tao model pointpillars train -e <experiment_spec_file>
[-h, --help]
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.gpu_ids=<gpu indices>]
[train.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file
Optional Arguments
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
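For example, an invocation that overrides a few options on the command line might look like the following (the paths and values are illustrative):

tao model pointpillars train -e $SPECS_DIR/pointpillars.yaml \
    results_dir=/path/to/results \
    train.num_gpus=2 \
    train.num_epochs=120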
For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 and gpu_ids = [0, 1], then they are modified to follow the setting with more GPUs, in this case num_gpus = 1 -> num_gpus = 2.
The evaluation metric of PointPillars is mAP(BEV and 3D).
Use the following command to run PointPillars evaluation:
tao model pointpillars evaluate -e <experiment_spec_file>
evaluate.checkpoint=<model to be evaluated>
results_dir=<results_dir>
[-h, --help]
Required Arguments
-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment. This should be the same as the training spec file.
results_dir: The path to a folder where the experiment outputs should be written.
evaluate.checkpoint: The .pth model to be evaluated.
Optional Arguments
-h, --help: Show this help message and exit.
evaluate.<evaluate_option>: The evaluate options.
The evaluation metric in TAO PointPillars differs from the official KITTI point cloud detection metric. The KITTI metric divides objects into easy/moderate/hard categories and filters out objects smaller than a size threshold, which is only meaningful for the KITTI dataset. The TAO PointPillars metric, by contrast, does not classify objects into easy/moderate/hard categories and does not exclude any objects from the computation, which makes it a general metric that is applicable to a general dataset. The final result is still the average precision (AP) and mean average precision (mAP), regardless of the details of the computation. Because of this, the TAO PointPillars metric is not directly comparable with the official KITTI metric on the KITTI dataset, although the two should be roughly the same.
Use the following command to run inference on PointPillars with a .tlt model or TensorRT engine:
tao model pointpillars inference -e <experiment_spec_file>
results_dir=<results_dir>
inference.checkpoint=<inference model>
[-h, --help]
Required Arguments
-e, --experiment_spec_file: The experiment spec file to set up the inference experiment. This should be the same as the training spec file.
results_dir: The path to a folder where the experiment outputs should be written.
inference.checkpoint: The .pth model to run inference on.
Optional Arguments
-h, --help: Show this help message and exit.
inference.<inference_option>: The inference options.
TAO PointPillars supports model pruning. Pruning reduces the number of model parameters and hence can improve inference frames per second (FPS) on NVIDIA GPUs while maintaining (almost) the same accuracy (mAP).
Pruning is applied to an already trained PointPillars model and outputs a new model with fewer parameters. Once you have the pruned model, it is necessary to fine-tune it on the same dataset to recover the accuracy (mAP). Fine-tuning is simply running training again with the pruned model as the pretrained model.
Use the following command to run pruning on the PointPillars .tlt model:
tao model pointpillars prune -e <experiment_spec_file>
results_dir=<results_dir>
prune.model=<path_to_tlt_model_to_prune>
[prune.pruning_thresh=<pruning_threshold>]
Required Arguments
-e, --experiment_spec_file: The experiment spec file to set up the pruning experiment. This should be the same as the training spec file.
results_dir: The path to a folder where the experiment outputs should be written.
prune.model: The path to the model to prune.
Optional Arguments
prune.pruning_thresh: The pruning threshold, which should be a float number between 0 and 1. Defaults to 0.1.
After pruning, the pruned model can be used for retraining (fine-tuning). To start the retraining, provide the path to the pruned model in the spec file via the train.pruned_model_path parameter, and then run the training command as described above.
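For example, the relevant portion of the spec file might look like the following (the path is illustrative):

train:
  pruned_model_path: "/path/to/results/prune/pruned_model.tlt"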
Use the following command to export PointPillars to .onnx
format for deployment:
tao model pointpillars export -m <model>
-e <experiment_spec>
export.checkpoint=<model to export>
export.onnx_file=<output_file>
[export.<export_option>=<export_option_value>]
[-h, --help]
Required Arguments
-e, --experiment_spec: The path to an experiment spec file
export.checkpoint: The .pth model to export.
export.onnx_file: The path where the .etlt or .onnx model is saved.
Optional Arguments
-h, --help: Show this help message and exit.
export.<export_option>: The export options.
You can use the following optional arguments to save the TRT engine that is generated to verify export:
export.save_engine
: The path to the serialized TensorRT engine file. Note that this file is hardware specific and cannot be generalized across GPUs. Useful to quickly test your model accuracy using TensorRT on the host. As the TensorRT engine file is hardware specific, you cannot use this engine file for deployment unless the deployment GPU is identical to the training GPU.
The PointPillars models that you trained can be deployed on edge devices, such as a Jetson Xavier, Jetson Nano, or Tesla, or in the cloud with NVIDIA GPUs.
The DeepStream SDK currently does not support deployment of PointPillars models. Instead, PointPillars models can only be deployed via a standalone TensorRT application. A TensorRT sample has been developed as a demo to show how to deploy PointPillars models trained in TAO.
Using trtexec
For instructions on generating a TensorRT engine using the trtexec
command, refer to the
trtexec guide for ReIdentificationNet.