BEVFusion#
BEVFusion is a 3D object-detection model included in TAO. It supports the following tasks:
convert
train
evaluate
inference
Each of these tasks is explained below.
Dataset Format#
The dataset for BEVFusion contains point cloud data, RGB images, and the corresponding annotations of 3D objects. The directory should follow the KITTI directory structure:
/kitti
    /training
        /calib
            000000.txt
            000001.txt
            ...
            N.txt
        /image_2
            000000.png
            000001.png
            ...
            N.png
        /label_2
            000000.txt
            000001.txt
            ...
            N.txt
        /velodyne
            000000.bin
            000001.bin
            ...
            N.bin
    /ImageSets
        train.txt
        val.txt
        test.txt
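Before converting the dataset, it can help to confirm that every sample listed in a split file has all four of its files. The following is a minimal sketch and not part of TAO: the function name `find_missing_files` and its arguments are hypothetical, and the split file is assumed to contain one sample ID per line (for example `000000`).

```python
from pathlib import Path

# Sub-directory name -> file extension, taken from the layout above.
EXPECTED = {"calib": ".txt", "image_2": ".png", "label_2": ".txt", "velodyne": ".bin"}


def find_missing_files(training_dir, split_file):
    """Return the expected files that are absent for the IDs in split_file.

    Assumes (not confirmed by the TAO docs) one sample ID per line.
    """
    training_dir = Path(training_dir)
    lines = Path(split_file).read_text().splitlines()
    sample_ids = [line.strip() for line in lines if line.strip()]
    missing = []
    for sample_id in sample_ids:
        for subdir, ext in EXPECTED.items():
            candidate = training_dir / subdir / f"{sample_id}{ext}"
            if not candidate.is_file():
                missing.append(candidate)
    return missing
```

An empty list means the split is complete. Pass the paths that match your own dataset location, for example the `training` directory and one of the `ImageSets` files.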
Each .bin file should comply with the format described above. Each .txt label file should comply with the KITTI format.
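For reference, KITTI Velodyne scans are commonly stored as a flat array of little-endian float32 values, four per point (x, y, z, intensity). The sketch below loads one scan with the standard library under that assumption; `read_point_cloud` is a hypothetical helper, not a TAO API.

```python
import struct
from pathlib import Path


def read_point_cloud(bin_path, num_features=4):
    """Read a KITTI-style .bin file into a list of float tuples.

    Assumes little-endian float32 values, num_features per point
    (x, y, z, intensity for the standard KITTI layout).
    """
    raw = Path(bin_path).read_bytes()
    point_bytes = 4 * num_features  # 4 bytes per float32 value
    if len(raw) % point_bytes != 0:
        raise ValueError(
            f"{bin_path}: size {len(raw)} is not a multiple of {point_bytes} bytes"
        )
    return [
        struct.unpack_from(f"<{num_features}f", raw, offset)
        for offset in range(0, len(raw), point_bytes)
    ]
```

A size that is not a multiple of 16 bytes is a quick sign that a file does not follow the four-value layout.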
Creating a Configuration File#
The BEVFusion spec file has five components (model, inference, evaluate, dataset, and train) as well as several global parameters, which are described below. The spec file is in YAML format.
Use the following command to get an experiment spec file for BEVFusion:
BASE_EXPERIMENT_ID=$(tao bevfusion list-base-experiments | jq -r '.[0].id')
SPECS=$(tao bevfusion get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')
Here’s a sample of the BEVFusion spec file:
results_dir: /results/bevfusion
dataset:
  type: KittiPersonDataset
  root_dir: /data/
  gt_box_type: camera
  default_cam_key: CAM2
  train_dataset:
    repeat_time: 2
    ann_file: /data/kitti_person_infos_train.pkl
    data_prefix:
      pts: training/velodyne_reduced
      img: training/image_2
    batch_size: 4
    num_workers: 8
  val_dataset:
    ann_file: /data/kitti_person_infos_val.pkl
    data_prefix:
      pts: training/velodyne_reduced
      img: training/image_2
    batch_size: 2
    num_workers: 4
  test_dataset:
    ann_file: /data/kitti_person_infos_val.pkl
    data_prefix:
      pts: training/velodyne_reduced
      img: training/image_2
    batch_size: 4
    num_workers: 4
model:
  type: BEVFusion
  point_cloud_range: [0, -40, -3, 70.4, 40, 1]
  voxel_size: [0.05, 0.05, 0.1]
  grid_size: [1440, 1440, 41]
train:
  num_gpus: 1
  num_nodes: 1
  validation_interval: 1
  num_epochs: 5
  optimizer:
    type: AdamW
    lr: 0.0002
  lr_scheduler:
    - type: LinearLR
      start_factor: 0.33333333
      by_epoch: False
      begin: 0
      end: 500
    - type: CosineAnnealingLR
      T_max: 10
      begin: 0
      end: 10
      by_epoch: True
      eta_min_ratio: 1e-4
    - type: CosineAnnealingMomentum
      eta_min: 0.8947
      begin: 0
      end: 2.4
      by_epoch: True
    - type: CosineAnnealingMomentum
      eta_min: 1
      begin: 2.4
      end: 10
      by_epoch: True
inference:
  num_gpus: 1
  conf_threshold: 0.3
  checkpoint: /results/train/bevfusion_model.pth
evaluate:
  num_gpus: 1
  checkpoint: /results/train/bevfusion_model.pth
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | string | | /results | | | | FALSE |
| | string | Default scope to use mmdet3d | mmdet3d | | | | FALSE |
| | collection | Default hooks for mmlabs | {'timer': {'type': 'IterTimerHook'}, 'logger': {'type': 'LoggerHook', 'interval': 1, 'log_metric_by_epoch': True}, 'param_scheduler': {'type': 'ParamSchedulerHook'}, 'checkpoint': {'type': 'CheckpointHook', 'by_epoch': True, 'interval': 1}, 'sampler_seed': {'type': 'DistSamplerSeedHook'}, 'visualization': {'type': 'Det3DVisualizationHook'}} | | | | FALSE |
| | string | Default logger hook type | TAOBEVFusionLoggerHook | | | | FALSE |
| | int | Optional manual seed. Seed is set when the value is given in spec file. | | | | | FALSE |
| | collection | Input modality for the model. Set True for each modality to use. | {'use_lidar': True, 'use_camera': True, 'use_radar': False, 'use_map': False, 'use_external': False} | | | | FALSE |
| | collection | Configurable parameters to construct the model for a BEVFusion experiment. | | | | | FALSE |
| | collection | Configurable parameters to construct the dataset for a BEVFusion experiment. | | | | | FALSE |
| | collection | Configurable parameters to construct the trainer for a BEVFusion experiment. | | | | | FALSE |
| | collection | Configurable parameters to construct the evaluator for a BEVFusion experiment. | | | | | FALSE |
| | collection | Configurable parameters to construct the inferencer for a BEVFusion experiment. | | | | | FALSE |
Data Preprocessor Config#
The data preprocessor configuration (data_preprocessor) defines the pre-processing hyperparameters applied to the model inputs.
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | string | Name of the data preprocessor for 3D Fusion | Det3DDataPreprocessor | | | | FALSE |
| | list | The input mean for RGB frames | [123.675, 116.28, 103.53] | | | | FALSE |
| | list | The input standard deviation per pixel for RGB frames | [58.395, 57.12, 57.375] | | | | FALSE |
| | bool | Whether to convert the image from BGR to RGB. | 32 | | | | FALSE |
| | int | The padded image size must be divisible by this value. | 32 | | | | FALSE |
| | collection | | {'max_num_points': 10, 'max_voxels': [120000, 160000], 'voxelize_reduce': True} | | | | FALSE |
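To make the mean, standard deviation, and pad-divisor values concrete, here is a sketch of the arithmetic they imply: per-channel normalization `(value - mean) / std`, and padding each side up to the next multiple of the divisor. The helper names are hypothetical, and this illustrates the arithmetic only, not the Det3DDataPreprocessor implementation.

```python
import math

MEAN = [123.675, 116.28, 103.53]  # default mean from the table
STD = [58.395, 57.12, 57.375]     # default standard deviation from the table


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Normalize one RGB pixel channel by channel: (value - mean) / std."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]


def padded_size(height, width, divisor=32):
    """Round height and width up to the next multiple of divisor."""
    return (
        math.ceil(height / divisor) * divisor,
        math.ceil(width / divisor) * divisor,
    )
```

With the default divisor of 32, a 375 x 1242 image would be padded to 384 x 1248, while a 256 x 704 image already satisfies the constraint.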
Dataset Config#
The dataset configuration (dataset) defines the dataset directories, annotation files, and batch sizes for the train, val, and test splits.
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | string | Dataset types for 3D Fusion | KittiPersonDataset | | | TAO3DSyntheticDataset,TAO3DDataset,KittiPersonDataset | FALSE |
| | string | A path to the root directory of the given dataset | /data/ | | | | FALSE |
| | list | A list of the classes to be trained. | ['person'] | | | | FALSE |
| | string | 3D bounding box type to be used when training. | lidar | | | lidar,camera | FALSE |
| | string | 3D bounding box type in ground truth. | camera | | | lidar,camera | FALSE |
| | list | The origin of the given center point in ground truth 3D bounding boxes. | [0.5, 1.0, 0.5] | | | | FALSE |
| | string | Default camera name in dataset | CAM0 | | | | FALSE |
| | bool | Whether to save results in per sequence format. | False | | | | FALSE |
| | int | Number of camera views in dataset. | 1 | | | | FALSE |
| | int | Input lidar point cloud data dimension | 4 | | | | FALSE |
| | collection | Configurable parameters to construct the train dataset. | | | | | FALSE |
| | collection | Configurable parameters to construct the validation dataset. | | | | | FALSE |
| | collection | Configurable parameters to construct the test dataset. | | | | | FALSE |
| | string | Image file for single file inference | | | | | FALSE |
| | string | Point cloud file for single file inference | | | | | FALSE |
| | list | Camera intrinsic matrix for single file inference | | | | | FALSE |
| | list | Lidar to camera extrinsic matrix for single file inference | | | | | FALSE |
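For single-file inference, the intrinsic and extrinsic matrices are what relate a lidar point to an image pixel. As a sketch of the standard pinhole model, assuming a 3x3 intrinsic matrix and a 4x4 lidar-to-camera matrix (the shapes TAO expects are not stated here, so treat them as assumptions), a lidar point is first moved into the camera frame and then projected. `project_lidar_point` is a hypothetical helper.

```python
def project_lidar_point(point_xyz, lidar2cam, cam_intrinsic):
    """Project one lidar point to pixel coordinates (u, v).

    lidar2cam: 4x4 row-major nested list (assumed shape).
    cam_intrinsic: 3x3 row-major nested list (assumed shape).
    Returns None when the point lies behind the camera.
    """
    x, y, z = point_xyz
    homogeneous = (x, y, z, 1.0)
    # Camera-frame coordinates: first three rows of T @ [x, y, z, 1].
    cam = [sum(row[i] * homogeneous[i] for i in range(4)) for row in lidar2cam[:3]]
    if cam[2] <= 0:
        return None
    # Pinhole projection: K @ cam, then divide by depth.
    uvw = [sum(row[i] * cam[i] for i in range(3)) for row in cam_intrinsic]
    return (uvw[0] / uvw[2], uvw[1] / uvw[2])
```

Projecting a few points from the point cloud file onto the image file is a simple way to check that the two matrices are consistent with each other.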
Model Config#
The model configuration (model) defines the BEVFusion model structure. This model is used for training, evaluation, and inference. A detailed description is included in the table below.
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | string | Model name | BEVFusion | | | BEVFusion | FALSE |
| | list | point cloud range | [0, -40, -3, 70.4, 40, 1] | | | | FALSE |
| | list | voxel size in voxelization | [0.05, 0.05, 0.1] | | | | FALSE |
| | list | post processing center filter range | [-61.2, -61.2, -20.0, 61.2, 61.2, 20.0] | | | | FALSE |
| | list | Grid size for bevfusion model | [1440, 1440, 41] | | | | FALSE |
| | collection | Configurable parameters to construct the preprocessor for the bevfusion model. | | | | | FALSE |
| | collection | Configurable parameters to construct the camera image backbone for the bevfusion model. | | | | | FALSE |
| | collection | Configurable parameters to construct the camera image neck for the bevfusion model. | | | | | FALSE |
| | collection | Configurable parameters to construct the camera view transform for the bevfusion model. | | | | | FALSE |
| | collection | Configurable parameters to construct the lidar point cloud backbone for the bevfusion model. | | | | | FALSE |
| | collection | Configurable parameters to construct the lidar point cloud voxel encoder for the bevfusion model. | {'type': 'HardSimpleVFE', 'num_features': 4} | | | | FALSE |
| | collection | Configurable parameters to construct the lidar encoder for the bevfusion model. | | | | | FALSE |
| | collection | Configurable parameters to construct the lidar neck for the bevfusion model. | | | | | FALSE |
| | collection | Configurable parameters to construct the fusion layer for the bevfusion model. | | | | | FALSE |
| | collection | Configurable parameters to construct the bounding box head for the bevfusion model. | | | | | FALSE |
Image Backbone Config#
The backbone configuration (img_backbone) defines the image backbone structure. A detailed description is included in the table below. Currently, BEVFusion supports only Swin Transformer and ResNet50 image backbones.
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | string | Name of Image Backbone for 3D Fusion | mmdet.SwinTransformer | | | | FALSE |
| | int | Number of input channels. | 96 | | | | FALSE |
| | list | Depths of each Swin Transformer stage. | [2, 2, 6, 2] | | | | FALSE |
| | list | Number of attention heads of each stage. | [3, 6, 12, 24] | | | | FALSE |
| | int | Window size for Swin Transformer. | 7 | | | | FALSE |
| | int | Ratio of mlp hidden dim to embedding dim. | 4 | | | | FALSE |
| | bool | If True, add a learnable bias to query, key, value. | True | | | | FALSE |
| | string | Override default qk scale of head_dim ** -0.5 if set. | | | | | FALSE |
| | float | Dropout rate. | 0.0 | | | | FALSE |
| | float | Attention dropout rate. | 0.0 | | | | FALSE |
| | float | Stochastic drop rate | 0.2 | | | | FALSE |
| | bool | If True, add normalization after patch embedding. | True | | | | FALSE |
| | list | Output from which stages. | [1, 2, 3] | | | | FALSE |
| | bool | Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. | False | | | | FALSE |
| | bool | The flag indicates whether the pre-trained model is from the original repo. | True | | | | FALSE |
| | collection | Configuration for initialization. | | | | | FALSE |
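The default image neck input channels, [192, 384, 768], follow from the Swin defaults in this table, because a Swin Transformer doubles its channel width at every stage. The sketch below assumes that the value 96 is the first-stage embedding width and that the output-stage list [1, 2, 3] uses zero-based stage indices (both assumptions are based on the usual Swin-T configuration). The helper names are hypothetical.

```python
def swin_stage_channels(embed_dims, num_stages):
    """Channel width of each stage: embed_dims * 2**stage_index."""
    return [embed_dims * 2 ** i for i in range(num_stages)]


def selected_channels(embed_dims, depths, out_indices):
    """Channel widths of the stages whose outputs are passed on to the neck."""
    channels = swin_stage_channels(embed_dims, len(depths))
    return [channels[i] for i in out_indices]


print(selected_channels(96, [2, 2, 6, 2], [1, 2, 3]))  # [192, 384, 768]
```

If you change the backbone width or the output stages, the neck input channels need to change with them.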
Image Neck Config#
The neck configuration (img_neck) defines the image neck structure. A detailed description is included in the table below. Currently, BEVFusion supports only the GeneralizedLSSFPN image neck.
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | string | Image Neck Name | GeneralizedLSSFPN | | | | FALSE |
| | list | The number of input channels for image neck. | [192, 384, 768] | | | | FALSE |
| | int | The number of output channels for image neck. | 256 | | | | FALSE |
| | int | Starting level for image neck. | 0 | | | | FALSE |
| | int | The number of outputs for image neck. | 0 | | | | FALSE |
| | collection | The configuration of normalization for image neck. | {'type': 'BN2d', 'requires_grad': True} | | | | FALSE |
| | collection | The configuration of activation for image neck. | {'type': 'ReLU', 'inplace': True} | | | | FALSE |
| | collection | The configuration of upsampling for image neck. | {'mode': 'bilinear', 'align_corners': False} | | | | FALSE |
View Transform Config#
The configuration (view_transform) defines the view transform structure for camera input. A detailed description is included in the table below. Currently, BEVFusion supports only the DepthLSSTransform and LSSTransform view transforms.
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | string | Image view transform name. | DepthLSSTransform | | | DepthLSSTransform,LSSTransform | FALSE |
| | int | The number of input channels for view transform. | 256 | | | | FALSE |
| | int | The number of output channels for view transform. | 80 | | | | FALSE |
| | list | Image size for view transform. | [256, 704] | | | | FALSE |
| | list | Feature size for view transform. | [32, 88] | | | | FALSE |
| | list | The grid range for x-axis. | [-54.0, 54.0, 0.3] | | | | FALSE |
| | list | The grid range for y-axis. | [-54.0, 54.0, 0.3] | | | | FALSE |
| | list | The grid range for z-axis. | [-10.0, 10.0, 20.0] | | | | FALSE |
| | list | The grid range for depth. | [1.0, 60.0, 0.5] | | | | FALSE |
| | int | The ratio for downsampling. | 2 | | | | FALSE |
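Each grid-range entry appears to read as [lower, upper, step]. Assuming the usual Lift-Splat-Shoot convention that the number of cells is (upper - lower) / step, the defaults in this table can be turned into grid dimensions as sketched here. `num_cells` is a hypothetical helper, and the convention itself is an assumption rather than something this page states.

```python
def num_cells(bound):
    """Number of cells for a [lower, upper, step] range."""
    lower, upper, step = bound
    return round((upper - lower) / step)


x_cells = num_cells([-54.0, 54.0, 0.3])     # BEV cells along x
y_cells = num_cells([-54.0, 54.0, 0.3])     # BEV cells along y
z_cells = num_cells([-10.0, 10.0, 20.0])    # a single vertical cell
depth_bins = num_cells([1.0, 60.0, 0.5])    # discrete depth bins

# Image-to-feature stride implied by image size [256, 704] and feature size [32, 88].
stride = (256 // 32, 704 // 88)

print(x_cells, y_cells, z_cells, depth_bins, stride)
```

Under these assumptions the defaults describe a 360 x 360 bird's-eye-view grid with 118 depth bins, and image features at one eighth of the input resolution.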
Lidar Backbone Config#
The backbone configuration (lidar_backbone) defines the lidar backbone structure. A detailed description is included in the table below. Currently, BEVFusion supports only the SECOND lidar backbone.
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | string | The lidar backbone name. | SECOND | | | | FALSE |
| | int | The number of input channels for lidar backbone. | 256 | | | | FALSE |
| | list | The number of output channels for lidar backbone. | [128, 256] | | | | FALSE |
| | list | The number of layer in each stage for lidar backbone. | [5, 5] | | | | FALSE |
| | list | Number of layers in each stage for lidar backbone. | [1, 2] | | | | FALSE |
| | collection | The configuration of normalization for lidar backbone. | {'type': 'BN', 'eps': 0.001, 'momentum': 0.01} | | | | FALSE |
| | collection | The configuration of convolution layers for lidar backbone. | {'type': 'Conv2d', 'bias': False} | | | | FALSE |
Lidar Encoder Config#
The encoder configuration (pts_middle_encoder) defines the lidar encoder structure. A detailed description is included in the table below. Currently, BEVFusion supports only the BEVFusionSparseEncoder structure.
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | string | The lidar encoder name. | BEVFusionSparseEncoder | | | | FALSE |
| | int | The number of input channels for lidar encoder. | 4 | | | | FALSE |
| | list | The sparse shape of input tensor. | [1440, 1440, 41] | | | | FALSE |
| | list | Order of conv module. | ['conv', 'norm', 'act'] | | | | FALSE |
| | collection | The configuration of normalization for lidar encoder. | {'type': 'BN1d', 'eps': 0.001, 'momentum': 0.01} | | | | FALSE |
| | | | | | | | |
| | string | Type of the block to use. | basicblock | | | | FALSE |
Lidar Neck Config#
The configuration (pts_neck) defines the lidar neck structure. A detailed description is included in the table below. Currently, BEVFusion supports only the SECONDFPN structure.
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | string | The lidar neck name. | SECONDFPN | | | | FALSE |
| | list | The number of input channels for lidar neck. | [128, 256] | | | | FALSE |
| | list | The number of output channels for lidar neck. | [256, 256] | | | | FALSE |
| | list | Strides used to upsample the feature map for lidar neck. | [1, 2] | | | | FALSE |
| | collection | The configuration of normalization for lidar neck. | {'type': 'BN', 'eps': 0.001, 'momentum': 0.01} | | | | FALSE |
| | collection | The configuration of upsample layers for lidar neck. | {'type': 'deconv', 'bias': False} | | | | FALSE |
| | bool | Whether to use conv when stride is 1. | True | | | | FALSE |
Fusion Layer Config#
The configuration (fusion_layer) defines the fusion layer structure. A detailed description is included in the table below. Currently, BEVFusion supports only the ConvFuser structure.
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | string | The fusion layer name. | ConvFuser | | | | FALSE |
| | list | The number of input channels for fusion layer. | [80, 256] | | | | FALSE |
| | int | The number of output channels for fusion layer. | 256 | | | | FALSE |
BBoxHead Config#
The configuration (bbox_head) defines the bounding box prediction head structure. A detailed description is included in the table below. Currently, BEVFusion supports only the BEVFusionHead structure.
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | string | Prediction head name. | BEVFusionHead | | | BEVFusionHead | FALSE |
| | int | Number of proposals. | 200 | | | | FALSE |
| | bool | Whether to enable auxiliary training. | True | | | | FALSE |
| | int | Number of channels in the input feature map. | 512 | | | | FALSE |
| | int | Number of hidden channels. | 128 | | | | FALSE |
| | int | Number of classes. | 1 | | | | FALSE |
| | int | NMS kernel size. | 3 | | | | FALSE |
| | float | Batch Norm momentum. | 0.1 | | | | FALSE |
| | int | Number of decoder layers. | 1 | | | | FALSE |
| | int | Output size factor. | 8 | | | | FALSE |
| | collection | The configuration for bounding box encoder. | | | | | FALSE |
| | collection | The configuration for decoder layer. | | | | | FALSE |
| | list | Weights for box encoder. | [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] | | | | FALSE |
| | string | The type of NMS. | | | | | FALSE |
| | collection | The configuration for assigner. | {'type': 'HungarianAssigner3D', 'iou_calculator': {'type': 'BboxOverlaps3D', 'coordinate': 'lidar'}, 'cls_cost': {'type': 'mmdet.FocalLossCost', 'gamma': 2.0, 'alpha': 0.25, 'weight': 0.15}, 'reg_cost': {'type': 'BBoxBEVL1Cost', 'weight': 0.25}, 'iou_cost': {'type': 'IoU3DCost', 'weight': 0.25}} | | | | FALSE |
| | collection | The configuration for common heads. | {'center': [2, 2], 'height': [1, 2], 'dim': [3, 2], 'rot': [6, 2]} | | | | FALSE |
| | collection | The configuration for classification loss. | {'type': 'mmdet.FocalLoss', 'use_sigmoid': True, 'gamma': 2.0, 'alpha': 0.25, 'reduction': 'mean', 'loss_weight': 1.0} | | | | FALSE |
| | collection | The configuration for heatmap loss. | {'type': 'mmdet.GaussianFocalLoss', 'reduction': 'mean', 'loss_weight': 1.0} | | | | FALSE |
| | collection | The configuration for bounding box loss. | {'type': 'mmdet.L1Loss', 'reduction': 'mean', 'loss_weight': 0.25} | | | | FALSE |
Train Config#
The train configuration defines the hyperparameters of the training process.
BASE_EXPERIMENT_ID=$(tao bevfusion list-base-experiments | jq -r '.[0].id')
SPECS=$(tao bevfusion get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')
train:
  precision: 'fp16'
  num_gpus: 1
  checkpoint_interval: 10
  validation_interval: 10
  num_epochs: 50
  optim:
    type: "AdamW"
    lr: 0.0001
    weight_decay: 0.05
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | int | The number of GPUs to run the train job. | 1 | 1 | | | FALSE |
| | list | List of GPU IDs to run the training on. The length of this list must be equal to the number of GPUs in train.num_gpus. | [0] | | | | FALSE |
| | int | Number of nodes to run the training on. If > 1, then multi-node is enabled. | 1 | | | | FALSE |
| | int | The seed for the initializer in PyTorch. If < 0, disable fixed seed. | 1234 | -1 | inf | | FALSE |
| | collection | | | | | | FALSE |
| | int | Number of epochs to run the training. | 10 | 1 | inf | | TRUE |
| | int | The interval (in epochs) at which a checkpoint will be saved. Helps resume training. | 1 | 1 | | | FALSE |
| | int | The interval (in epochs) at which an evaluation will be triggered on the validation dataset. | 1 | 1 | | | FALSE |
| | string | Path to the checkpoint to resume training from. | | | | | FALSE |
| | string | Path to where all the assets generated from a task are stored. | | | | | FALSE |
| | bool | Whether EpochBasedRunner is used. | True | | | | FALSE |
| | int | Logging interval, in iterations. | 1 | | | | FALSE |
| | bool | Whether to resume the training or not. | False | | | | FALSE |
| | string | Path to a pre-trained BEVFusion model to initialize the current training from. | | | | | FALSE |
| | collection | Hyperparameters to configure the optimizer. | | | | | FALSE |
| | list | Hyperparameters to configure the learning rate scheduler. | [{'type': 'LinearLR', 'start_factor': 0.33333333, 'by_epoch': False, 'begin': 0, 'end': 500}, {'type': 'CosineAnnealingLR', 'T_max': 10, 'eta_min_ratio': 0.0001, 'begin': 0, 'end': 10, 'by_epoch': True}, {'type': 'CosineAnnealingMomentum', 'eta_min': 0.8947, 'begin': 0, 'end': 2.4, 'by_epoch': True}, {'type': 'CosineAnnealingMomentum', 'eta_min': 1, 'begin': 2.4, 'end': 10, 'by_epoch': True}] | | | | FALSE |
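The default learning rate scheduler combines a linear warm-up over the first 500 iterations with cosine annealing over 10 epochs. The sketch below evaluates each curve in closed form, treating the two separately and assuming the textbook formulas. It approximates the shape of the schedule and is not the MMEngine implementation, which chains the schedulers and may differ at the boundaries. The function names are hypothetical.

```python
import math


def linear_warmup_factor(step, start_factor=0.33333333, begin=0, end=500):
    """Multiplier that rises linearly from start_factor to 1.0 over [begin, end]."""
    if step >= end:
        return 1.0
    progress = max(step - begin, 0) / (end - begin)
    return start_factor + (1.0 - start_factor) * progress


def cosine_annealed_lr(epoch, base_lr=0.0002, t_max=10, eta_min_ratio=1e-4):
    """Cosine decay from base_lr down to base_lr * eta_min_ratio over t_max epochs."""
    eta_min = base_lr * eta_min_ratio
    epoch = min(epoch, t_max)
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2
```

With the defaults, the warm-up starts at roughly one third of the base learning rate, and the cosine curve ends at 0.0001 times the base learning rate after 10 epochs.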
Optimizer config#
The optim parameter defines the config for the optimizer in training, including the learning rate, learning rate scheduler, and weight decay.
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | string | Type of optimizer used to train the network. | AdamW | | | | FALSE |
| | float | The initial learning rate for training the model. | 0.0002 | | | | FALSE |
| | float | The weight decay coefficient. | 0.01 | | | | FALSE |
| | list | The moving average parameter for adaptive learning rate. | [0.9, 0.999] | | | | FALSE |
| | collection | Clip the gradient norm of an iterable of parameters. | {'max_norm': 35, 'norm_type': 2} | | | | FALSE |
| | string | Optimizer wrapper in MMEngine. Use AmpOptimWrapper to enable mixed precision training. | OptimWrapper | | | | FALSE |
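The default gradient clipping setting, {'max_norm': 35, 'norm_type': 2}, rescales gradients whenever their combined L2 norm exceeds 35. Below is a plain-Python sketch of that rule, written to mirror the behaviour of PyTorch's `clip_grad_norm_` on a flat list of gradient values. The function name and the flat-list input are simplifications made for illustration.

```python
import math


def clip_by_global_norm(grads, max_norm=35.0, eps=1e-6):
    """Scale grads so their L2 norm is at most max_norm.

    Returns the (possibly scaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(max_norm / (total_norm + eps), 1.0)
    return [g * scale for g in grads], total_norm
```

Gradients whose norm is already below the limit pass through unchanged, so clipping only affects unusually large updates.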
Evaluation Config#
The evaluate parameter defines the hyperparameters of the evaluation process.
BASE_EXPERIMENT_ID=$(tao bevfusion list-base-experiments | jq -r '.[0].id')
SPECS=$(tao bevfusion get-job-schema --action evaluate --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')
evaluate:
  checkpoint: /path/to/model.pth
  num_gpus: 1
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | int | | 1 | | | | FALSE |
| | list | | [0] | | | | FALSE |
| | int | | 1 | | | | FALSE |
| | string | | ??? | | | | FALSE |
| | string | | | | | | FALSE |
Inference Config#
The inference parameter defines the hyperparameters of the inference process.
BASE_EXPERIMENT_ID=$(tao bevfusion list-base-experiments | jq -r '.[0].id')
SPECS=$(tao bevfusion get-job-schema --action inference --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')
inference:
  checkpoint: /path/to/model.pth
  num_gpus: 1
| Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled |
| | int | | 1 | | | | FALSE |
| | list | | [0] | | | | FALSE |
| | int | | 1 | | | | FALSE |
| | string | | ??? | | | | FALSE |
| | string | | | | | | FALSE |
| | float | Confidence threshold | 0.5 | | | | FALSE |
| | bool | Whether to show the 3D visualization on screen | False | | | | FALSE |
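The confidence threshold controls which predictions are kept in the inference output. Here is a sketch of that filtering step, assuming each detection carries a score between 0 and 1 and that detections scoring at or above the threshold are kept. The dictionary layout and the inclusive comparison are assumptions for illustration, not the TAO output format.

```python
def filter_detections(detections, conf_threshold=0.5):
    """Keep detections whose score is at least conf_threshold."""
    return [det for det in detections if det["score"] >= conf_threshold]


# Hypothetical detections, for illustration only.
detections = [
    {"label": "person", "score": 0.91},
    {"label": "person", "score": 0.42},
    {"label": "person", "score": 0.12},
]
kept = filter_detections(detections, conf_threshold=0.3)
print(len(kept))
```

Lowering the threshold keeps more low-confidence boxes, which raises recall at the cost of more false positives.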
Training the Model#
To train a BEVFusion model, use this command:
TRAIN_JOB_ID=$(tao bevfusion create-job \
--kind experiment \
--name "bevfusion_train" \
--action train \
--workspace-id $WORKSPACE_ID \
--specs "$TRAIN_SPECS" \
--train-datasets '["'$DATASET_ID'"]' \
--eval-dataset "$DATASET_ID" \
--base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
--encryption-key "nvidia_tlt" | jq -r '.id')
tao model bevfusion train [-h] -e <experiment_spec> [-r <results_dir>]
Required Arguments
The following arguments are required to run the command.
-e, --experiment_spec: The experiment specification file to set up the training experiment
Optional Arguments
The following arguments are optional to run the command.
-r, --results_dir: The path to the folder where the experiment outputs should be written. If this argument is not specified, the results_dir from the spec file is used.
--gpus: The number of GPUs used to run training
--num_nodes: The number of nodes used to run training. If this value is larger than 1, distributed multi-node training is enabled.
-h, --help: Show this help message and exit.
Sample Usage
Here’s an example of the train command:
tao model bevfusion train -e /path/to/spec.yaml
Evaluating the Model#
To run evaluation with a BEVFusion model, use this command:
EVAL_JOB_ID=$(tao bevfusion create-job \
--kind experiment \
--name "bevfusion_evaluate" \
--action evaluate \
--workspace-id $WORKSPACE_ID \
--parent-job-id $TRAIN_JOB_ID \
--eval-dataset "$DATASET_ID" \
--specs "$EVALUATE_SPECS" \
--base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
--encryption-key "nvidia_tlt" | jq -r '.id')
tao model bevfusion evaluate [-h] -e <experiment_spec> [-r <results_dir>]
Required Arguments
The following arguments are required.
-e, --experiment_spec: The experiment spec file to set up the evaluation experiment
Optional Arguments
The following arguments are optional to run the command.
-r, --results_dir: The directory where the evaluation result is stored
Sample Usage
Here’s an example of using the evaluate command:
tao model bevfusion evaluate -e /path/to/spec.yaml -r /path/to/results/ evaluate.checkpoint=/path/to/model.pth
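When evaluation is scripted, the command line above can be assembled programmatically. The sketch below only builds and prints the command, using the flags and the evaluate.checkpoint override shown in the sample usage. `build_evaluate_command` is a hypothetical helper, and launching the command (for example with `subprocess.run`) is left to the caller.

```python
import shlex


def build_evaluate_command(spec_path, results_dir=None, checkpoint=None):
    """Assemble the `tao model bevfusion evaluate` argument list."""
    cmd = ["tao", "model", "bevfusion", "evaluate", "-e", spec_path]
    if results_dir is not None:
        cmd += ["-r", results_dir]
    if checkpoint is not None:
        # Spec values are overridden with key=value pairs, as in the sample usage.
        cmd.append(f"evaluate.checkpoint={checkpoint}")
    return cmd


command = build_evaluate_command(
    "/path/to/spec.yaml", "/path/to/results/", "/path/to/model.pth"
)
print(shlex.join(command))
```

Keeping the command as a list avoids quoting problems when a path contains spaces.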
Running Inference with BEVFusion Model#
Use the following command to run inference with a BEVFusion .pth checkpoint:
INFER_JOB_ID=$(tao bevfusion create-job \
--kind experiment \
--name "bevfusion_inference" \
--action inference \
--workspace-id $WORKSPACE_ID \
--parent-job-id $TRAIN_JOB_ID \
--inference-dataset "$DATASET_ID" \
--specs "$INFERENCE_SPECS" \
--base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
--encryption-key "nvidia_tlt" | jq -r '.id')
tao model bevfusion inference [-h] -e <experiment_spec> [-r <results_dir>]
Required Arguments
The following arguments are required to run the command.
-e, --experiment_spec: The experiment spec file to set up the inference experiment
Optional Arguments
The following arguments are optional to run the command.
-r, --results_dir: The directory where the inference result is stored
Sample Usage
Here’s an example of using the inference command:
tao model bevfusion inference -e /path/to/spec.yaml -r /path/to/results/ inference.checkpoint=/path/to/model.pth