BEVFusion#
BEVFusion is a 3D object-detection model that is included in TAO. It supports the following tasks:
convert
train
evaluate
inference
A sample inference result:

[Figure: TAO BEVFusion inference image]
These tasks may be invoked from the TAO Launcher using the following convention on the command line:
tao model bevfusion <sub_task> <args_per_subtask>
where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in the following sections.
Dataset Format#
The dataset for BEVFusion contains point cloud data, RGB images, and the corresponding annotations of 3D objects. The directory should be organized in the KITTI directory structure:
/kitti
    /training
        /calib
            000000.txt
            000001.txt
            ...
            N.txt
        /image_2
            000000.png
            000001.png
            ...
            N.png
        /label_2
            000000.txt
            000001.txt
            ...
            N.txt
        /velodyne
            000000.bin
            000001.bin
            ...
            N.bin
    /ImageSets
        train.txt
        val.txt
        test.txt
Each .bin file should comply with the KITTI Velodyne point-cloud format, and each .txt label file should comply with the KITTI label format.
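For reference, each line of a KITTI label file describes one object with 15 space-separated fields: type, truncation, occlusion, alpha, the 2D bounding box (left, top, right, bottom), 3D dimensions (height, width, length), 3D location (x, y, z), and rotation_y. An illustrative line (the values are made up):
person 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01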
Creating a Configuration File#
Below is a sample BEVFusion spec file. It has five components (model, inference, evaluate, dataset, and train) as well as several global parameters, which are described below. The spec file uses the YAML format.
Here’s a sample of the BEVFusion spec file:
results_dir: /results/bevfusion
dataset:
  type: KittiPersonDataset
  root_dir: /data/
  gt_box_type: camera
  default_cam_key: CAM2
  train_dataset:
    repeat_time: 2
    ann_file: /data/kitti_person_infos_train.pkl
    data_prefix:
      pts: training/velodyne_reduced
      img: training/image_2
    batch_size: 4
    num_workers: 8
  val_dataset:
    ann_file: /data/kitti_person_infos_val.pkl
    data_prefix:
      pts: training/velodyne_reduced
      img: training/image_2
    batch_size: 2
    num_workers: 4
  test_dataset:
    ann_file: /data/kitti_person_infos_val.pkl
    data_prefix:
      pts: training/velodyne_reduced
      img: training/image_2
    batch_size: 4
    num_workers: 4
model:
  type: BEVFusion
  point_cloud_range: [0, -40, -3, 70.4, 40, 1]
  voxel_size: [0.05, 0.05, 0.1]
  grid_size: [1440, 1440, 41]
train:
  num_gpus: 1
  num_nodes: 1
  validation_interval: 1
  num_epochs: 5
  optimizer:
    type: AdamW
    lr: 0.0002
  lr_scheduler:
    - type: LinearLR
      start_factor: 0.33333333
      by_epoch: False
      begin: 0
      end: 500
    - type: CosineAnnealingLR
      T_max: 10
      begin: 0
      end: 10
      by_epoch: True
      eta_min_ratio: 1e-4
    - type: CosineAnnealingMomentum
      eta_min: 0.8947
      begin: 0
      end: 2.4
      by_epoch: True
    - type: CosineAnnealingMomentum
      eta_min: 1
      begin: 2.4
      end: 10
      by_epoch: True
inference:
  num_gpus: 1
  conf_threshold: 0.3
  checkpoint: /results/train/bevfusion_model.pth
evaluate:
  num_gpus: 1
  checkpoint: /results/train/bevfusion_model.pth
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
results_dir | string | Path to where all the assets generated from a task are stored. | /results | | | | FALSE
default_scope | string | Default scope to use mmdet3d. | mmdet3d | | | | FALSE
default_hooks | collection | Default hooks for mmlabs. | {'timer': {'type': 'IterTimerHook'}, 'logger': {'type': 'LoggerHook', 'interval': 1, 'log_metric_by_epoch': True}, 'param_scheduler': {'type': 'ParamSchedulerHook'}, 'checkpoint': {'type': 'CheckpointHook', 'by_epoch': True, 'interval': 1}, 'sampler_seed': {'type': 'DistSamplerSeedHook'}, 'visualization': {'type': 'Det3DVisualizationHook'}} | | | | FALSE
 | string | Default logger hook type. | TAOBEVFusionLoggerHook | | | | FALSE
manual_seed | int | Optional manual seed. The seed is set when a value is given in the spec file. | | | | | FALSE
input_modality | collection | Input modality for the model. Set True for each modality to use. | {'use_lidar': True, 'use_camera': True, 'use_radar': False, 'use_map': False, 'use_external': False} | | | | FALSE
model | collection | Configurable parameters to construct the model for a BEVFusion experiment. | | | | | FALSE
dataset | collection | Configurable parameters to construct the dataset for a BEVFusion experiment. | | | | | FALSE
train | collection | Configurable parameters to construct the trainer for a BEVFusion experiment. | | | | | FALSE
evaluate | collection | Configurable parameters to construct the evaluator for a BEVFusion experiment. | | | | | FALSE
inference | collection | Configurable parameters to construct the inferencer for a BEVFusion experiment. | | | | | FALSE
Data Preprocessor Config#
The data preprocessor configuration (data_preprocessor) defines the pre-processing hyperparameters applied to the model inputs.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | Name of the data preprocessor for 3D fusion. | Det3DDataPreprocessor | | | | FALSE
mean | list | The input mean for RGB frames. | [123.675, 116.28, 103.53] | | | | FALSE
std | list | The input standard deviation per pixel for RGB frames. | [58.395, 57.12, 57.375] | | | | FALSE
bgr_to_rgb | bool | Whether to convert images from BGR to RGB. | | | | | FALSE
pad_size_divisor | int | The number the padded image size must be divisible by. | 32 | | | | FALSE
voxelize_cfg | collection | Voxelization settings for the lidar input. | {'max_num_points': 10, 'max_voxels': [120000, 160000], 'voxelize_reduce': True} | | | | FALSE
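Putting the defaults above together, a data_preprocessor block might look like the following sketch. Field names such as voxelize_cfg follow mmdet3d conventions and are assumptions here; check them against your TAO version:
model:
  data_preprocessor:
    type: Det3DDataPreprocessor
    mean: [123.675, 116.28, 103.53]    # per-channel RGB mean
    std: [58.395, 57.12, 57.375]       # per-channel RGB standard deviation
    pad_size_divisor: 32               # pad image sides to a multiple of 32
    voxelize_cfg:                      # assumed field name for the voxelization settings
      max_num_points: 10               # points kept per voxel
      max_voxels: [120000, 160000]
      voxelize_reduce: True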
Dataset Config#
The dataset configuration (dataset) defines the dataset directories, annotation files, and batch sizes for the train, val, and test splits.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | Dataset type for 3D fusion. | KittiPersonDataset | | | TAO3DSyntheticDataset,TAO3DDataset,KittiPersonDataset | FALSE
root_dir | string | Path to the root directory of the dataset. | /data/ | | | | FALSE
classes | list | A list of the classes to be trained. | ['person'] | | | | FALSE
 | string | The 3D bounding-box type to be used for training. | lidar | | | lidar,camera | FALSE
gt_box_type | string | The 3D bounding-box type of the ground truth. | camera | | | lidar,camera | FALSE
origin | list | The origin of the center point in the ground-truth 3D bounding boxes. | [0.5, 1.0, 0.5] | | | | FALSE
default_cam_key | string | Default camera name in the dataset. | CAM0 | | | | FALSE
 | bool | Whether to save results in per-sequence format. | False | | | | FALSE
 | int | Number of camera views in the dataset. | 1 | | | | FALSE
 | int | Input lidar point-cloud data dimension. | 4 | | | | FALSE
train_dataset | collection | Configurable parameters to construct the train dataset. | | | | | FALSE
val_dataset | collection | Configurable parameters to construct the validation dataset. | | | | | FALSE
test_dataset | collection | Configurable parameters to construct the test dataset. | | | | | FALSE
 | string | Image file for single-file inference. | | | | | FALSE
 | string | Point-cloud file for single-file inference. | | | | | FALSE
 | list | Camera intrinsic matrix for single-file inference. | | | | | FALSE
 | list | Lidar-to-camera extrinsic matrix for single-file inference. | | | | | FALSE
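As a sketch, the extra dataset fields above slot in alongside the sample spec from earlier. The field name box_type_3d for the training box type is an assumption borrowed from mmdet3d; the remaining names appear in the sample spec:
dataset:
  type: KittiPersonDataset
  root_dir: /data/
  classes: ['person']
  box_type_3d: lidar        # assumed field name for the training box type
  gt_box_type: camera       # box type of the ground-truth labels
  default_cam_key: CAM2
  train_dataset:
    ann_file: /data/kitti_person_infos_train.pkl
    batch_size: 4
    num_workers: 8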
Model Config#
The model configuration (model) defines the BEVFusion model structure. This model is used for training, evaluation, and inference. A detailed description is included in the table below.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | Model name. | BEVFusion | | | BEVFusion | FALSE
point_cloud_range | list | Point cloud range. | [0, -40, -3, 70.4, 40, 1] | | | | FALSE
voxel_size | list | Voxel size for voxelization. | [0.05, 0.05, 0.1] | | | | FALSE
 | list | Post-processing center filter range. | [-61.2, -61.2, -20.0, 61.2, 61.2, 20.0] | | | | FALSE
grid_size | list | Grid size for the BEVFusion model. | [1440, 1440, 41] | | | | FALSE
data_preprocessor | collection | Configurable parameters to construct the preprocessor for the BEVFusion model. | | | | | FALSE
img_backbone | collection | Configurable parameters to construct the camera image backbone for the BEVFusion model. | | | | | FALSE
img_neck | collection | Configurable parameters to construct the camera image neck for the BEVFusion model. | | | | | FALSE
view_transform | collection | Configurable parameters to construct the camera view transform for the BEVFusion model. | | | | | FALSE
lidar_backbone | collection | Configurable parameters to construct the lidar point-cloud backbone for the BEVFusion model. | | | | | FALSE
 | collection | Configurable parameters to construct the lidar point-cloud voxel encoder for the BEVFusion model. | {'type': 'HardSimpleVFE', 'num_features': 4} | | | | FALSE
pts_middle_encoder | collection | Configurable parameters to construct the lidar encoder for the BEVFusion model. | | | | | FALSE
pts_neck | collection | Configurable parameters to construct the lidar neck for the BEVFusion model. | | | | | FALSE
fusion_layer | collection | Configurable parameters to construct the fusion layer for the BEVFusion model. | | | | | FALSE
bbox_head | collection | Configurable parameters to construct the bounding-box head for the BEVFusion model. | | | | | FALSE
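For orientation, the global model fields from the table combine as in this sketch (the comments are explanatory, not part of the schema):
model:
  type: BEVFusion
  point_cloud_range: [0, -40, -3, 70.4, 40, 1]  # [x_min, y_min, z_min, x_max, y_max, z_max] in meters
  voxel_size: [0.05, 0.05, 0.1]                 # voxel extent along x, y, z
  grid_size: [1440, 1440, 41]                   # BEV grid resolution
  # Sub-collections, each described in its own section below:
  # data_preprocessor, img_backbone, img_neck, view_transform,
  # lidar_backbone, pts_middle_encoder, pts_neck, fusion_layer, bbox_head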
Image Backbone Config#
The backbone configuration (img_backbone) defines the image backbone structure. A detailed description is included in the table below. Currently, BEVFusion only supports the Swin Transformer and ResNet-50 image backbones.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | Name of the image backbone for 3D fusion. | mmdet.SwinTransformer | | | | FALSE
embed_dims | int | Number of input channels. | 96 | | | | FALSE
depths | list | Depths of each Swin Transformer stage. | [2, 2, 6, 2] | | | | FALSE
num_heads | list | Number of attention heads in each stage. | [3, 6, 12, 24] | | | | FALSE
window_size | int | Window size for the Swin Transformer. | 7 | | | | FALSE
mlp_ratio | int | Ratio of MLP hidden dim to embedding dim. | 4 | | | | FALSE
qkv_bias | bool | If True, adds a learnable bias to query, key, and value. | True | | | | FALSE
qk_scale | string | Overrides the default qk scale of head_dim ** -0.5 if set. | | | | | FALSE
drop_rate | float | Dropout rate. | 0.0 | | | | FALSE
attn_drop_rate | float | Attention dropout rate. | 0.0 | | | | FALSE
drop_path_rate | float | Stochastic depth drop rate. | 0.2 | | | | FALSE
patch_norm | bool | If True, adds normalization after patch embedding. | True | | | | FALSE
out_indices | list | Stages from which to output features. | [1, 2, 3] | | | | FALSE
with_cp | bool | Whether to use activation checkpointing, which saves some memory while slowing down training. | False | | | | FALSE
convert_weights | bool | Flag indicating whether the pre-trained model is from the original repository. | True | | | | FALSE
init_cfg | collection | Configuration for initialization. | | | | | FALSE
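A hedged img_backbone sketch using the defaults above. The argument names mirror mmdet's SwinTransformer; embed_dims and with_cp in particular are assumptions not confirmed by this page:
model:
  img_backbone:
    type: mmdet.SwinTransformer
    embed_dims: 96              # assumed name for the 96-channel embedding width
    depths: [2, 2, 6, 2]        # transformer blocks per stage
    num_heads: [3, 6, 12, 24]   # attention heads per stage
    window_size: 7
    mlp_ratio: 4
    qkv_bias: True
    drop_path_rate: 0.2         # stochastic depth
    patch_norm: True
    out_indices: [1, 2, 3]      # stages fed to the image neck
    with_cp: False              # assumed name; activation checkpointing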
Image Neck Config#
The neck configuration (img_neck) defines the image neck structure. A detailed description is included in the table below. Currently, BEVFusion only supports the GeneralizedLSSFPN image neck.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | Image neck name. | GeneralizedLSSFPN | | | | FALSE
in_channels | list | The number of input channels for the image neck. | [192, 384, 768] | | | | FALSE
out_channels | int | The number of output channels for the image neck. | 256 | | | | FALSE
start_level | int | Starting level for the image neck. | 0 | | | | FALSE
num_outs | int | The number of outputs for the image neck. | 0 | | | | FALSE
norm_cfg | collection | The normalization configuration for the image neck. | {'type': 'BN2d', 'requires_grad': True} | | | | FALSE
act_cfg | collection | The activation configuration for the image neck. | {'type': 'ReLU', 'inplace': True} | | | | FALSE
upsample_cfg | collection | The upsampling configuration for the image neck. | {'mode': 'bilinear', 'align_corners': False} | | | | FALSE
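An img_neck sketch assembled from the defaults above (the norm_cfg/act_cfg/upsample_cfg names follow mmdet conventions and are assumptions here):
model:
  img_neck:
    type: GeneralizedLSSFPN
    in_channels: [192, 384, 768]   # channel widths of the selected Swin stages
    out_channels: 256
    start_level: 0
    norm_cfg: {type: BN2d, requires_grad: True}
    act_cfg: {type: ReLU, inplace: True}
    upsample_cfg: {mode: bilinear, align_corners: False}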
View Transform Config#
The configuration (view_transform) defines the view transform structure for the camera input. A detailed description is included in the table below. Currently, BEVFusion only supports the DepthLSSTransform and LSSTransform view transforms.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | Image view transform name. | DepthLSSTransform | | | DepthLSSTransform,LSSTransform | FALSE
in_channels | int | The number of input channels for the view transform. | 256 | | | | FALSE
out_channels | int | The number of output channels for the view transform. | 80 | | | | FALSE
image_size | list | Image size for the view transform. | [256, 704] | | | | FALSE
feature_size | list | Feature size for the view transform. | [32, 88] | | | | FALSE
xbound | list | The grid range for the x-axis. | [-54.0, 54.0, 0.3] | | | | FALSE
ybound | list | The grid range for the y-axis. | [-54.0, 54.0, 0.3] | | | | FALSE
zbound | list | The grid range for the z-axis. | [-10.0, 10.0, 20.0] | | | | FALSE
dbound | list | The grid range for depth. | [1.0, 60.0, 0.5] | | | | FALSE
downsample | int | The downsampling ratio. | 2 | | | | FALSE
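A view_transform sketch with the defaults above. The xbound/ybound/zbound/dbound names come from the reference BEVFusion implementation and are assumptions here; each bound is [min, max, step]:
model:
  view_transform:
    type: DepthLSSTransform        # or LSSTransform
    in_channels: 256               # matches img_neck out_channels
    out_channels: 80
    image_size: [256, 704]
    feature_size: [32, 88]
    xbound: [-54.0, 54.0, 0.3]     # assumed names; [min, max, step]
    ybound: [-54.0, 54.0, 0.3]
    zbound: [-10.0, 10.0, 20.0]
    dbound: [1.0, 60.0, 0.5]       # depth bins in meters
    downsample: 2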
Lidar Backbone Config#
The backbone configuration (lidar_backbone) defines the lidar backbone structure. A detailed description is included in the table below. Currently, BEVFusion only supports the SECOND lidar backbone.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The lidar backbone name. | SECOND | | | | FALSE
in_channels | int | The number of input channels for the lidar backbone. | 256 | | | | FALSE
out_channels | list | The number of output channels for the lidar backbone. | [128, 256] | | | | FALSE
layer_nums | list | The number of layers in each stage of the lidar backbone. | [5, 5] | | | | FALSE
layer_strides | list | The stride of each stage of the lidar backbone. | [1, 2] | | | | FALSE
norm_cfg | collection | The normalization configuration for the lidar backbone. | {'type': 'BN', 'eps': 0.001, 'momentum': 0.01} | | | | FALSE
conv_cfg | collection | The convolution layer configuration for the lidar backbone. | {'type': 'Conv2d', 'bias': False} | | | | FALSE
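A lidar_backbone sketch from the defaults above (layer_nums and layer_strides follow mmdet3d's SECOND backbone and are assumptions here):
model:
  lidar_backbone:
    type: SECOND
    in_channels: 256
    out_channels: [128, 256]   # feature width per stage
    layer_nums: [5, 5]         # assumed name; conv layers per stage
    layer_strides: [1, 2]      # assumed name; stride per stage
    norm_cfg: {type: BN, eps: 0.001, momentum: 0.01}
    conv_cfg: {type: Conv2d, bias: False}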
Lidar Encoder Config#
The encoder configuration (pts_middle_encoder) defines the lidar encoder structure. A detailed description is included in the table below. Currently, BEVFusion only supports the BEVFusionSparseEncoder structure.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The lidar encoder name. | BEVFusionSparseEncoder | | | | FALSE
in_channels | int | The number of input channels for the lidar encoder. | 4 | | | | FALSE
sparse_shape | list | The sparse shape of the input tensor. | [1440, 1440, 41] | | | | FALSE
order | list | Order of the conv module. | ['conv', 'norm', 'act'] | | | | FALSE
norm_cfg | collection | The normalization configuration for the lidar encoder. | {'type': 'BN1d', 'eps': 0.001, 'momentum': 0.01} | | | | FALSE
block_type | string | Type of the block to use. | basicblock | | | | FALSE
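A pts_middle_encoder sketch from the defaults above (field names follow the mmdet3d sparse encoder and are assumptions here):
model:
  pts_middle_encoder:
    type: BEVFusionSparseEncoder
    in_channels: 4                    # x, y, z, intensity
    sparse_shape: [1440, 1440, 41]    # matches model.grid_size
    order: ['conv', 'norm', 'act']
    norm_cfg: {type: BN1d, eps: 0.001, momentum: 0.01}
    block_type: basicblock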
Lidar Neck Config#
The configuration (pts_neck) defines the lidar neck structure. A detailed description is included in the table below. Currently, BEVFusion only supports the SECONDFPN structure.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The lidar neck name. | SECONDFPN | | | | FALSE
in_channels | list | The number of input channels for the lidar neck. | [128, 256] | | | | FALSE
out_channels | list | The number of output channels for the lidar neck. | [256, 256] | | | | FALSE
upsample_strides | list | Strides used to upsample the feature map for the lidar neck. | [1, 2] | | | | FALSE
norm_cfg | collection | The normalization configuration for the lidar neck. | {'type': 'BN', 'eps': 0.001, 'momentum': 0.01} | | | | FALSE
upsample_cfg | collection | The upsample layer configuration for the lidar neck. | {'type': 'deconv', 'bias': False} | | | | FALSE
use_conv_for_no_stride | bool | Whether to use conv when the stride is 1. | True | | | | FALSE
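A pts_neck sketch from the defaults above (field names follow mmdet3d's SECONDFPN and are assumptions here):
model:
  pts_neck:
    type: SECONDFPN
    in_channels: [128, 256]       # outputs of the SECOND backbone stages
    out_channels: [256, 256]
    upsample_strides: [1, 2]      # bring both stages to a common resolution
    norm_cfg: {type: BN, eps: 0.001, momentum: 0.01}
    upsample_cfg: {type: deconv, bias: False}
    use_conv_for_no_stride: True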
Fusion Layer Config#
The configuration (fusion_layer) defines the fusion layer structure. A detailed description is included in the table below. Currently, BEVFusion only supports the ConvFuser structure.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The fusion layer name. | ConvFuser | | | | FALSE
in_channels | list | The number of input channels for the fusion layer. | [80, 256] | | | | FALSE
out_channels | int | The number of output channels for the fusion layer. | 256 | | | | FALSE
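The fusion layer is small enough to show in full; a sketch using the defaults above:
model:
  fusion_layer:
    type: ConvFuser
    in_channels: [80, 256]   # camera BEV channels (view_transform output) + lidar BEV channels
    out_channels: 256        # width of the fused BEV feature map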
BBoxHead Config#
The configuration (bbox_head) defines the bbox prediction head structure. A detailed description is included in the table below. Currently, BEVFusion only supports the BEVFusionHead structure.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | Prediction head name. | BEVFusionHead | | | BEVFusionHead | FALSE
num_proposals | int | Number of proposals. | 200 | | | | FALSE
auxiliary | bool | Whether to enable auxiliary training. | True | | | | FALSE
in_channels | int | Number of channels in the input feature map. | 512 | | | | FALSE
hidden_channel | int | Number of hidden channels. | 128 | | | | FALSE
num_classes | int | Number of classes. | 1 | | | | FALSE
nms_kernel_size | int | NMS kernel size. | 3 | | | | FALSE
bn_momentum | float | Batch norm momentum. | 0.1 | | | | FALSE
num_decoder_layers | int | Number of decoder layers. | 1 | | | | FALSE
out_size_factor | int | Output size factor. | 8 | | | | FALSE
bbox_coder | collection | The configuration for the bounding-box encoder. | | | | | FALSE
decoder_layer | collection | The configuration for the decoder layer. | | | | | FALSE
 | list | Weights for the box encoder. | [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] | | | | FALSE
nms_type | string | The type of NMS. | | | | | FALSE
assigner | collection | The configuration for the assigner. | {'type': 'HungarianAssigner3D', 'iou_calculator': {'type': 'BboxOverlaps3D', 'coordinate': 'lidar'}, 'cls_cost': {'type': 'mmdet.FocalLossCost', 'gamma': 2.0, 'alpha': 0.25, 'weight': 0.15}, 'reg_cost': {'type': 'BBoxBEVL1Cost', 'weight': 0.25}, 'iou_cost': {'type': 'IoU3DCost', 'weight': 0.25}} | | | | FALSE
common_heads | collection | The configuration for the common heads. | {'center': [2, 2], 'height': [1, 2], 'dim': [3, 2], 'rot': [6, 2]} | | | | FALSE
loss_cls | collection | The configuration for the classification loss. | {'type': 'mmdet.FocalLoss', 'use_sigmoid': True, 'gamma': 2.0, 'alpha': 0.25, 'reduction': 'mean', 'loss_weight': 1.0} | | | | FALSE
loss_heatmap | collection | The configuration for the heatmap loss. | {'type': 'mmdet.GaussianFocalLoss', 'reduction': 'mean', 'loss_weight': 1.0} | | | | FALSE
loss_bbox | collection | The configuration for the bounding-box loss. | {'type': 'mmdet.L1Loss', 'reduction': 'mean', 'loss_weight': 0.25} | | | | FALSE
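A partial bbox_head sketch from the defaults above; names such as auxiliary, hidden_channel, and num_decoder_layers follow the mmdet3d BEVFusion head and are assumptions here:
model:
  bbox_head:
    type: BEVFusionHead
    num_proposals: 200          # queries decoded into boxes
    auxiliary: True             # assumed name for the auxiliary-training flag
    in_channels: 512
    hidden_channel: 128         # assumed name; width of the head's hidden layers
    num_classes: 1              # 'person' only, per the dataset config
    nms_kernel_size: 3
    num_decoder_layers: 1
    out_size_factor: 8
    common_heads: {center: [2, 2], height: [1, 2], dim: [3, 2], rot: [6, 2]}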
Train Config#
The train
configuration defines the hyperparameters of the training process.
train:
  precision: 'fp16'
  num_gpus: 1
  checkpoint_interval: 10
  validation_interval: 10
  num_epochs: 50
  optim:
    type: "AdamW"
    lr: 0.0001
    weight_decay: 0.05
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
num_gpus | int | The number of GPUs to run the train job. | 1 | 1 | | | FALSE
gpu_ids | list | List of GPU IDs to run the training on. The length of this list must equal the number of GPUs in train.num_gpus. | [0] | | | | FALSE
num_nodes | int | Number of nodes to run the training on. If > 1, multi-node training is enabled. | 1 | | | | FALSE
seed | int | The seed for the initializer in PyTorch. If < 0, the fixed seed is disabled. | 1234 | -1 | inf | | FALSE
num_epochs | int | Number of epochs to run the training. | 10 | 1 | inf | | TRUE
checkpoint_interval | int | The interval (in epochs) at which a checkpoint is saved. Helps resume training. | 1 | 1 | | | FALSE
validation_interval | int | The interval (in epochs) at which an evaluation is triggered on the validation dataset. | 1 | 1 | | | FALSE
 | string | Path to the checkpoint to resume training from. | | | | | FALSE
results_dir | string | Path to where all the assets generated from a task are stored. | | | | | FALSE
 | bool | Whether EpochBasedRunner is used. | True | | | | FALSE
 | int | Logging interval, every k iterations. | 1 | | | | FALSE
 | bool | Whether to resume the training. | False | | | | FALSE
 | string | Path to a pre-trained BEVFusion model to initialize the current training from. | | | | | FALSE
optimizer | collection | Hyperparameters to configure the optimizer. | | | | | FALSE
lr_scheduler | list | Hyperparameters to configure the learning-rate scheduler. | [{'type': 'LinearLR', 'start_factor': 0.33333333, 'by_epoch': False, 'begin': 0, 'end': 500}, {'type': 'CosineAnnealingLR', 'T_max': 10, 'eta_min_ratio': 0.0001, 'begin': 0, 'end': 10, 'by_epoch': True}, {'type': 'CosineAnnealingMomentum', 'eta_min': 0.8947, 'begin': 0, 'end': 2.4, 'by_epoch': True}, {'type': 'CosineAnnealingMomentum', 'eta_min': 1, 'begin': 2.4, 'end': 10, 'by_epoch': True}] | | | | FALSE
Optimizer config#
The optim parameter defines the config for the optimizer in training, including the learning rate, learning-rate scheduler, and weight decay.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | Type of optimizer used to train the network. | AdamW | | | | FALSE
lr | float | The initial learning rate for training the model. | 0.0002 | | | | FALSE
weight_decay | float | The weight-decay coefficient. | 0.01 | | | | FALSE
betas | list | The moving-average parameters for the adaptive learning rate. | [0.9, 0.999] | | | | FALSE
clip_grad | collection | Clips the gradient norm of an iterable of parameters. | {'max_norm': 35, 'norm_type': 2} | | | | FALSE
 | string | Optimizer wrapper in MMEngine. AmpOptimWrapper enables mixed-precision training. | OptimWrapper | | | | FALSE
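A sketch of the optimizer block under train, using the defaults above (clip_grad is an assumed field name mirroring MMEngine's gradient-clipping option):
train:
  optim:
    type: AdamW
    lr: 0.0002
    weight_decay: 0.01
    betas: [0.9, 0.999]
    clip_grad: {max_norm: 35, norm_type: 2}   # assumed name; caps the gradient norm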
Evaluation Config#
The evaluate
parameter defines the hyperparameters of the evaluation process.
evaluate:
  checkpoint: /path/to/model.pth
  num_gpus: 1
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
num_gpus | int | The number of GPUs to run the evaluation job. | 1 | | | | FALSE
gpu_ids | list | List of GPU IDs to run the evaluation on. | [0] | | | | FALSE
num_nodes | int | Number of nodes to run the evaluation on. | 1 | | | | FALSE
checkpoint | string | Path to the checkpoint to evaluate. | ??? | | | | FALSE
results_dir | string | Path to where the evaluation assets are stored. | | | | | FALSE
Inference Config#
The inference
parameter defines the hyperparameters of the inference process.
inference:
  checkpoint: /path/to/model.pth
  num_gpus: 1
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
num_gpus | int | The number of GPUs to run the inference job. | 1 | | | | FALSE
gpu_ids | list | List of GPU IDs to run the inference on. | [0] | | | | FALSE
num_nodes | int | Number of nodes to run the inference on. | 1 | | | | FALSE
checkpoint | string | Path to the checkpoint used for inference. | ??? | | | | FALSE
results_dir | string | Path to where the inference assets are stored. | | | | | FALSE
conf_threshold | float | Confidence threshold for detections. | 0.5 | | | | FALSE
 | bool | Whether to show the 3D visualization on screen. | False | | | | FALSE
Training the Model#
To train a BEVFusion model, use this command:
tao model bevfusion train [-h] -e <experiment_spec>
[-r <results_dir>]
Required Arguments#
-e, --experiment_spec
: The experiment specification file to set up the training experiment
Optional Arguments#
-r, --results_dir: The path to the folder where the experiment outputs should be written. If this argument is not specified, the results_dir from the spec file is used.
--gpus: The number of GPUs used to run training.
--num_nodes: The number of nodes used to run training. If this value is larger than 1, distributed multi-node training is enabled.
-h, --help: Show this help message and exit.
Sample Usage#
Here’s an example of the train
command:
tao model bevfusion train -e /path/to/spec.yaml
Evaluating the Model#
To run evaluation with a BEVFusion model, use this command:
tao model bevfusion evaluate [-h] -e <experiment_spec>
[-r <results_dir>]
Required Arguments#
-e, --experiment_spec
: The experiment spec file to set up the evaluation experiment
Optional Arguments#
-r, --results_dir
: The directory where the evaluation result is stored
Sample Usage#
Here’s an example of using the evaluate
command:
tao model bevfusion evaluate -e /path/to/spec.yaml -r /path/to/results/ evaluate.checkpoint=/path/to/model.pth
Running Inference with BEVFusion Model#
Use the following command to run inference on a BEVFusion .pth model:
tao model bevfusion inference [-h] -e <experiment spec file>
[-r <results_dir>]
Required Arguments#
-e, --experiment_spec
: The experiment spec file to set up the inference experiment
Optional Arguments#
-r, --results_dir
: The directory where the inference result is stored
Sample Usage#
Here’s an example of using the inference
command:
tao model bevfusion inference -e /path/to/spec.yaml -r /path/to/results/ inference.checkpoint=/path/to/model.pth