BEVFusion
BEVFusion is a 3D object-detection model that is included in TAO. It supports the following tasks:
convert
train
evaluate
inference
A sample inference result:
(Figure: TAO BEVFusion inference image)
These tasks may be invoked from the TAO Launcher using the following convention on the command line:
tao model bevfusion <sub_task> <args_per_subtask>
where args_per_subtask are the command-line arguments required for a given subtask. Each of these subtasks is explained below.
The dataset for BEVFusion contains point cloud data, RGB images, and the corresponding annotations of 3D objects. The directory structure should be organized according to the KITTI directory structure:
/kitti
/training
/calib
000000.txt
000001.txt
...
N.txt
/image_2
000000.png
000001.png
...
N.png
/label_2
000000.txt
000001.txt
...
N.txt
/velodyne
000000.bin
000001.bin
...
N.bin
/ImageSets
train.txt
val.txt
test.txt
Each .bin file should comply with the KITTI point cloud format, and each .txt label file should comply with the KITTI label format.
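For reference, each line in a KITTI label file describes one object using the standard 15-column convention (type, truncation, occlusion, alpha, 2D bounding box, 3D dimensions, 3D location, and rotation_y). A sample line for a person (values are illustrative, following the KITTI devkit convention):

Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01

Each file under ImageSets lists the sample indices that belong to that split, one index per line:

000000
000001
000002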
Below is a sample BEVFusion spec file. It has five components (model, inference, evaluate, dataset, and train) as well as several global parameters, which are described below. The spec file format is YAML.
Here's a sample of the BEVFusion spec file:
results_dir: /results/bevfusion
dataset:
type: KittiPersonDataset
root_dir: /data/
gt_box_type: camera
default_cam_key: CAM2
train_dataset:
repeat_time: 2
ann_file: /data/kitti_person_infos_train.pkl
data_prefix:
pts: training/velodyne_reduced
img: training/image_2
batch_size: 4
num_workers: 8
val_dataset:
ann_file: /data/kitti_person_infos_val.pkl
data_prefix:
pts: training/velodyne_reduced
img: training/image_2
batch_size: 2
num_workers: 4
test_dataset:
ann_file: /data/kitti_person_infos_val.pkl
data_prefix:
pts: training/velodyne_reduced
img: training/image_2
batch_size: 4
num_workers: 4
model:
type: BEVFusion
point_cloud_range: [0, -40, -3, 70.4, 40, 1]
voxel_size: [0.05, 0.05, 0.1]
grid_size: [1440, 1440, 41]
train:
num_gpus: 1
num_nodes: 1
validation_interval: 1
num_epochs: 5
optimizer:
type: AdamW
lr: 0.0002
lr_scheduler:
- type: LinearLR
start_factor: 0.33333333
by_epoch: False
begin: 0
end: 500
- type: CosineAnnealingLR
T_max: 10
begin: 0
end: 10
by_epoch: True
eta_min_ratio: 1e-4
- type: CosineAnnealingMomentum
eta_min: 0.8947
begin: 0
end: 2.4
by_epoch: True
- type: CosineAnnealingMomentum
eta_min: 1
begin: 2.4
end: 10
by_epoch: True
inference:
num_gpus: 1
conf_threshold: 0.3
checkpoint: /results/train/bevfusion_model.pth
evaluate:
num_gpus: 1
checkpoint: /results/train/bevfusion_model.pth
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
results_dir | string | The path to where all the assets generated from a task are stored. | /results | | | | FALSE
default_scope | string | The default scope to use (mmdet3d). | mmdet3d | | | | FALSE
default_hooks | collection | The default hooks for mmlab. | {'timer': {'type': 'IterTimerHook'}, 'logger': {'type': 'LoggerHook', 'interval': 1, 'log_metric_by_epoch': True}, 'param_scheduler': {'type': 'ParamSchedulerHook'}, 'checkpoint': {'type': 'CheckpointHook', 'by_epoch': True, 'interval': 1}, 'sampler_seed': {'type': 'DistSamplerSeedHook'}, 'visualization': {'type': 'Det3DVisualizationHook'}} | | | | FALSE
logger_hook | string | The default logger hook type. | TAOBEVFusionLoggerHook | | | | FALSE
manual_seed | int | Optional manual seed. The seed is set when a value is given in the spec file. | | | | | FALSE
input_modality | collection | The input modalities for the model. Set True for each modality to use. | {'use_lidar': True, 'use_camera': True, 'use_radar': False, 'use_map': False, 'use_external': False} | | | | FALSE
model | collection | Configurable parameters to construct the model for a BEVFusion experiment. | | | | | FALSE
dataset | collection | Configurable parameters to construct the dataset for a BEVFusion experiment. | | | | | FALSE
train | collection | Configurable parameters to construct the trainer for a BEVFusion experiment. | | | | | FALSE
evaluate | collection | Configurable parameters to construct the evaluator for a BEVFusion experiment. | | | | | FALSE
inference | collection | Configurable parameters to construct the inferencer for a BEVFusion experiment. | | | | | FALSE
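The optional global parameters are set at the top level of the spec file. A minimal sketch using the defaults from the table above (the seed value is illustrative):

results_dir: /results
manual_seed: 42
input_modality:
  use_lidar: True
  use_camera: True
  use_radar: False
  use_map: False
  use_external: False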
Data Preprocessor Config
The data preprocessor configuration (data_preprocessor) defines the pre-processing hyperparameters for the image and point cloud inputs.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The name of the data pre-processor for 3D fusion. | Det3DDataPreprocessor | | | | FALSE
mean | list | The input mean for RGB frames. | [123.675, 116.28, 103.53] | | | | FALSE
std | list | The input standard deviation per pixel for RGB frames. | [58.395, 57.12, 57.375] | | | | FALSE
bgr_to_rgb | bool | Whether to convert images from BGR to RGB. | False | | | | FALSE
pad_size_divisor | int | The number that the padded image size should be divisible by. | 32 | | | | FALSE
voxelize_cfg | collection | The configuration for voxelization. | {'max_num_points': 10, 'max_voxels': [120000, 160000], 'voxelize_reduce': True} | | | | FALSE
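The data_preprocessor block nests under model. The following is a minimal sketch assembled from the defaults in the table above; it is illustrative rather than a tuned configuration:

model:
  data_preprocessor:
    type: Det3DDataPreprocessor
    mean: [123.675, 116.28, 103.53]
    std: [58.395, 57.12, 57.375]
    bgr_to_rgb: False
    pad_size_divisor: 32
    voxelize_cfg:
      max_num_points: 10
      max_voxels: [120000, 160000]
      voxelize_reduce: True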
Dataset Config
The dataset configuration (dataset) defines the dataset directories, annotation files, and batch sizes for the train, val, and test splits.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The dataset type for 3D fusion. | KittiPersonDataset | | | TAO3DSyntheticDataset, TAO3DDataset, KittiPersonDataset | FALSE
root_dir | string | The path to the root directory of the given dataset. | /data/ | | | | FALSE
classes | list | A list of the classes to be trained. | ['person'] | | | | FALSE
box_type_3d | string | The 3D bounding-box type to be used when training. | lidar | | | lidar, camera | FALSE
gt_box_type | string | The 3D bounding-box type in the ground truth. | camera | | | lidar, camera | FALSE
origin | list | The origin of the given center point in ground-truth 3D bounding boxes. | [0.5, 1.0, 0.5] | | | | FALSE
default_cam_key | string | The default camera name in the dataset. | CAM0 | | | | FALSE
per_sequence | bool | Whether to save results in per-sequence format. | False | | | | FALSE
num_views | int | The number of camera views in the dataset. | 1 | | | | FALSE
point_cloud_dim | int | The input lidar point-cloud data dimension. | 4 | | | | FALSE
train_dataset | collection | Configurable parameters to construct the train dataset. | | | | | FALSE
val_dataset | collection | Configurable parameters to construct the validation dataset. | | | | | FALSE
test_dataset | collection | Configurable parameters to construct the test dataset. | | | | | FALSE
img_file | string | The image file for single-file inference. | | | | | FALSE
pc_file | string | The point cloud file for single-file inference. | | | | | FALSE
cam2img | list | The camera intrinsic matrix for single-file inference. | | | | | FALSE
lidar2cam | list | The lidar-to-camera extrinsic matrix for single-file inference. | | | | | FALSE
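For single-file inference, the img_file, pc_file, cam2img, and lidar2cam fields supply the input files and calibration directly in the dataset config. A hypothetical sketch is shown below; the paths are placeholders, and the calibration matrices must come from your sensor setup:

dataset:
  type: KittiPersonDataset
  root_dir: /data/
  img_file: /data/testing/image_2/000000.png   # placeholder path
  pc_file: /data/testing/velodyne/000000.bin   # placeholder path
  cam2img: []      # fill with the camera intrinsic matrix for this frame
  lidar2cam: []    # fill with the lidar-to-camera extrinsic matrix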
Model Config
The model configuration (model) defines the BEVFusion model structure. This model is used for training, evaluation, and inference. A detailed description is included in the table below.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The model name. | BEVFusion | | | BEVFusion | FALSE
point_cloud_range | list | The point cloud range. | [0, -40, -3, 70.4, 40, 1] | | | | FALSE
voxel_size | list | The voxel size for voxelization. | [0.05, 0.05, 0.1] | | | | FALSE
post_center_range | list | The post-processing center filter range. | [-61.2, -61.2, -20.0, 61.2, 61.2, 20.0] | | | | FALSE
grid_size | list | The grid size for the BEVFusion model. | [1440, 1440, 41] | | | | FALSE
data_preprocessor | collection | Configurable parameters to construct the preprocessor for the BEVFusion model. | | | | | FALSE
img_backbone | collection | Configurable parameters to construct the camera image backbone for the BEVFusion model. | | | | | FALSE
img_neck | collection | Configurable parameters to construct the camera image neck for the BEVFusion model. | | | | | FALSE
view_transform | collection | Configurable parameters to construct the camera view transform for the BEVFusion model. | | | | | FALSE
pts_backbone | collection | Configurable parameters to construct the lidar point cloud backbone for the BEVFusion model. | | | | | FALSE
pts_voxel_encoder | collection | Configurable parameters to construct the lidar point cloud voxel encoder for the BEVFusion model. | {'type': 'HardSimpleVFE', 'num_features': 4} | | | | FALSE
pts_middle_encoder | collection | Configurable parameters to construct the lidar encoder for the BEVFusion model. | | | | | FALSE
pts_neck | collection | Configurable parameters to construct the lidar neck for the BEVFusion model. | | | | | FALSE
fusion_layer | collection | Configurable parameters to construct the fusion layer for the BEVFusion model. | | | | | FALSE
bbox_head | collection | Configurable parameters to construct the bounding-box head for the BEVFusion model. | | | | | FALSE
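The sub-module configurations described in the following sections all nest under model. A structural sketch (each sub-block is detailed in its own section below):

model:
  type: BEVFusion
  point_cloud_range: [0, -40, -3, 70.4, 40, 1]
  voxel_size: [0.05, 0.05, 0.1]
  grid_size: [1440, 1440, 41]
  data_preprocessor:   # see Data Preprocessor Config
    type: Det3DDataPreprocessor
  img_backbone:        # see Image Backbone Config
    type: mmdet.SwinTransformer
  img_neck:            # see Image Neck Config
    type: GeneralizedLSSFPN
  view_transform:      # see View Transform Config
    type: DepthLSSTransform
  pts_middle_encoder:  # see Lidar Encoder Config
    type: BEVFusionSparseEncoder
  pts_backbone:        # see Lidar Backbone Config
    type: SECOND
  pts_neck:            # see Lidar Neck Config
    type: SECONDFPN
  fusion_layer:        # see Fusion Layer Config
    type: ConvFuser
  bbox_head:           # see BBoxHead Config
    type: BEVFusionHead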
Image Backbone Config
The backbone configuration (img_backbone) defines the image backbone structure. A detailed description is included in the table below. Currently, BEVFusion only supports the Swin Transformer and ResNet-50 image backbones.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The name of the image backbone for 3D fusion. | mmdet.SwinTransformer | | | | FALSE
embed_dims | int | The number of input channels. | 96 | | | | FALSE
depths | list | The depth of each Swin Transformer stage. | [2, 2, 6, 2] | | | | FALSE
num_heads | list | The number of attention heads in each stage. | [3, 6, 12, 24] | | | | FALSE
window_size | int | The window size for the Swin Transformer. | 7 | | | | FALSE
mlp_ratio | int | The ratio of the MLP hidden dim to the embedding dim. | 4 | | | | FALSE
qkv_bias | bool | If True, adds a learnable bias to the query, key, and value. | True | | | | FALSE
qk_scale | string | Overrides the default qk scale of head_dim ** -0.5 if set. | | | | | FALSE
drop_rate | float | The dropout rate. | 0.0 | | | | FALSE
attn_drop_rate | float | The attention dropout rate. | 0.0 | | | | FALSE
drop_path_rate | float | The stochastic drop rate. | 0.2 | | | | FALSE
patch_norm | bool | If True, adds normalization after patch embedding. | True | | | | FALSE
out_indices | list | The stages to output from. | [1, 2, 3] | | | | FALSE
with_cp | bool | Whether to use checkpointing, which saves some memory while slowing down the training speed. | False | | | | FALSE
convert_weights | bool | Whether the pre-trained model is from the original repo, in which case some keys are converted for compatibility. | True | | | | FALSE
init_cfg | collection | The configuration for initialization. | | | | | FALSE
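For reference, the following sketch shows an img_backbone block assembled from the Swin Transformer defaults in the table above (illustrative, not a tuned configuration):

model:
  img_backbone:
    type: mmdet.SwinTransformer
    embed_dims: 96
    depths: [2, 2, 6, 2]
    num_heads: [3, 6, 12, 24]
    window_size: 7
    mlp_ratio: 4
    qkv_bias: True
    drop_rate: 0.0
    attn_drop_rate: 0.0
    drop_path_rate: 0.2
    patch_norm: True
    out_indices: [1, 2, 3]
    with_cp: False
    convert_weights: True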
Image Neck Config
The neck configuration (img_neck) defines the image neck structure. A detailed description is included in the table below. Currently, BEVFusion only supports the GeneralizedLSSFPN image neck.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The image neck name. | GeneralizedLSSFPN | | | | FALSE
in_channels | list | The number of input channels for the image neck. | [192, 384, 768] | | | | FALSE
out_channels | int | The number of output channels for the image neck. | 256 | | | | FALSE
start_level | int | The starting level for the image neck. | 0 | | | | FALSE
num_outs | int | The number of outputs for the image neck. | 0 | | | | FALSE
norm_cfg | collection | The configuration of normalization for the image neck. | {'type': 'BN2d', 'requires_grad': True} | | | | FALSE
act_cfg | collection | The configuration of activation for the image neck. | {'type': 'ReLU', 'inplace': True} | | | | FALSE
upsample_cfg | collection | The configuration of upsampling for the image neck. | {'mode': 'bilinear', 'align_corners': False} | | | | FALSE
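A comparable sketch for an img_neck block, using the defaults above:

model:
  img_neck:
    type: GeneralizedLSSFPN
    in_channels: [192, 384, 768]
    out_channels: 256
    start_level: 0
    norm_cfg:
      type: BN2d
      requires_grad: True
    act_cfg:
      type: ReLU
      inplace: True
    upsample_cfg:
      mode: bilinear
      align_corners: False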
View Transform Config
The view transform configuration (view_transform) defines the view transform structure for camera input. A detailed description is included in the table below. Currently, BEVFusion only supports the DepthLSSTransform and LSSTransform view transforms.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The image view transform name. | DepthLSSTransform | | | DepthLSSTransform, LSSTransform | FALSE
in_channels | int | The number of input channels for the view transform. | 256 | | | | FALSE
out_channels | int | The number of output channels for the view transform. | 80 | | | | FALSE
image_size | list | The image size for the view transform. | [256, 704] | | | | FALSE
feature_size | list | The feature size for the view transform. | [32, 88] | | | | FALSE
xbound | list | The grid range for the x-axis. | [-54.0, 54.0, 0.3] | | | | FALSE
ybound | list | The grid range for the y-axis. | [-54.0, 54.0, 0.3] | | | | FALSE
zbound | list | The grid range for the z-axis. | [-10.0, 10.0, 20.0] | | | | FALSE
dbound | list | The grid range for depth. | [1.0, 60.0, 0.5] | | | | FALSE
downsample | int | The ratio for downsampling. | 2 | | | | FALSE
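A sketch of a view_transform block built from the defaults above:

model:
  view_transform:
    type: DepthLSSTransform
    in_channels: 256
    out_channels: 80
    image_size: [256, 704]
    feature_size: [32, 88]
    xbound: [-54.0, 54.0, 0.3]
    ybound: [-54.0, 54.0, 0.3]
    zbound: [-10.0, 10.0, 20.0]
    dbound: [1.0, 60.0, 0.5]
    downsample: 2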
Lidar Backbone Config
The backbone configuration (pts_backbone) defines the lidar backbone structure. A detailed description is included in the table below. Currently, BEVFusion only supports the SECOND lidar backbone.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The lidar backbone name. | SECOND | | | | FALSE
in_channels | int | The number of input channels for the lidar backbone. | 256 | | | | FALSE
out_channels | list | The number of output channels for the lidar backbone. | [128, 256] | | | | FALSE
layer_nums | list | The number of layers in each stage for the lidar backbone. | [5, 5] | | | | FALSE
layer_strides | list | The stride of each stage for the lidar backbone. | [1, 2] | | | | FALSE
norm_cfg | collection | The configuration of normalization for the lidar backbone. | {'type': 'BN', 'eps': 0.001, 'momentum': 0.01} | | | | FALSE
conv_cfg | collection | The configuration of convolution layers for the lidar backbone. | {'type': 'Conv2d', 'bias': False} | | | | FALSE
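A sketch of a pts_backbone block using the defaults above:

model:
  pts_backbone:
    type: SECOND
    in_channels: 256
    out_channels: [128, 256]
    layer_nums: [5, 5]
    layer_strides: [1, 2]
    norm_cfg:
      type: BN
      eps: 0.001
      momentum: 0.01
    conv_cfg:
      type: Conv2d
      bias: False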
Lidar Encoder Config
The encoder configuration (pts_middle_encoder) defines the lidar encoder structure. A detailed description is included in the table below. Currently, BEVFusion only supports the BEVFusionSparseEncoder structure.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The lidar encoder name. | BEVFusionSparseEncoder | | | | FALSE
in_channels | int | The number of input channels for the lidar encoder. | 4 | | | | FALSE
sparse_shape | list | The sparse shape of the input tensor. | [1440, 1440, 41] | | | | FALSE
order | list | The order of the conv module. | ['conv', 'norm', 'act'] | | | | FALSE
norm_cfg | collection | The configuration of normalization for the lidar encoder. | {'type': 'BN1d', 'eps': 0.001, 'momentum': 0.01} | | | | FALSE
encoder_channels | list | The convolutional channels of each encoder block. | | | | | FALSE
encoder_paddings | list | The paddings of each encoder block. | | | | | FALSE
block_type | string | The type of block to use. | basicblock | | | | FALSE
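A sketch of a pts_middle_encoder block using the defaults above (encoder_channels and encoder_paddings are omitted because the table documents no defaults for them):

model:
  pts_middle_encoder:
    type: BEVFusionSparseEncoder
    in_channels: 4
    sparse_shape: [1440, 1440, 41]
    order: ['conv', 'norm', 'act']
    norm_cfg:
      type: BN1d
      eps: 0.001
      momentum: 0.01
    block_type: basicblock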
Lidar Neck Config
The neck configuration (pts_neck) defines the lidar neck structure. A detailed description is included in the table below. Currently, BEVFusion only supports the SECONDFPN structure.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The lidar neck name. | SECONDFPN | | | | FALSE
in_channels | list | The number of input channels for the lidar neck. | [128, 256] | | | | FALSE
out_channels | list | The number of output channels for the lidar neck. | [256, 256] | | | | FALSE
upsample_strides | list | The strides used to upsample the feature map for the lidar neck. | [1, 2] | | | | FALSE
norm_cfg | collection | The configuration of normalization for the lidar neck. | {'type': 'BN', 'eps': 0.001, 'momentum': 0.01} | | | | FALSE
upsample_cfg | collection | The configuration of upsample layers for the lidar neck. | {'type': 'deconv', 'bias': False} | | | | FALSE
use_conv_for_no_stride | bool | Whether to use conv when the stride is 1. | True | | | | FALSE
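A sketch of a pts_neck block using the defaults above:

model:
  pts_neck:
    type: SECONDFPN
    in_channels: [128, 256]
    out_channels: [256, 256]
    upsample_strides: [1, 2]
    norm_cfg:
      type: BN
      eps: 0.001
      momentum: 0.01
    upsample_cfg:
      type: deconv
      bias: False
    use_conv_for_no_stride: True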
Fusion Layer Config
The fusion layer configuration (fusion_layer) defines the fusion layer structure. A detailed description is included in the table below. Currently, BEVFusion only supports the ConvFuser structure.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The fusion layer name. | ConvFuser | | | | FALSE
in_channels | list | The number of input channels for the fusion layer. | [80, 256] | | | | FALSE
out_channels | int | The number of output channels for the fusion layer. | 256 | | | | FALSE
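A sketch of a fusion_layer block using the defaults above:

model:
  fusion_layer:
    type: ConvFuser
    in_channels: [80, 256]   # camera features (80, from view_transform) and lidar features (256)
    out_channels: 256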
BBoxHead Config
The head configuration (bbox_head) defines the bounding-box prediction head structure. A detailed description is included in the table below. Currently, BEVFusion only supports the BEVFusionHead structure.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The prediction head name. | BEVFusionHead | | | BEVFusionHead | FALSE
num_proposals | int | The number of proposals. | 200 | | | | FALSE
auxiliary | bool | Whether to enable auxiliary training. | True | | | | FALSE
in_channels | int | The number of channels in the input feature map. | 512 | | | | FALSE
hidden_channel | int | The number of hidden channels. | 128 | | | | FALSE
num_classes | int | The number of classes. | 1 | | | | FALSE
nms_kernel_size | int | The NMS kernel size. | 3 | | | | FALSE
bn_momentum | float | The batch-norm momentum. | 0.1 | | | | FALSE
num_decoder_layers | int | The number of decoder layers. | 1 | | | | FALSE
out_size_factor | int | The output size factor. | 8 | | | | FALSE
bbox_coder | collection | The configuration for the bounding-box encoder. | | | | | FALSE
decoder_layer | collection | The configuration for the decoder layer. | | | | | FALSE
code_weights | list | The weights for the box encoder. | [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] | | | | FALSE
nms_type | string | The type of NMS. | | | | | FALSE
assigner | collection | The configuration for the assigner. | {'type': 'HungarianAssigner3D', 'iou_calculator': {'type': 'BboxOverlaps3D', 'coordinate': 'lidar'}, 'cls_cost': {'type': 'mmdet.FocalLossCost', 'gamma': 2.0, 'alpha': 0.25, 'weight': 0.15}, 'reg_cost': {'type': 'BBoxBEVL1Cost', 'weight': 0.25}, 'iou_cost': {'type': 'IoU3DCost', 'weight': 0.25}} | | | | FALSE
common_heads | collection | The configuration for the common heads. | {'center': [2, 2], 'height': [1, 2], 'dim': [3, 2], 'rot': [6, 2]} | | | | FALSE
loss_cls | collection | The configuration for the classification loss. | {'type': 'mmdet.FocalLoss', 'use_sigmoid': True, 'gamma': 2.0, 'alpha': 0.25, 'reduction': 'mean', 'loss_weight': 1.0} | | | | FALSE
loss_heatmap | collection | The configuration for the heatmap loss. | {'type': 'mmdet.GaussianFocalLoss', 'reduction': 'mean', 'loss_weight': 1.0} | | | | FALSE
loss_bbox | collection | The configuration for the bounding-box loss. | {'type': 'mmdet.L1Loss', 'reduction': 'mean', 'loss_weight': 0.25} | | | | FALSE
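A sketch of the scalar bbox_head settings from the table above (the nested bbox_coder, decoder_layer, assigner, and loss blocks are omitted for brevity):

model:
  bbox_head:
    type: BEVFusionHead
    num_proposals: 200
    auxiliary: True
    in_channels: 512
    hidden_channel: 128
    num_classes: 1
    nms_kernel_size: 3
    bn_momentum: 0.1
    num_decoder_layers: 1
    out_size_factor: 8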
Train Config
The train configuration defines the hyperparameters of the training process.
train:
precision: 'fp16'
num_gpus: 1
checkpoint_interval: 10
validation_interval: 10
num_epochs: 50
optimizer:
type: "AdamW"
lr: 0.0001
weight_decay: 0.05
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
num_gpus | int | The number of GPUs to run the train job. | 1 | 1 | | | FALSE
gpu_ids | list | The list of GPU IDs to run the training on. The length of this list must be equal to the number of GPUs in train.num_gpus. | [0] | | | | FALSE
num_nodes | int | The number of nodes to run the training on. If > 1, multi-node training is enabled. | 1 | | | | FALSE
seed | int | The seed for the initializer in PyTorch. If < 0, the fixed seed is disabled. | 1234 | -1 | inf | | FALSE
cudnn | collection | The cuDNN configuration. | | | | | FALSE
num_epochs | int | The number of epochs to run the training. | 10 | 1 | inf | | TRUE
checkpoint_interval | int | The interval (in epochs) at which a checkpoint is saved. Helps resume training. | 1 | 1 | | | FALSE
validation_interval | int | The interval (in epochs) at which an evaluation is triggered on the validation dataset. | 1 | 1 | | | FALSE
resume_training_checkpoint_path | string | The path to the checkpoint to resume training from. | | | | | FALSE
results_dir | string | The path to where all the assets generated from a task are stored. | | | | | FALSE
by_epoch | bool | Whether EpochBasedRunner is used. | True | | | | FALSE
logging_interval | int | The logging interval, every k iterations. | 1 | | | | FALSE
resume | bool | Whether to resume the training. | False | | | | FALSE
pretrained_checkpoint | string | The path to a pre-trained BEVFusion model to initialize the current training from. | | | | | FALSE
optimizer | collection | The hyperparameters to configure the optimizer. | | | | | FALSE
lr_scheduler | list | The hyperparameters to configure the learning-rate scheduler. | [{'type': 'LinearLR', 'start_factor': 0.33333333, 'by_epoch': False, 'begin': 0, 'end': 500}, {'type': 'CosineAnnealingLR', 'T_max': 10, 'eta_min_ratio': 0.0001, 'begin': 0, 'end': 10, 'by_epoch': True}, {'type': 'CosineAnnealingMomentum', 'eta_min': 0.8947, 'begin': 0, 'end': 2.4, 'by_epoch': True}, {'type': 'CosineAnnealingMomentum', 'eta_min': 1, 'begin': 2.4, 'end': 10, 'by_epoch': True}] | | | | FALSE
Optimizer Config
The optimizer parameter defines the configuration for the optimizer in training, including the learning rate and weight decay.
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
type | string | The type of optimizer used to train the network. | AdamW | | | | FALSE
lr | float | The initial learning rate for training the model. | 0.0002 | | | | FALSE
weight_decay | float | The weight-decay coefficient. | 0.01 | | | | FALSE
betas | list | The moving-average parameters for the adaptive learning rate. | [0.9, 0.999] | | | | FALSE
clip_grad | collection | Clips the gradient norm of an iterable of parameters. | {'max_norm': 35, 'norm_type': 2} | | | | FALSE
wrapper_type | string | The optimizer wrapper in MMEngine. Use AmpOptimWrapper to enable mixed-precision training. | OptimWrapper | | | | FALSE
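A sketch of a full optimizer block under train, assembled from the defaults above:

train:
  optimizer:
    type: AdamW
    lr: 0.0002
    weight_decay: 0.01
    betas: [0.9, 0.999]
    clip_grad:
      max_norm: 35
      norm_type: 2
    wrapper_type: OptimWrapper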
Evaluation Config
The evaluate parameter defines the hyperparameters of the evaluation process.
evaluate:
checkpoint: /path/to/model.pth
num_gpus: 1
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
num_gpus | int | The number of GPUs to run the evaluation job. | 1 | | | | FALSE
gpu_ids | list | The list of GPU IDs to run the evaluation on. | [0] | | | | FALSE
num_nodes | int | The number of nodes to run the evaluation on. | 1 | | | | FALSE
checkpoint | string | The path to the model checkpoint used for evaluation (required; no default). | ??? | | | | FALSE
results_dir | string | The path to where the evaluation results are stored. | | | | | FALSE
Inference Config
The inference parameter defines the hyperparameters of the inference process.
inference:
checkpoint: /path/to/model.pth
num_gpus: 1
Field | value_type | description | default_value | valid_min | valid_max | valid_options | automl_enabled
---|---|---|---|---|---|---|---
num_gpus | int | The number of GPUs to run the inference job. | 1 | | | | FALSE
gpu_ids | list | The list of GPU IDs to run the inference on. | [0] | | | | FALSE
num_nodes | int | The number of nodes to run the inference on. | 1 | | | | FALSE
checkpoint | string | The path to the model checkpoint used for inference (required; no default). | ??? | | | | FALSE
results_dir | string | The path to where the inference results are stored. | | | | | FALSE
conf_threshold | float | The confidence threshold for detections. | 0.5 | | | | FALSE
show | bool | Whether to show the 3D visualization on screen. | False | | | | FALSE
To train a BEVFusion model, use this command:
tao model bevfusion train [-h] -e <experiment_spec>
[-r <results_dir>]
Required Arguments
-e, --experiment_spec: The experiment specification file to set up the training experiment
Optional Arguments
-r, --results_dir: The path to the folder where the experiment outputs should be written. If this argument is not specified, the results_dir from the spec file is used.
--gpus: The number of GPUs used to run training
--num_nodes: The number of nodes used to run training. If this value is larger than 1, distributed multi-node training is enabled.
-h, --help: Show this help message and exit.
Sample Usage
Here's an example of the train command:
tao model bevfusion train -e /path/to/spec.yaml
To run evaluation with a BEVFusion model, use this command:
tao model bevfusion evaluate [-h] -e <experiment_spec>
[-r <results_dir>]
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the evaluation experiment
Optional Arguments
-r, --results_dir: The directory where the evaluation result is stored
Sample Usage
Here's an example of using the evaluate command:
tao model bevfusion evaluate -e /path/to/spec.yaml -r /path/to/results/ evaluate.checkpoint=/path/to/model.pth
Use the following command to run inference on BEVFusion with a .pth model:
tao model bevfusion inference [-h] -e <experiment spec file>
[-r <results_dir>]
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the inference experiment
Optional Arguments
-r, --results_dir: The directory where the inference result is stored
Sample Usage
Here's an example of using the inference command:
tao model bevfusion inference -e /path/to/spec.yaml -r /path/to/results/ inference.checkpoint=/path/to/model.pth