BEVFusion#

BEVFusion is a 3D object-detection model included in TAO. It supports the following tasks:

  • convert

  • train

  • evaluate

  • inference

A sample inference result is shown below.

[Figure: TAO BEVFusion Inference Image]

These tasks may be invoked from the TAO Launcher using the following convention on the command line:

tao model bevfusion <sub_task> <args_per_subtask>

where args_per_subtask are the command-line arguments required for a given subtask. Each of these subtasks is explained in the following sections.

Dataset Format#

The dataset for BEVFusion contains point cloud data, RGB images, and the corresponding 3D object annotations. The directory should be organized in the KITTI directory structure:

/kitti
    /training
        /calib
          000000.txt
          000001.txt
            ...
          N.txt
        /image_2
          000000.png
          000001.png
            ...
          N.png
        /label_2
          000000.txt
          000001.txt
            ...
          N.txt
        /velodyne
          000000.bin
          000001.bin
            ...
          N.bin
    /ImageSets
        train.txt
        val.txt
        test.txt

Each .bin file should contain the lidar points for one frame in the KITTI Velodyne format, and each .txt label file should comply with the KITTI label format.
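
The ImageSets text files define the dataset splits: each line is a frame ID (without extension) that exists across the calib, image_2, label_2, and velodyne directories. For example, a minimal train.txt might contain:

000000
000001
000003

Each .bin file stores the points as a flat array of float32 values, four per point (x, y, z, reflectance), which matches the default point_cloud_dim of 4 in the dataset config.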

Creating a Configuration File#

Below is a sample BEVFusion spec file. It has five components (model, inference, evaluate, dataset, and train) as well as several global parameters, which are described below. The spec file is in YAML format.

Here’s a sample of the BEVFusion spec file:

results_dir: /results/bevfusion
dataset:
  type: KittiPersonDataset
  root_dir: /data/
  gt_box_type: camera
  default_cam_key: CAM2
  train_dataset:
    repeat_time: 2
    ann_file: /data/kitti_person_infos_train.pkl
    data_prefix:
      pts: training/velodyne_reduced
      img: training/image_2
    batch_size: 4
    num_workers: 8
  val_dataset:
    ann_file: /data/kitti_person_infos_val.pkl
    data_prefix:
      pts: training/velodyne_reduced
      img: training/image_2
    batch_size: 2
    num_workers: 4
  test_dataset:
    ann_file: /data/kitti_person_infos_val.pkl
    data_prefix:
      pts: training/velodyne_reduced
      img: training/image_2
    batch_size: 4
    num_workers: 4
model:
  type: BEVFusion
  point_cloud_range: [0, -40, -3, 70.4, 40, 1]
  voxel_size: [0.05, 0.05, 0.1]
  grid_size: [1440, 1440, 41]
train:
  num_gpus: 1
  num_nodes: 1
  validation_interval: 1
  num_epochs: 5
  optimizer:
    type: AdamW
    lr:  0.0002
  lr_scheduler:
    - type: LinearLR
      start_factor: 0.33333333
      by_epoch: False
      begin: 0
      end: 500
    - type: CosineAnnealingLR
      T_max: 10
      begin: 0
      end: 10
      by_epoch: True
      eta_min_ratio: 1e-4
    - type: CosineAnnealingMomentum
      eta_min: 0.8947
      begin: 0
      end: 2.4
      by_epoch: True
    - type: CosineAnnealingMomentum
      eta_min: 1
      begin: 2.4
      end: 10
      by_epoch: True
inference:
  num_gpus: 1
  conf_threshold: 0.3
  checkpoint: /results/train/bevfusion_model.pth
evaluate:
  num_gpus: 1
  checkpoint: /results/train/bevfusion_model.pth

| Field | value_type | description | default_value | automl_enabled |
|---|---|---|---|---|
| results_dir | string | Path to where all the assets generated from a task are stored. | /results | FALSE |
| default_scope | string | Default scope to use (mmdet3d). | mmdet3d | FALSE |
| default_hooks | collection | Default hooks for mmlab modules. | {'timer': {'type': 'IterTimerHook'}, 'logger': {'type': 'LoggerHook', 'interval': 1, 'log_metric_by_epoch': True}, 'param_scheduler': {'type': 'ParamSchedulerHook'}, 'checkpoint': {'type': 'CheckpointHook', 'by_epoch': True, 'interval': 1}, 'sampler_seed': {'type': 'DistSamplerSeedHook'}, 'visualization': {'type': 'Det3DVisualizationHook'}} | FALSE |
| logger_hook | string | Default logger hook type. | TAOBEVFusionLoggerHook | FALSE |
| manual_seed | int | Optional manual seed. The seed is set only when a value is given in the spec file. | | FALSE |
| input_modality | collection | Input modality for the model. Set True for each modality to use. | {'use_lidar': True, 'use_camera': True, 'use_radar': False, 'use_map': False, 'use_external': False} | FALSE |
| model | collection | Configurable parameters to construct the model for a BEVFusion experiment. | | FALSE |
| dataset | collection | Configurable parameters to construct the dataset for a BEVFusion experiment. | | FALSE |
| train | collection | Configurable parameters to construct the trainer for a BEVFusion experiment. | | FALSE |
| evaluate | collection | Configurable parameters to construct the evaluator for a BEVFusion experiment. | | FALSE |
| inference | collection | Configurable parameters to construct the inferencer for a BEVFusion experiment. | | FALSE |

Data Preprocessor Config#

The data preprocessor configuration (data_preprocessor) defines the pre-processing hyperparameters applied to the input images and point clouds.

| Field | value_type | description | default_value | automl_enabled |
|---|---|---|---|---|
| type | string | Name of the data preprocessor for 3D fusion. | Det3DDataPreprocessor | FALSE |
| mean | list | The input mean for RGB frames. | [123.675, 116.28, 103.53] | FALSE |
| std | list | The input standard deviation per pixel for RGB frames. | [58.395, 57.12, 57.375] | FALSE |
| bgr_to_rgb | bool | Whether to convert images from BGR to RGB. | | FALSE |
| pad_size_divisor | int | The value the padded image size must be divisible by. | 32 | FALSE |
| voxelize_cfg | collection | The voxelization configuration. | {'max_num_points': 10, 'max_voxels': [120000, 160000], 'voxelize_reduce': True} | FALSE |
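
For illustration, a data_preprocessor block using the defaults above might look like the following sketch (it nests under model, as listed in the Model Config table):

model:
  data_preprocessor:
    type: Det3DDataPreprocessor
    mean: [123.675, 116.28, 103.53]
    std: [58.395, 57.12, 57.375]
    pad_size_divisor: 32
    voxelize_cfg:
      max_num_points: 10
      max_voxels: [120000, 160000]
      voxelize_reduce: True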

Dataset Config#

The dataset configuration (dataset) defines the dataset directories, annotation files, and batch sizes for the train, validation, and test splits.

| Field | value_type | description | default_value | valid_options | automl_enabled |
|---|---|---|---|---|---|
| type | string | Dataset type for 3D fusion. | KittiPersonDataset | TAO3DSyntheticDataset, TAO3DDataset, KittiPersonDataset | FALSE |
| root_dir | string | The path to the root directory of the dataset. | /data/ | | FALSE |
| classes | list | A list of the classes to be trained. | ['person'] | | FALSE |
| box_type_3d | string | The 3D bounding box type used during training. | lidar | lidar, camera | FALSE |
| gt_box_type | string | The 3D bounding box type in the ground truth. | camera | lidar, camera | FALSE |
| origin | list | The origin of the center point in the ground-truth 3D bounding boxes. | [0.5, 1.0, 0.5] | | FALSE |
| default_cam_key | string | The default camera name in the dataset. | CAM0 | | FALSE |
| per_sequence | bool | Whether to save results in per-sequence format. | False | | FALSE |
| num_views | int | The number of camera views in the dataset. | 1 | | FALSE |
| point_cloud_dim | int | The input lidar point cloud dimension. | 4 | | FALSE |
| train_dataset | collection | Configurable parameters to construct the train dataset. | | | FALSE |
| val_dataset | collection | Configurable parameters to construct the validation dataset. | | | FALSE |
| test_dataset | collection | Configurable parameters to construct the test dataset. | | | FALSE |
| img_file | string | The image file for single-file inference. | | | FALSE |
| pc_file | string | The point cloud file for single-file inference. | | | FALSE |
| cam2img | list | The camera intrinsic matrix for single-file inference. | | | FALSE |
| lidar2cam | list | The lidar-to-camera extrinsic matrix for single-file inference. | | | FALSE |
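
For single-file inference, the dataset section can point directly at one image/point-cloud pair. Here is a hedged sketch; the calibration lists are placeholders, so copy the actual values from the frame's calib file:

dataset:
  img_file: /data/training/image_2/000000.png
  pc_file: /data/training/velodyne_reduced/000000.bin
  cam2img: []    # placeholder: camera intrinsic matrix from the calib file
  lidar2cam: []  # placeholder: lidar-to-camera extrinsic matrix from the calib file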

Model Config#

The model configuration (model) defines the BEVFusion model structure. This model is used for training, evaluation, and inference. A detailed description is included in the table below.

| Field | value_type | description | default_value | valid_options | automl_enabled |
|---|---|---|---|---|---|
| type | string | Model name. | BEVFusion | BEVFusion | FALSE |
| point_cloud_range | list | The point cloud range. | [0, -40, -3, 70.4, 40, 1] | | FALSE |
| voxel_size | list | The voxel size used in voxelization. | [0.05, 0.05, 0.1] | | FALSE |
| post_center_range | list | The post-processing center filter range. | [-61.2, -61.2, -20.0, 61.2, 61.2, 20.0] | | FALSE |
| grid_size | list | The grid size for the BEVFusion model. | [1440, 1440, 41] | | FALSE |
| data_preprocessor | collection | Configurable parameters to construct the preprocessor for the BEVFusion model. | | | FALSE |
| img_backbone | collection | Configurable parameters to construct the camera image backbone for the BEVFusion model. | | | FALSE |
| img_neck | collection | Configurable parameters to construct the camera image neck for the BEVFusion model. | | | FALSE |
| view_transform | collection | Configurable parameters to construct the camera view transform for the BEVFusion model. | | | FALSE |
| pts_backbone | collection | Configurable parameters to construct the lidar point cloud backbone for the BEVFusion model. | | | FALSE |
| pts_voxel_encoder | collection | Configurable parameters to construct the lidar point cloud voxel encoder for the BEVFusion model. | {'type': 'HardSimpleVFE', 'num_features': 4} | | FALSE |
| pts_middle_encoder | collection | Configurable parameters to construct the lidar encoder for the BEVFusion model. | | | FALSE |
| pts_neck | collection | Configurable parameters to construct the lidar neck for the BEVFusion model. | | | FALSE |
| fusion_layer | collection | Configurable parameters to construct the fusion layer for the BEVFusion model. | | | FALSE |
| bbox_head | collection | Configurable parameters to construct the bounding box head for the BEVFusion model. | | | FALSE |

Image Backbone Config#

The backbone configuration (img_backbone) defines the image backbone structure. A detailed description is included in the table below. Currently, BEVFusion supports only the Swin Transformer and ResNet-50 image backbones.

| Field | value_type | description | default_value | automl_enabled |
|---|---|---|---|---|
| type | string | Name of the image backbone for 3D fusion. | mmdet.SwinTransformer | FALSE |
| embed_dims | int | The number of input channels. | 96 | FALSE |
| depths | list | The depth of each Swin Transformer stage. | [2, 2, 6, 2] | FALSE |
| num_heads | list | The number of attention heads in each stage. | [3, 6, 12, 24] | FALSE |
| window_size | int | The window size for the Swin Transformer. | 7 | FALSE |
| mlp_ratio | int | The ratio of the MLP hidden dimension to the embedding dimension. | 4 | FALSE |
| qkv_bias | bool | If True, adds a learnable bias to the query, key, and value. | True | FALSE |
| qk_scale | string | Overrides the default qk scale of head_dim ** -0.5 if set. | | FALSE |
| drop_rate | float | The dropout rate. | 0.0 | FALSE |
| attn_drop_rate | float | The attention dropout rate. | 0.0 | FALSE |
| drop_path_rate | float | The stochastic depth drop rate. | 0.2 | FALSE |
| patch_norm | bool | If True, adds normalization after patch embedding. | True | FALSE |
| out_indices | list | The stages to output from. | [1, 2, 3] | FALSE |
| with_cp | bool | Whether to use checkpointing. Checkpointing saves some memory while slowing down the training speed. | False | FALSE |
| convert_weights | bool | Whether the pre-trained model is from the original repo. | True | FALSE |
| init_cfg | collection | Configuration for initialization. | | FALSE |
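
For illustration, an img_backbone block assembled from the defaults above might look like the following sketch. The init_cfg here uses the common mmengine Pretrained initializer, and the checkpoint path is a placeholder; point it at your own pre-trained Swin weights:

model:
  img_backbone:
    type: mmdet.SwinTransformer
    embed_dims: 96
    depths: [2, 2, 6, 2]
    num_heads: [3, 6, 12, 24]
    window_size: 7
    mlp_ratio: 4
    qkv_bias: True
    drop_rate: 0.0
    attn_drop_rate: 0.0
    drop_path_rate: 0.2
    patch_norm: True
    out_indices: [1, 2, 3]
    with_cp: False
    convert_weights: True
    init_cfg:
      type: Pretrained
      checkpoint: /path/to/swin_pretrained.pth  # placeholder path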

Image Neck Config#

The neck configuration (img_neck) defines the image neck structure. A detailed description is included in the table below. Currently, BEVFusion supports only the GeneralizedLSSFPN image neck.

| Field | value_type | description | default_value | automl_enabled |
|---|---|---|---|---|
| type | string | The image neck name. | GeneralizedLSSFPN | FALSE |
| in_channels | list | The number of input channels for the image neck. | [192, 384, 768] | FALSE |
| out_channels | int | The number of output channels for the image neck. | 256 | FALSE |
| start_level | int | The starting level for the image neck. | 0 | FALSE |
| num_outs | int | The number of outputs for the image neck. | 0 | FALSE |
| norm_cfg | collection | The normalization configuration for the image neck. | {'type': 'BN2d', 'requires_grad': True} | FALSE |
| act_cfg | collection | The activation configuration for the image neck. | {'type': 'ReLU', 'inplace': True} | FALSE |
| upsample_cfg | collection | The upsampling configuration for the image neck. | {'mode': 'bilinear', 'align_corners': False} | FALSE |
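
For illustration, an img_neck block built from the defaults above might look like this sketch. The in_channels values (192, 384, 768) correspond to the Swin Transformer stage widths selected by out_indices [1, 2, 3] when embed_dims is 96:

model:
  img_neck:
    type: GeneralizedLSSFPN
    in_channels: [192, 384, 768]
    out_channels: 256
    start_level: 0
    norm_cfg:
      type: BN2d
      requires_grad: True
    act_cfg:
      type: ReLU
      inplace: True
    upsample_cfg:
      mode: bilinear
      align_corners: False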

View Transform Config#

The configuration (view_transform) defines the view transform structure for camera input. A detailed description is included in the table below. Currently, BEVFusion supports only the DepthLSSTransform and LSSTransform view transforms.

| Field | value_type | description | default_value | valid_options | automl_enabled |
|---|---|---|---|---|---|
| type | string | The image view transform name. | DepthLSSTransform | DepthLSSTransform, LSSTransform | FALSE |
| in_channels | int | The number of input channels for the view transform. | 256 | | FALSE |
| out_channels | int | The number of output channels for the view transform. | 80 | | FALSE |
| image_size | list | The image size for the view transform. | [256, 704] | | FALSE |
| feature_size | list | The feature size for the view transform. | [32, 88] | | FALSE |
| xbound | list | The grid range for the x-axis. | [-54.0, 54.0, 0.3] | | FALSE |
| ybound | list | The grid range for the y-axis. | [-54.0, 54.0, 0.3] | | FALSE |
| zbound | list | The grid range for the z-axis. | [-10.0, 10.0, 20.0] | | FALSE |
| dbound | list | The grid range for depth. | [1.0, 60.0, 0.5] | | FALSE |
| downsample | int | The downsampling ratio. | 2 | | FALSE |
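
For illustration, a view_transform block using the defaults above might look like the following sketch. Each bound is a [min, max, step] triple that defines the BEV/depth grid:

model:
  view_transform:
    type: DepthLSSTransform
    in_channels: 256
    out_channels: 80
    image_size: [256, 704]
    feature_size: [32, 88]
    xbound: [-54.0, 54.0, 0.3]   # [min, max, step] along x
    ybound: [-54.0, 54.0, 0.3]   # [min, max, step] along y
    zbound: [-10.0, 10.0, 20.0]  # [min, max, step] along z
    dbound: [1.0, 60.0, 0.5]     # [min, max, step] for the depth bins
    downsample: 2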

Lidar Backbone Config#

The backbone configuration (pts_backbone) defines the lidar backbone structure. A detailed description is included in the table below. Currently, BEVFusion supports only the SECOND lidar backbone.

| Field | value_type | description | default_value | automl_enabled |
|---|---|---|---|---|
| type | string | The lidar backbone name. | SECOND | FALSE |
| in_channels | int | The number of input channels for the lidar backbone. | 256 | FALSE |
| out_channels | list | The number of output channels for the lidar backbone. | [128, 256] | FALSE |
| layer_nums | list | The number of layers in each stage of the lidar backbone. | [5, 5] | FALSE |
| layer_strides | list | The stride of each stage of the lidar backbone. | [1, 2] | FALSE |
| norm_cfg | collection | The normalization configuration for the lidar backbone. | {'type': 'BN', 'eps': 0.001, 'momentum': 0.01} | FALSE |
| conv_cfg | collection | The convolution layer configuration for the lidar backbone. | {'type': 'Conv2d', 'bias': False} | FALSE |
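
For illustration, a pts_backbone block using the defaults above might look like:

model:
  pts_backbone:
    type: SECOND
    in_channels: 256
    out_channels: [128, 256]
    layer_nums: [5, 5]
    layer_strides: [1, 2]
    norm_cfg:
      type: BN
      eps: 0.001
      momentum: 0.01
    conv_cfg:
      type: Conv2d
      bias: False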

Lidar Encoder Config#

The encoder configuration (pts_middle_encoder) defines the lidar encoder structure. A detailed description is included in the table below. Currently, BEVFusion supports only the BEVFusionSparseEncoder structure.

| Field | value_type | description | default_value | automl_enabled |
|---|---|---|---|---|
| type | string | The lidar encoder name. | BEVFusionSparseEncoder | FALSE |
| in_channels | int | The number of input channels for the lidar encoder. | 4 | FALSE |
| sparse_shape | list | The sparse shape of the input tensor. | [1440, 1440, 41] | FALSE |
| order | list | The order of the conv module. | ['conv', 'norm', 'act'] | FALSE |
| norm_cfg | collection | The normalization configuration for the lidar encoder. | {'type': 'BN1d', 'eps': 0.001, 'momentum': 0.01} | FALSE |
| encoder_channels | list | The convolutional channels of each encoder block. | | FALSE |
| encoder_paddings | list | The paddings of each encoder block. | | FALSE |
| block_type | string | The type of block to use. | basicblock | FALSE |
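
For illustration, a pts_middle_encoder block using the defaults above might look like the following sketch. The encoder_channels and encoder_paddings collections are left out here; configure them to match your sparse-convolution architecture:

model:
  pts_middle_encoder:
    type: BEVFusionSparseEncoder
    in_channels: 4
    sparse_shape: [1440, 1440, 41]
    order: ['conv', 'norm', 'act']
    norm_cfg:
      type: BN1d
      eps: 0.001
      momentum: 0.01
    block_type: basicblock
    # encoder_channels and encoder_paddings are also configurable here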

Lidar Neck Config#

The configuration (pts_neck) defines the lidar neck structure. A detailed description is included in the table below. Currently, BEVFusion supports only the SECONDFPN structure.

| Field | value_type | description | default_value | automl_enabled |
|---|---|---|---|---|
| type | string | The lidar neck name. | SECONDFPN | FALSE |
| in_channels | list | The number of input channels for the lidar neck. | [128, 256] | FALSE |
| out_channels | list | The number of output channels for the lidar neck. | [256, 256] | FALSE |
| upsample_strides | list | The strides used to upsample the feature map for the lidar neck. | [1, 2] | FALSE |
| norm_cfg | collection | The normalization configuration for the lidar neck. | {'type': 'BN', 'eps': 0.001, 'momentum': 0.01} | FALSE |
| upsample_cfg | collection | The upsample layer configuration for the lidar neck. | {'type': 'deconv', 'bias': False} | FALSE |
| use_conv_for_no_stride | bool | Whether to use conv when the stride is 1. | True | FALSE |
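
For illustration, a pts_neck block using the defaults above might look like:

model:
  pts_neck:
    type: SECONDFPN
    in_channels: [128, 256]
    out_channels: [256, 256]
    upsample_strides: [1, 2]
    norm_cfg:
      type: BN
      eps: 0.001
      momentum: 0.01
    upsample_cfg:
      type: deconv
      bias: False
    use_conv_for_no_stride: True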

Fusion Layer Config#

The configuration (fusion_layer) defines the fusion layer structure. A detailed description is included in the table below. Currently, BEVFusion supports only the ConvFuser structure.

| Field | value_type | description | default_value | automl_enabled |
|---|---|---|---|---|
| type | string | The fusion layer name. | ConvFuser | FALSE |
| in_channels | list | The number of input channels for the fusion layer. | [80, 256] | FALSE |
| out_channels | int | The number of output channels for the fusion layer. | 256 | FALSE |
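
For illustration, a fusion_layer block using the defaults above might look like the sketch below. The two in_channels entries correspond to the camera branch (view_transform.out_channels, 80 by default) and the lidar branch feature channels (256):

model:
  fusion_layer:
    type: ConvFuser
    in_channels: [80, 256]
    out_channels: 256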

BBoxHead Config#

The configuration (bbox_head) defines the bbox prediction head structure. A detailed description is included in the table below. Currently, BEVFusion supports only the BEVFusionHead structure.

| Field | value_type | description | default_value | valid_options | automl_enabled |
|---|---|---|---|---|---|
| type | string | The prediction head name. | BEVFusionHead | BEVFusionHead | FALSE |
| num_proposals | int | The number of proposals. | 200 | | FALSE |
| auxiliary | bool | Whether to enable auxiliary training. | True | | FALSE |
| in_channels | int | The number of channels in the input feature map. | 512 | | FALSE |
| hidden_channel | int | The number of hidden channels. | 128 | | FALSE |
| num_classes | int | The number of classes. | 1 | | FALSE |
| nms_kernel_size | int | The NMS kernel size. | 3 | | FALSE |
| bn_momentum | float | The batch norm momentum. | 0.1 | | FALSE |
| num_decoder_layers | int | The number of decoder layers. | 1 | | FALSE |
| out_size_factor | int | The output size factor. | 8 | | FALSE |
| bbox_coder | collection | The configuration for the bounding box encoder. | | | FALSE |
| decoder_layer | collection | The configuration for the decoder layer. | | | FALSE |
| code_weights | list | The weights for the box encoder. | [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] | | FALSE |
| nms_type | string | The type of NMS. | | | FALSE |
| assigner | collection | The configuration for the assigner. | {'type': 'HungarianAssigner3D', 'iou_calculator': {'type': 'BboxOverlaps3D', 'coordinate': 'lidar'}, 'cls_cost': {'type': 'mmdet.FocalLossCost', 'gamma': 2.0, 'alpha': 0.25, 'weight': 0.15}, 'reg_cost': {'type': 'BBoxBEVL1Cost', 'weight': 0.25}, 'iou_cost': {'type': 'IoU3DCost', 'weight': 0.25}} | | FALSE |
| common_heads | collection | The configuration for the common heads. | {'center': [2, 2], 'height': [1, 2], 'dim': [3, 2], 'rot': [6, 2]} | | FALSE |
| loss_cls | collection | The configuration for the classification loss. | {'type': 'mmdet.FocalLoss', 'use_sigmoid': True, 'gamma': 2.0, 'alpha': 0.25, 'reduction': 'mean', 'loss_weight': 1.0} | | FALSE |
| loss_heatmap | collection | The configuration for the heatmap loss. | {'type': 'mmdet.GaussianFocalLoss', 'reduction': 'mean', 'loss_weight': 1.0} | | FALSE |
| loss_bbox | collection | The configuration for the bounding box loss. | {'type': 'mmdet.L1Loss', 'reduction': 'mean', 'loss_weight': 0.25} | | FALSE |
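
For illustration, a bbox_head block covering the scalar fields above might look like this sketch. The collection-valued fields (bbox_coder, decoder_layer, assigner, common_heads, and the loss_* blocks) would be configured alongside these, with the defaults shown in the table:

model:
  bbox_head:
    type: BEVFusionHead
    num_proposals: 200
    auxiliary: True
    in_channels: 512
    hidden_channel: 128
    num_classes: 1
    nms_kernel_size: 3
    bn_momentum: 0.1
    num_decoder_layers: 1
    out_size_factor: 8
    code_weights: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]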

Train Config#

The train configuration defines the hyperparameters of the training process.

train:
  precision: 'fp16'
  num_gpus: 1
  checkpoint_interval: 10
  validation_interval: 10
  num_epochs: 50
  optimizer:
    type: "AdamW"
    lr: 0.0001
    weight_decay: 0.05

| Field | value_type | description | default_value | valid_min | valid_max | automl_enabled |
|---|---|---|---|---|---|---|
| num_gpus | int | The number of GPUs to run the train job. | 1 | 1 | | FALSE |
| gpu_ids | list | The list of GPU IDs to run training on. The length of this list must equal the number of GPUs in train.num_gpus. | [0] | | | FALSE |
| num_nodes | int | The number of nodes to run training on. If > 1, multi-node training is enabled. | 1 | | | FALSE |
| seed | int | The seed for the initializer in PyTorch. If < 0, the fixed seed is disabled. | 1234 | -1 | inf | FALSE |
| cudnn | collection | The cuDNN configuration. | | | | FALSE |
| num_epochs | int | The number of epochs to run training. | 10 | 1 | inf | TRUE |
| checkpoint_interval | int | The interval (in epochs) at which a checkpoint is saved. Helps resume training. | 1 | 1 | | FALSE |
| validation_interval | int | The interval (in epochs) at which an evaluation is triggered on the validation dataset. | 1 | 1 | | FALSE |
| resume_training_checkpoint_path | string | The path to the checkpoint to resume training from. | | | | FALSE |
| results_dir | string | The path to where all the assets generated from a task are stored. | | | | FALSE |
| by_epoch | bool | Whether EpochBasedRunner is used. | True | | | FALSE |
| logging_interval | int | The logging interval, in iterations. | 1 | | | FALSE |
| resume | bool | Whether to resume training. | False | | | FALSE |
| pretrained_checkpoint | string | The path to a pre-trained BEVFusion model to initialize the current training from. | | | | FALSE |
| optimizer | collection | The hyperparameters to configure the optimizer. | | | | FALSE |
| lr_scheduler | list | The hyperparameters to configure the learning rate scheduler. | [{'type': 'LinearLR', 'start_factor': 0.33333333, 'by_epoch': False, 'begin': 0, 'end': 500}, {'type': 'CosineAnnealingLR', 'T_max': 10, 'eta_min_ratio': 0.0001, 'begin': 0, 'end': 10, 'by_epoch': True}, {'type': 'CosineAnnealingMomentum', 'eta_min': 0.8947, 'begin': 0, 'end': 2.4, 'by_epoch': True}, {'type': 'CosineAnnealingMomentum', 'eta_min': 1, 'begin': 2.4, 'end': 10, 'by_epoch': True}] | | | FALSE |

Optimizer config#

The optimizer parameter defines the configuration for the optimizer in training, including the optimizer type, learning rate, and weight decay; the learning rate schedule is configured separately through lr_scheduler.

| Field | value_type | description | default_value | automl_enabled |
|---|---|---|---|---|
| type | string | The type of optimizer used to train the network. | AdamW | FALSE |
| lr | float | The initial learning rate for training the model. | 0.0002 | FALSE |
| weight_decay | float | The weight decay coefficient. | 0.01 | FALSE |
| betas | list | The moving-average parameters for the adaptive learning rate. | [0.9, 0.999] | FALSE |
| clip_grad | collection | Clips the gradient norm of an iterable of parameters. | {'max_norm': 35, 'norm_type': 2} | FALSE |
| wrapper_type | string | The optimizer wrapper in MMEngine. Use AmpOptimWrapper to enable mixed-precision training. | OptimWrapper | FALSE |
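
Putting these defaults together, a fuller optimizer block might look like the following sketch; the comment on wrapper_type illustrates the mixed-precision option:

train:
  optimizer:
    type: AdamW
    lr: 0.0002
    weight_decay: 0.01
    betas: [0.9, 0.999]
    clip_grad:
      max_norm: 35
      norm_type: 2
    wrapper_type: OptimWrapper  # set to AmpOptimWrapper to enable mixed-precision training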

Evaluation Config#

The evaluate parameter defines the hyperparameters of the evaluation process.

evaluate:
  checkpoint: /path/to/model.pth
  num_gpus: 1

| Field | value_type | description | default_value | automl_enabled |
|---|---|---|---|---|
| num_gpus | int | The number of GPUs to run the evaluation job. | 1 | FALSE |
| gpu_ids | list | The list of GPU IDs to run evaluation on. | [0] | FALSE |
| num_nodes | int | The number of nodes to run evaluation on. | 1 | FALSE |
| checkpoint | string | The path to the checkpoint to evaluate. | ??? | FALSE |
| results_dir | string | The path to where the evaluation assets are stored. | | FALSE |

Inference Config#

The inference parameter defines the hyperparameters of the inference process.

inference:
  checkpoint: /path/to/model.pth
  num_gpus: 1

| Field | value_type | description | default_value | automl_enabled |
|---|---|---|---|---|
| num_gpus | int | The number of GPUs to run the inference job. | 1 | FALSE |
| gpu_ids | list | The list of GPU IDs to run inference on. | [0] | FALSE |
| num_nodes | int | The number of nodes to run inference on. | 1 | FALSE |
| checkpoint | string | The path to the checkpoint to run inference with. | ??? | FALSE |
| results_dir | string | The path to where the inference assets are stored. | | FALSE |
| conf_threshold | float | The confidence threshold for detections. | 0.5 | FALSE |
| show | bool | Whether to show the 3D visualization on screen. | False | FALSE |
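
For example, an inference block that also sets the confidence threshold and disables on-screen visualization might look like:

inference:
  checkpoint: /path/to/model.pth
  num_gpus: 1
  conf_threshold: 0.5
  show: False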

Training the Model#

To train a BEVFusion model, use this command:

tao model bevfusion train [-h] -e <experiment_spec>
                          [-r <results_dir>]

Required Arguments#

  • -e, --experiment_spec: The experiment specification file to set up the training experiment

Optional Arguments#

  • -r, --results_dir: The path to the folder where the experiment outputs should be written. If this argument is not specified, the results_dir from the spec file is used.

  • --gpus: The number of GPUs used to run training

  • --num_nodes: The number of nodes used to run training. If this value is larger than 1, distributed multi-node training is enabled.

  • -h, --help: Show this help message and exit.

Sample Usage#

Here’s an example of the train command:

tao model bevfusion train -e /path/to/spec.yaml

Evaluating the Model#

To run evaluation with a BEVFusion model, use this command:

tao model bevfusion evaluate [-h] -e <experiment_spec>
                             [-r <results_dir>]

Required Arguments#

  • -e, --experiment_spec: The experiment spec file to set up the evaluation experiment

Optional Arguments#

  • -r, --results_dir: The directory where the evaluation result is stored

Sample Usage#

Here’s an example of using the evaluate command:

tao model bevfusion evaluate -e /path/to/spec.yaml -r /path/to/results/ evaluate.checkpoint=/path/to/model.pth

Running Inference with BEVFusion Model#

Use the following command to run inference on BEVFusion with a .pth model:

tao model bevfusion inference [-h] -e <experiment spec file>
                              [-r <results_dir>]

Required Arguments#

  • -e, --experiment_spec: The experiment spec file to set up the inference experiment

Optional Arguments#

  • -r, --results_dir: The directory where the inference result is stored

Sample Usage#

Here’s an example of using the inference command:

tao model bevfusion inference -e /path/to/spec.yaml -r /path/to/results/ inference.checkpoint=/path/to/model.pth