OneFormer#
OneFormer supports the following tasks:
Train
Evaluate
Inference
Export
The following sections explain each task in detail.
Note
Throughout this documentation are references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections.
For instructions on creating a dataset using the remote client, refer to the Creating a dataset section in the Remote Client documentation.
For instructions on creating an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.
The spec format is YAML for TAO Launcher, and JSON for FTMS Client.
File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher, not for FTMS Client.
Dataset Format#
OneFormer supports three types of dataloaders, corresponding to the semantic, panoptic and instance segmentation tasks.
Each dataloader requires a certain annotation format.
For the semantic segmentation task, each line of the JSONL annotation file encodes the locations of the raw image and the mask ground truth.
For the panoptic and instance segmentation tasks, the annotation formats follow the COCO panoptic and COCO formats, respectively.
Note
The category IDs and annotation IDs must be greater than 0.
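For illustration only, a semantic segmentation JSONL file could look like the following two lines. The field names (image, label) and paths are assumptions for this sketch; verify them against your generated annotations:

{"image": "/workspace/datasets/my_data/images/0001.jpg", "label": "/workspace/datasets/my_data/masks/0001.png"}
{"image": "/workspace/datasets/my_data/images/0002.jpg", "label": "/workspace/datasets/my_data/masks/0002.png"}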
Creating a Configuration File#
SPECS=$(tao-client oneformer get-spec --action train --job_type experiment --id $EXPERIMENT_ID)
See also
For information on how to create an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.
Below is a sample OneFormer spec file. It has six components (model, inference, evaluate, dataset, export, and train) as well as several global parameters, each described below. The spec file is written in YAML format:
results_dir: nvidia_tao_pytorch/cv/oneformer/checkpoints/coco/swin
dataset:
  train:
    images: /workspace/datasets/coco/train2017
    annotations: /workspace/datasets/coco/annotations/panoptic_train2017.json
    panoptic: /workspace/datasets/coco/panoptic_train2017
    batch_size: 4
    num_workers: 4
  val:
    images: /workspace/datasets/coco/val2017
    annotations: /workspace/datasets/coco/annotations/panoptic_val2017.json
    panoptic: /workspace/datasets/coco/panoptic_val2017
    batch_size: 4
    num_workers: 4
  test:
    images: /workspace/datasets/coco/val2017
    annotations: /workspace/datasets/coco/annotations/panoptic_val2017.json
    panoptic: /workspace/datasets/coco/panoptic_val2017
    batch_size: 4
    num_workers: 4
  image_size: 1024
  label_map: /workspace/datasets/coco/label_map.json
  cutmix_prob: 0.0
model:
  backbone:
    name: D2SwinTransformer
    freeze_at: 0
    swin:
      embed_dim: 192
      depths: [2, 2, 18, 2]
      num_heads: [6, 12, 24, 48]
      window_size: 12
      mlp_ratio: 4.0
      patch_size: 4
      patch_norm: true
      ape: false
      pretrain_img_size: 384
      qkv_bias: true
      qk_scale: null
      attn_drop_rate: 0.0
      drop_rate: 0.0
      drop_path_rate: 0.3
      out_features: [res2, res3, res4, res5]
      out_indices: [0, 1, 2, 3]
      use_checkpoint: false
  one_former:
    num_object_queries: 150
  sem_seg_head:
    num_classes: 133
  test:
    test_topk_per_image: 100
    object_mask_threshold: 0.8
train:
  num_epochs: 50
  num_gpus: 8
  num_nodes: 4
  pretrained_model: nvidia_tao_pytorch/cv/oneformer/checkpoints/coco/swin_base/train/model_epoch_006_step_25879.pth
  pretrained_backbone:
  precision: 32
  iters_per_epoch: 15000
evaluate:
  checkpoint: nvidia_tao_pytorch/cv/oneformer/checkpoints/coco/swin/train/model_epoch_001_step_01850.pth
  num_gpus: 1
  gpu_ids: [0]
  results_dir: nvidia_tao_pytorch/cv/oneformer/checkpoints/coco/swin/eval
inference:
  mode: semantic
  results_dir: nvidia_tao_pytorch/cv/oneformer/checkpoints/coco/swin/inference
  images_dir: /workspace/datasets/coco/val2017
  image_size: [1024, 1024]
  checkpoint: nvidia_tao_pytorch/cv/oneformer/checkpoints/coco/swin/train/model_epoch_001_step_01850.pth
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | – | Configuration of the model architecture | |
| dataset | dict config | – | Configuration of the dataset | |
| train | dict config | – | Configuration of the training task | |
| evaluate | dict config | – | Configuration of the evaluation task | |
| inference | dict config | – | Configuration of the inference task | |
| encryption_key | string | None | Encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | Directory where experiment results are saved | |
| export | dict config | – | Configuration of the ONNX export task | |
Model Config#
The model configuration (model) defines the OneFormer model structure. The model
is used for training, evaluation, and inference. The table below provides a detailed description
of the model structure. Currently, OneFormer only supports Swin Transformer and EfficientViT (experimental feature) models.
| Field | Description | Data Type and Constraints | Supported Value |
|---|---|---|---|
| backbone | Backbone configuration | dict | |
| one_former | Configuration for the OneFormer architecture | dict | |
| sem_seg_head | Configuration for the segmentation head | dict | |
| text_encoder | Configuration for the text encoder | dict | |
| mode | Postprocessing mode | string | |
| object_mask_threshold | Classification confidence threshold | float | 0.4 |
| overlap_threshold | Overlap threshold for panoptic inference | float | 0.8 |
| test_topk_per_image | Number of top-k instances to keep per image for instance inference | unsigned int | 100 |
Backbone Configuration#
The backbone configuration (backbone) defines the backbone structure. The table below provides a
detailed description. OneFormer currently supports only Swin Transformers and EfficientViT models.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| type | Backbone type | str | |
| pretrained_weights | Path to the pretrained backbone model | str | |
| swin | Configuration for the Swin backbones | dict | |
Swin Configuration#
The swin configuration (swin) specifies the key parameters in a Swin Transformer backbone.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| embed_dim | Dimension of the embedding | unsigned int | 192 |
| depths | Number of layers in each stage | list | [2, 2, 18, 2] |
| num_heads | Number of attention heads in each stage | list | [6, 12, 24, 48] |
| window_size | Size of the window for local attention | unsigned int | 12 |
| mlp_ratio | Ratio of the MLP hidden dimension to the embedding dimension | float | 4.0 |
| patch_size | Size of the patch for the patch embedding | unsigned int | 4 |
| patch_norm | Whether to normalize the patch embedding | bool | True |
| ape | Whether to use absolute positional encoding | bool | False |
| qkv_bias | Whether to use bias in the QKV projection | bool | True |
| qk_scale | Scale factor for the QK projection | float | None |
| attn_drop_rate | Dropout rate for the attention | float | 0.0 |
| drop_rate | Dropout rate for the MLP | float | 0.0 |
| drop_path_rate | Drop path rate for the MLP | float | 0.3 |
| out_features | Names of the extracted feature maps | list | ["res2", "res3", "res4", "res5"] |
| out_indices | Stages from which to extract feature maps | list | [0, 1, 2, 3] |
| use_checkpoint | Whether to use checkpointing for the transformer | bool | False |
| pretrain_img_size | Image size used in pretraining | unsigned int | 384 |
Data Config#
The data configuration (dataset) defines the data source, augmentation methods, and preprocessing hyperparameters.
| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| pixel_mean | Image mean in RGB order | list | [123.675, 116.28, 103.53] |
| pixel_std | Image standard deviation in RGB order | list | [58.395, 57.12, 57.375] |
| augmentation | Augmentation settings | dict | |
| contiguous_id | Whether to use contiguous IDs | bool | |
| label_map | Path of the label mapping file | string | |
| workers | Number of workers to load data for each GPU | unsigned int | |
| train | Train dataset configuration | dict | |
| val | Validation dataset configuration | dict | |
| test | Test dataset configuration | dict | |
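As a sketch of the label_map file referenced above, a minimal JSON mapping of category IDs to class names might look like the following. The exact schema is an assumption for this illustration, so consult your dataset's generated label map; note that category IDs must be greater than 0:

{
    "1": "person",
    "2": "bicycle",
    "3": "car"
}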
Augmentation Config#
The augmentation configuration (augmentation) defines the augmentation methods.
| Parameter | Datatype | Description | Supported Values |
|---|---|---|---|
| train_min_size | int list | List of sizes to perform random resize for training data | int list |
| train_max_size | unsigned int | Maximum resize size for training data | >0 |
| train_crop_size | int list | Random crop size for training data in [H, W] | int list |
| test_min_size | unsigned int | Minimum resize size for test data | >0 |
| test_max_size | unsigned int | Maximum resize size for test data | >0 |
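For example, an augmentation block using the parameters above might look like the following; the sizes are illustrative values, not tuned recommendations:

dataset:
  augmentation:
    train_min_size: [640, 672, 704, 736, 768, 800]  # shortest-edge sizes sampled at random
    train_max_size: 1333                            # cap on the longest edge
    train_crop_size: [1024, 1024]                   # [H, W] random crop
    test_min_size: 800
    test_max_size: 1333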
Dataset Configuration#
The train, val, and test configurations within dataset define the dataset directories, annotation file, and batch size for each split.
| Parameter | Datatype | Description |
|---|---|---|
| images | str | Path of the image directory |
| annotations | str | Path of the annotation file |
| panoptic | str | Path of the panoptic directory |
| batch_size | unsigned int | Batch size |
| num_workers | unsigned int | Number of workers to process the input data |
Train Configuration#
The train configuration defines the hyperparameters of the training process.
train:
  precision: "fp16"
  num_gpus: 1
  checkpoint_interval: 10
  validation_interval: 10
  num_epochs: 50
  optim:
    type: "AdamW"
    lr: 0.0001
    weight_decay: 0.05
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | unsigned int | 1 | Number of GPUs to use for distributed training | >0 |
| gpu_ids | list[int] | [0] | Indices of GPUs to use for distributed training | |
| seed | unsigned int | 1234 | Random seed for random, NumPy, and torch | >0 |
| num_epochs | unsigned int | 10 | Total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | Epoch interval at which checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | Epoch interval at which validation is run | >0 |
| resume_training_checkpoint_path | string | | Intermediate PyTorch Lightning checkpoint from which to resume training | |
| results_dir | string | /results/train | Directory to save training results | |
| optim | dict config | | Configuration for the optimizer, including the learning rate, learning scheduler, and weight decay | |
| clip_grad_type | str | full | Type of gradient clip method | |
| clip_grad_norm | float | 0.1 | Amount to clip the gradient by the L2 norm; a value of 0.0 specifies no clipping | >=0 |
| precision | string | fp32 | "fp16" enables mixed-precision training, which can help save GPU memory | fp32, fp16 |
| distributed_strategy | string | ddp | Multi-GPU training strategy | ddp, ddp_sharded |
| activation_checkpoint | bool | True | Whether to recompute activations in the backward pass to save GPU memory, rather than storing them | True, False |
| pretrained_model_path | string | | Path of the pretrained model checkpoint to load for fine-tuning | |
| num_nodes | unsigned int | 1 | Number of nodes; if greater than 1, multi-node training is enabled | >0 |
| freeze | string list | [] | List of layer names in the model to freeze | |
| verbose | bool | False | Whether to print detailed learning rate scaling from the optimizer | True, False |
| iters_per_epoch | unsigned int | | Number of samples per epoch | |
Optimizer Configuration#
The optim parameter defines the config for the optimizer in training, including the
learning rate, learning scheduler, and weight decay.
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| lr | float | 2e-4 | Initial learning rate for training the model, excluding the backbone | >0.0 |
| momentum | float | 0.9 | Momentum for the AdamW optimizer | >0.0 |
| weight_decay | float | 1e-4 | Weight decay coefficient | >0.0 |
| lr_scheduler | string | MultiStep | Learning rate scheduler | MultiStep, StepLR |
| gamma | float | 0.1 | Decay factor for the learning rate scheduler | >0.0 |
| milestones | int list | [11] | Epochs at which to decrease the learning rate for the MultiStep scheduler | int list |
| monitor_name | string | val_loss | Metric monitored by the learning rate scheduler | val_loss, train_loss |
| type | string | AdamW | Type of optimizer to use during training | AdamW, SGD |
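Putting these parameters together, a sketch of an optim block inside the train config; the values are illustrative, not tuned recommendations:

train:
  optim:
    type: "AdamW"
    lr: 0.0001
    weight_decay: 0.05
    lr_scheduler: "MultiStep"
    milestones: [30, 45]   # epochs at which lr is multiplied by gamma
    gamma: 0.1
    monitor_name: "val_loss"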
Evaluation Configuration#
The evaluate parameter defines the hyperparameters of the evaluation process.
evaluate:
  checkpoint: /path/to/model.pth
  num_gpus: 1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | Path to the PyTorch model to evaluate | |
| trt_engine | string | | Path to the TensorRT engine to evaluate; used only with TAO Deploy | |
| num_gpus | unsigned int | 1 | Number of GPUs to use | >0 |
| gpu_ids | list[int] | [0] | GPU IDs to use | |
| results_dir | string | /results/evaluate | Path of the evaluation results directory | |
Inference Configuration#
The inference parameter defines the hyperparameters of the inference process.
inference:
  checkpoint: /path/to/model.pth
  num_gpus: 1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | Path to the PyTorch model to run inference on | |
| trt_engine | string | | Path to the TensorRT engine to run inference on; used only with TAO Deploy | |
| num_gpus | unsigned int | 1 | Number of GPUs to use | >0 |
| gpu_ids | list[int] | [0] | GPU IDs to use | |
| results_dir | string | /results/inference | Path of the inference results directory | |
Export Configuration#
The export parameter defines the hyperparameters of the export process.
export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  on_cpu: False
  opset_version: 12
  input_channel: 3
  input_width: 960
  input_height: 544
  batch_size: -1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | Path to the PyTorch model to export | |
| onnx_file | string | | Path where the exported .onnx model is saved | |
| on_cpu | bool | True | If True, the ONNX export is performed on CPU | True, False |
| opset_version | unsigned int | 12 | Opset version of the exported ONNX model | >0 |
| input_channel | unsigned int | 3 | Input channel size; the only supported value is 3 | 3 |
| input_width | unsigned int | 960 | Input width | >0 |
| input_height | unsigned int | 544 | Input height | >0 |
| batch_size | int | -1 | Batch size of the ONNX model; if -1, export uses a dynamic batch size | >=-1 |
Training the Model#
To train a OneFormer model, use this command:
TRAIN_JOB_ID=$(tao-client oneformer experiment-run-action --action train --id $EXPERIMENT_ID --specs "$SPECS")
See also
For information on how to create an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.
tao model oneformer train [-h] -e <experiment_spec>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.gpu_ids=<gpu indices>]
[train.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec: The experiment specification file to set up the training experiment.
Optional Arguments
Optional arguments override option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options.
Note
For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which
default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 with
gpu_ids = [0, 1], then they are modified to follow the setting that implies more GPUs; in this example, num_gpus is changed from 1 to 2. An example invocation with a consistent pair is shown below.
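For example, a consistent two-GPU launcher invocation might look like this; the spec path is a placeholder:

tao model oneformer train -e /path/to/spec.yaml \
    train.num_gpus=2 \
    train.gpu_ids=[0,1]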
In some cases, multi-GPU training may result in a segmentation fault. You can circumvent this by
setting the environment variable OMP_NUM_THREADS to 1. Depending on your mode of execution, you may use one of the following methods to set
this variable:
CLI Launcher:
You may set the environment variable by adding the following to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 of the Running the launcher section:

{
    "Envs": [
        {
            "variable": "OMP_NUM_THREADS",
            "value": "1"
        }
    ]
}
Docker:
You may set environment variables in Docker by passing the -e flag on the Docker command line:

docker run -it --rm --gpus all \
    -e OMP_NUM_THREADS=1 \
    -v /path/to/local/mount:/path/to/docker/mount \
    nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e <path/to/spec/file>
Checkpointing and Resuming Training
At every train.checkpoint_interval, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved.
Checkpoints are saved in train.results_dir, like this:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint is also saved as oneformer_model_latest.pth.
Training automatically resumes from oneformer_model_latest.pth if it exists in train.results_dir.
oneformer_model_latest.pth is superseded by train.resume_training_checkpoint_path if it is provided.
The major implication of this logic is that, if you want to trigger fresh training from scratch, you must either:
Specify a new, empty results directory (recommended), or
Remove the latest checkpoint from the results directory.
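Conversely, to resume from a specific checkpoint rather than the latest one, point train.resume_training_checkpoint_path at it; the checkpoint name below is illustrative:

tao model oneformer train -e /path/to/spec.yaml \
    train.resume_training_checkpoint_path=/results/train/model_epoch_004.pth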
Optimizing Resources for Training OneFormer#
Training OneFormer on a standard dataset like COCO requires powerful GPUs (for example, V100 or A100) with at least 15 GB of VRAM and a large amount of CPU memory. This section outlines some of the strategies you can use to launch training with limited resources.
Optimize GPU Memory#
There are various ways to optimize GPU memory usage. A common approach is to reduce dataset.batch_size. However, this can cause your training to take longer than usual.
We recommend setting the following configurations to optimize GPU consumption, combined in the sketch after this list:
Set train.precision to fp16 to enable automatic mixed precision training. This can reduce your GPU memory usage by 50%.
Set train.activation_checkpoint to True to enable activation checkpointing. Memory usage is reduced by recomputing the activations instead of caching them in memory.
Set train.distributed_strategy to ddp_sharded to enable sharded DDP training. This shares gradient calculation across different processes to help reduce GPU memory.
Try using lighter-weight backbones, or freeze the backbone by setting train.freeze.
Try changing the augmentation resolution in dataset.augmentation, depending on your dataset.
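A sketch combining these memory-saving settings in the train config; the frozen layer name is an assumption for illustration, so substitute names from your own model:

train:
  precision: "fp16"                    # automatic mixed precision
  activation_checkpoint: true          # recompute activations instead of caching them
  distributed_strategy: "ddp_sharded"  # sharded DDP training
  freeze: ["backbone"]                 # illustrative layer name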
Optimize CPU Memory#
To speed up data loading, it is common practice to use many workers to spawn multiple processes. However, this can cause an out-of-memory condition if the annotation file is very large. We recommend setting the following configurations to optimize CPU consumption, combined in the sketch after this list:
Set dataset.dataset_type to serialized so that the COCO-based annotation data can be shared across different subprocesses.
Set dataset.augmentation.fixed_padding to True so that images are padded before batch formulation. Because of the random resize and random crop augmentations during training, the resulting image resolution can vary across images. Such variable image resolutions can cause a memory leak, leading to an out-of-memory condition in the middle of training. This is a limitation of PyTorch, so we advise setting fixed_padding to True to help stabilize CPU memory usage.
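A minimal sketch of these CPU-side settings in the dataset config:

dataset:
  dataset_type: "serialized"   # share annotation data across data-loading subprocesses
  augmentation:
    fixed_padding: true        # pad images before batching to stabilize CPU memory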
Evaluating the Model#
To run evaluation with a OneFormer model, use this command:
EVAL_JOB_ID=$(tao-client oneformer experiment-run-action --action evaluate --id $EXPERIMENT_ID --parent_job_id $TRAIN_JOB_ID --specs "$SPECS")
See also
For information on how to create an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.
tao model oneformer evaluate [-h] -e <experiment_spec>
evaluate.checkpoint=<model to be evaluated>
[evaluate.<evaluate_option>=<evaluate_option_value>]
[evaluate.gpu_ids=<gpu indices>]
[evaluate.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the evaluation experiment
evaluate.checkpoint: The .pth model to be evaluated
Optional Arguments
evaluate.<evaluate_option>: The evaluate options.
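For example, an evaluation run with explicit overrides; the paths are placeholders:

tao model oneformer evaluate -e /path/to/spec.yaml \
    evaluate.checkpoint=/results/train/model_epoch_004.pth \
    evaluate.num_gpus=1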
Running Inference with the OneFormer Model#
The inference tool for OneFormer models can be used to visualize bounding boxes and masks.
INFERENCE_JOB_ID=$(tao-client oneformer experiment-run-action --action inference --id $EXPERIMENT_ID --parent_job_id $TRAIN_JOB_ID --specs "$SPECS")
See also
For information on how to create an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.
tao model oneformer inference [-h] -e <experiment spec file>
inference.checkpoint=<inference model>
[inference.<inference_option>=<inference_option_value>]
[inference.gpu_ids=<gpu indices>]
[inference.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec: The experiment spec file to set up the inference experiment
inference.checkpoint: The .pth model to run inference on
Optional Arguments
inference.<inference_option>: The inference options
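For example, an inference run with explicit overrides; the paths are placeholders:

tao model oneformer inference -e /path/to/spec.yaml \
    inference.checkpoint=/results/train/model_epoch_004.pth \
    inference.results_dir=/results/inference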
Exporting the Model#
EXPORT_JOB_ID=$(tao-client oneformer experiment-run-action --action export --id $EXPERIMENT_ID --parent_job_id $TRAIN_JOB_ID --specs "$SPECS")
See also
For information on how to create an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.
tao model oneformer export [-h] -e <experiment spec file>
[results_dir=<results_dir>]
export.checkpoint=<model to export>
export.onnx_file=<onnx path>
Required Arguments
The following arguments are required to run the command.
-e, --experiment_spec: The path to an experiment spec file
export.checkpoint: The .pth model to export
export.onnx_file: The path where the .etlt or .onnx model is saved
Optional Arguments
The following arguments are optional to run the command.
export.<export_option>: The export options.
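For example, exporting a trained checkpoint to ONNX; the paths are placeholders:

tao model oneformer export -e /path/to/spec.yaml \
    export.checkpoint=/results/train/model_epoch_004.pth \
    export.onnx_file=/results/export/model.onnx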
TensorRT Engine Generation and Validation#
For deployment, refer to the TAO Deploy documentation for OneFormer.