Image Classification PyT
Image Classification PyT is a PyTorch-based image-classification model included in the TAO Toolkit. It supports the following tasks:
train
evaluate
inference
export
These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao model classification_pyt <sub_task> <args_per_subtask>
where args_per_subtask refers to the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
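For example, to view the arguments that the train subtask accepts, you can print its help message (assuming the TAO Toolkit Launcher is installed and configured):
tao model classification_pyt train -h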
Image Classification (PyT) is based on MMClassification, so most parameters follow the MMClassification 0.x format. This version has been deprecated by MMLab in favor of MMPretrain; TAO Toolkit will be updated to the MMPretrain version in a future release.
See the Data Annotation Format page for more information about the data format for image classification.
The train experiment specification for classification consists of three main components:
dataset
train
model
Here is an example of a dataset specification file for classification PyT with a FAN backbone:
dataset:
data:
samples_per_gpu: 128
workers_per_gpu: 8
train:
data_prefix: "/raid/ImageNet2012/ImageNet2012/train"
pipeline: # Augmentations alone
- type: RandomResizedCrop
size: 224
backend: "pillow"
- type: RandomFlip
flip_prob: 0.5
direction: "horizontal"
- type: ColorJitter
brightness: 0.4
contrast: 0.4
saturation: 0.4
- type: RandomErasing
erase_prob: 0.3
val:
data_prefix: /raid/ImageNet2012/ImageNet2012/val
test:
data_prefix: /raid/ImageNet2012/ImageNet2012/val
The table below describes the configurable parameters in dataset.
Parameter | Datatype | Default | Description | Supported Values |
sampler | dict config | None | The dataset sampler type | – |
data | dict config | None | The dataset and dataloader configuration. Refer to data for more details. | – |
data
Parameter | Datatype | Default | Description | Supported Values |
samples_per_gpu | int | None | The batch size per GPU (number of samples per GPU) | > 0 |
workers_per_gpu | int | None | The number of data-loading workers per GPU | > 0 |
train | dict config | None | The training dataset configuration, including data_prefix, pipeline, and classes | ImageNet classes |
test | dict config | None | The test dataset configuration, including data_prefix, pipeline, and classes | ImageNet classes |
val | dict config | None | The validation dataset configuration, including data_prefix, pipeline, and classes | ImageNet classes |
Refer to the MMClassification 0.x format documentation for more details.
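For a custom dataset organized in ImageNet-style class folders, the same fields point at your own directories. The following is a minimal sketch; the paths and the classes file are placeholders, and the classes key follows the MMClassification 0.x convention:
dataset:
  data:
    samples_per_gpu: 32
    workers_per_gpu: 4
    train:
      data_prefix: /data/my_dataset/train
      classes: /data/my_dataset/classes.txt  # one class name per line
    val:
      data_prefix: /data/my_dataset/val
      classes: /data/my_dataset/classes.txt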
pipeline
The following is an example pipeline config with different augmentations:
pipeline: # Augmentations alone
- type: RandomResizedCrop
size: 224
backend: "pillow"
- type: RandomFlip
flip_prob: 0.5
direction: "horizontal"
- type: ColorJitter
brightness: 0.4
contrast: 0.4
saturation: 0.4
- type: RandomErasing
erase_prob: 0.3
Some widely adopted augmentations and their parameters are listed below. For more information, refer to the MMClassification documentation for transforms.
Parameter | Datatype | Default | Description | Supported Values |
RandomResizedCrop | dict config | None | Crops the image to a random size and aspect ratio and resizes it. Configurable parameters include size and backend. | size: > 0 |
RandomFlip | dict config | None | Randomly flips the image. Configurable parameters include flip_prob and direction. | flip_prob: 0-1 |
RandomCrop | dict config | None | Crops the image at a random location. Configurable parameters include size. | size: > 0 |
ColorJitter | dict config | None | Randomly changes the brightness, contrast, and saturation of the image. | 0-1 |
RandomErasing | dict config | None | Randomly erases a rectangular region of the image. Configurable parameters include erase_prob. | erase_prob: 0-1 |
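The val and test datasets accept a pipeline in the same format, typically with deterministic transforms instead of random augmentations. The following is a sketch assuming the standard MMClassification Resize and CenterCrop transforms:
val:
  data_prefix: /raid/ImageNet2012/ImageNet2012/val
  pipeline:
    - type: Resize
      size: 256
      backend: "pillow"
    - type: CenterCrop
      crop_size: 224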
Here is an example of a train specification file for Image Classification PyT:
train:
train_config:
runner:
max_epochs: 300
checkpoint_config:
interval: 1
logging:
interval: 5000
validate: True
evaluation:
interval: 10
custom_hooks:
- type: "EMAHook"
momentum: 0.00008
priority: "ABOVE_NORMAL"
lr_config:
policy: CosineAnnealingCooldown
min_lr: 5.0e-06
cool_down_time: 10
warmup: 'linear'
warmup_iters: 20
warmup_by_epoch: True
optimizer:
type: AdamW
lr: 0.005
weight_decay: 0.05
The table below describes the configurable parameters in the train
specification.
Parameter | Datatype | Default | Description | Supported Values |
train_config | dict config | None | Parameters related to training. For more information, refer to train_config. | – |
results_dir | str | None | The path for saving the checkpoint and logs | str |
train_config
Parameter | Datatype | Default | Description | Supported Values |
runner | dict config | None | Contains max_epochs, the maximum number of epochs to train | max_epochs: > 0 |
checkpoint_config | dict config | None | Contains interval, the epoch interval at which checkpoints are saved | interval: > 0 |
logging | dict config | None | Contains interval, the iteration interval at which training logs are written | interval: > 0 |
optimizer | dict config | None | Contains the configurable parameters for different optimizers, as detailed in optimizer | – |
optimizer_config | dict config | None | Contains grad_clip, the gradient-clipping configuration | max_norm: >= 0.0 |
evaluation | dict config | None | Contains interval, the epoch interval at which evaluation is run during training | interval: > 0 |
validate | bool | False | A flag that enables validation during training | True/False |
find_unused_parameters | bool | False | Sets this parameter in DDP. For more information, refer to DDP_PyT. | True/False |
lr_config | dict | None | The learning-rate scheduler configuration. For more details, refer to lr_config. | – |
load_from | str | None | The checkpoint path from which the end-to-end model weights, including the head, can be loaded | – |
custom_hooks | dict | None | The custom training hooks configuration. For more details, refer to custom_hooks. | – |
resume_training_checkpoint_path | str | None | The checkpoint path from which to resume training | – |
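The distinction between load_from and resume_training_checkpoint_path is worth noting: the former initializes weights for fine-tuning, while the latter restores the training state so an interrupted run can continue. A minimal sketch (paths are placeholders):
train:
  train_config:
    # Resume an interrupted run from its last checkpoint
    resume_training_checkpoint_path: /path/to/last_checkpoint.pth
    # Alternatively, load end-to-end weights (including the head) for fine-tuning:
    # load_from: /path/to/pretrained.pth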
optimizer
The following optimizers are supported:
SGD
Parameter | Datatype | Default | Description | Supported Values |
lr | float | None | The learning rate | >= 0.0 |
momentum | float | None | The momentum factor | >= 0.0 |
weight_decay | float | None | The weight decay (L2 penalty) | >= 0.0 |
AdamW
Parameter | Datatype | Default | Description | Supported Values |
lr | float | None | The learning rate | >= 0.0 |
weight_decay | float | None | The weight decay | >= 0.0 |
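The train example above uses AdamW; switching to SGD only changes the optimizer block. A minimal sketch with illustrative (not recommended) values:
optimizer:
  type: SGD
  lr: 0.01
  momentum: 0.9
  weight_decay: 0.0001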
lr_config
The lr_config
parameter defines the parameters for the learning-rate scheduler. The following learning-rate schedulers are supported:
CosineAnnealingCooldown
Parameter | Datatype | Default | Description | Supported Values |
min_lr | float | None | The minimum learning rate after annealing | >= 0.0 |
min_lr_ratio | float | None | The ratio of the minimum learning rate to the base learning rate after annealing | Less than 1.0 |
cool_down_ratio | float | 0.1 | The cooldown ratio | In the interval (0, 1). |
cool_down_time | int | 10 | The cooldown time, in epochs | > 0 |
warmup | string | exp | The type of warmup used | constant, linear, exp |
warmup_iters | int | 0 | The number of iterations or epochs that warmup lasts | >= 0 |
warmup_ratio | float | 0.1 | The learning rate used at the beginning of warmup equals warmup_ratio * initial_lr | In the interval (0, 1). |
CosineAnnealing
Parameter | Datatype | Typical value | Description | Supported Values |
warmup | string | exp | The type of warmup used | constant, linear, exp |
warmup_iters | int | 0 | The number of iterations or epochs that warmup lasts | >= 0 |
warmup_ratio | float | 0.1 | The learning rate used at the beginning of warmup equals warmup_ratio * lr | In the interval (0, 1). |
min_lr_ratio | float | None | The ratio of the minimum learning rate to the base learning rate after annealing | Less than 1.0 |
Step
Parameter | Datatype | Typical value | Description | Supported Values |
gamma | float | – | The multiplicative factor by which the learning rate is decayed at each step | Usually less than 1.0 |
step | int or list of int | – | The epoch or iteration interval(s) at which the learning rate is decayed | > 0 |
Poly
Parameter | Datatype | Typical value | Description | Supported Values |
min_lr | float | – | The minimum learning rate at the end of the schedule | >= 0.0 |
power | float | – | The power of the polynomial decay | > 0 |
soft_start | float | – | The progress at which the learning rate reaches the base learning rate | In the interval (0, 1). |
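As an illustration, a step schedule that decays the learning rate by gamma at fixed epochs, with a linear warmup, might be configured as follows (values are illustrative):
lr_config:
  policy: Step
  step: [30, 60, 90]
  gamma: 0.1
  warmup: 'linear'
  warmup_iters: 5
  warmup_by_epoch: True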
custom_hooks
The following example shows how a custom hook is translated from an MMClassification config to the equivalent TAO Hydra config for EMAHook:
MMClassification config:
custom_hooks = [ dict(type='EMAHook', interval=100, priority='HIGH')]
Equivalent TAO Hydra config:
custom_hooks:
  - type: "EMAHook"
    momentum: 0.00008
    priority: "ABOVE_NORMAL"
For more details on custom_hooks, refer to the MMClassification documentation for custom hooks.
Here is an example model-specification file for Image Classification PyT with a FAN backbone:
model:
backbone:
type: "fan_tiny_8_p4_hybrid"
custom_args:
drop_path_rate: 0.1
freeze: False
pretrained: <Path to pretrained weights>
head:
type: "FANLinearClsHead"
num_classes: 1000
custom_args:
head_init_scale: 1
loss:
type: LabelSmoothLoss
label_smooth_val: 0.1
mode: 'original'
train_cfg:
augments:
- type: BatchMixup
alpha: 0.8
num_classes: 1000
prob: 0.5
- type: BatchCutMix
alpha: 1.0
num_classes: 1000
prob: 0.5
The model
parameter primarily configures the backbone and head.
Parameter | Datatype | Default | Description | Supported Values |
backbone | Dict | None | Contains the backbone configuration: type (the backbone architecture), pretrained (the path to pretrained weights), freeze (whether to freeze the backbone), and custom_args. Refer to the Foundation Models section for foundation-model backbones. | FAN variants, GCViT variants, FasterViT variants; freeze: True/False |
head | Dict | None | The config parameters for the classification head | – |
train_cfg | Dict | None | Contains advanced augmentation parameters | – |
Foundation Models
model:
backbone:
type: "ViT-B-32"
custom_args:
drop_path_rate: 0.1
freeze: False
pretrained: laion400m_e31
head:
type: LinearClsHead
num_classes: 1000
in_channels: 512
loss:
type: CrossEntropyLoss
loss_weight: 1.0
use_soft: False
topk: [1, 5]
The following is a subset of the supported architectures and pretraining datasets. Note that in_channels under head must be updated to match the chosen backbone:
CLIP Image Backbones:
Arch | Pretrained Dataset | in_channels |
ViT-B-32 | laion400m_e31 | 512 |
ViT-B-16 | laion400m_e31 | 512 |
ViT-L-14 | laion400m_e31 | 768 |
ViT-H-14 | laion2b_s32b_b79k | 1024 |
ViT-g-14 | laion2b_s12b_b42k | 1024 |
EVA-CLIP Image Backbones:
Arch | Pretrained Dataset | in_channels |
EVA02-L-14 | merged2b_s4b_b131k | 768 |
EVA02-L-14-336 | laion400m_e31 | 768 |
EVA02-E-14 | laion400m_e31 | 1024 |
EVA02-E-14-plus | laion2b_s32b_b79k | 1024 |
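For example, switching the Foundation Models specification above to EVA02-L-14 also requires updating in_channels to 768, per the table. A partial sketch showing only the changed fields:
model:
  backbone:
    type: "EVA02-L-14"
    pretrained: merged2b_s4b_b131k
  head:
    type: LinearClsHead
    num_classes: 1000
    in_channels: 768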
head
Parameter | Datatype | Default | Description | Supported Values |
type | string | None | The type of classification head | LinearClsHead, FANLinearClsHead |
num_classes | int | None | The number of training classes | >= 0 |
loss | Dict | {"type": "CrossEntropyLoss"} | Refer to loss for the supported loss types and their parameters | – |
topk | List | [1,] | The values of k at which to report top-k accuracy | >= 1 |
custom_args | Dict | None | Any custom parameters to be passed to the head | – |
train_cfg
BatchMixup
Parameter | Datatype | Default | Description | Supported Values |
alpha | float | None | The parameter of the Beta distribution used to generate the mixing ratio | 0-1 |
prob | float | None | The probability with which to apply the augmentation | 0-1 |
num_classes | int | None | The number of classes | >= 0 |
BatchCutMix
Parameter | Datatype | Default | Description | Supported Values |
alpha | float | None | The parameter of the Beta distribution used to generate the mixing ratio | 0-1 |
prob | float | None | The probability with which to apply the augmentation | 0-1 |
num_classes | int | None | The number of classes | >= 0 |
loss
Some important classification losses are shown below. Note that any loss supported in MMCls can be used by following the TAO Toolkit Hydra config convention. For a list of MMCls losses, refer to the losses_mmcls documentation.
LabelSmoothLoss
Parameter | Datatype | Default | Description | Supported Values |
label_smooth_val | float | None | The degree of label smoothing | 0-1 |
use_sigmoid | bool | None | Specifies whether the prediction uses sigmoid instead of softmax | True/False |
num_classes | int | None | The number of classes | >= 0 |
mode | string | None | The label-smoothing mode | original, classy_vision, multi_label |
reduction | str | None | The method used to reduce the loss | mean, sum |
loss_weight | float | 1.0 | The weight of the loss | >= 0 |
CrossEntropyLoss
Parameter | Datatype | Default | Description | Supported Values |
use_sigmoid | bool | False | Specifies whether the prediction uses sigmoid instead of softmax | True/False |
use_soft | bool | False | Specifies whether to use the soft version of CrossEntropyLoss | True/False |
loss_weight | float | 1.0 | The weight of the loss | >= 0 |
Use the tao model classification_pyt train command to train a classification PyTorch model:
tao model classification_pyt train [-h] -e <spec file>
-r <result directory>
[-g <num GPUs>]
Required Arguments
-r, --results_dir: The path to a folder where the experiment outputs should be written
-e, --experiment_spec_file: The path to the experiment spec file
Optional Arguments
-g, --gpus: The number of GPUs to use for training. The default value is 1.
-h, --help: Print the help message.
Sample Usage
Here’s an example of using the tao model classification_pyt train
command:
tao model classification_pyt train -e /workspace/cats_dogs/spec/train_cats_dogs.yaml -r /workspace/output
The evaluate
config defines the hyperparameters of the evaluation process. The following is an example
config:
evaluate:
checkpoint: /path/to/model.pth
topk: 1
After the model has been trained using the experiment config file, the next step is to evaluate the model on a test set to measure its accuracy. The TAO Toolkit includes the tao model classification_pyt evaluate command for this purpose.
The classification app computes evaluation loss and Top-k accuracy.
After training, the model is stored in the output directory of your choice, specified by results_dir.
evaluate:
checkpoint: /path/to/model.pth
tao model classification_pyt evaluate [-h] -e <experiment_spec_file>
evaluate.checkpoint=<model to be evaluated>
results_dir=<path to results dir>
[-g <num gpus>]
Required Arguments
-e, --experiment_spec_file
: The path to the experiment spec file
Optional Arguments
-h, --help: Show this help message and exit.
-g, --gpus: The number of GPUs to use for evaluation
If you followed the example in training a classification model, run the evaluation:
tao model classification_pyt evaluate -e /path/to/classification_eval.yaml
TAO evaluates the classification model and reports the top-k accuracy metric.
For classification, tao model classification_pyt inference saves a .csv file containing the image paths and the corresponding predicted labels for multiple images. TensorRT Python inference can also be enabled.
inference:
checkpoint: /path/to/model.pth
tao model classification_pyt inference [-h] -e <experiment_spec_file>
inference.checkpoint=<model to be inferenced>
results_dir=<path to results dir>
[-g <num gpus>]
Required Arguments
-e, --experiment_spec_file
: The path to the experiment spec file
Optional Arguments
-h, --help: Show this help message and exit.
-g, --gpus: The number of GPUs to use for inference
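Sample Usage
The following is a sample inference command, following the same pattern as the other subtasks (paths are placeholders):
tao model classification_pyt inference -e /path/to/classification_spec.yaml inference.checkpoint=/path/to/model.pth results_dir=/path/to/results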
Exporting the model decouples the training process from inference and allows conversion to
TensorRT engines outside the TAO environment. TensorRT engines are specific to each hardware
configuration and should be generated for each unique inference environment.
The exported model can be used universally across training and deployment hardware and is saved in the .onnx format.
The export
parameter defines the hyperparameters of the export process.
export:
checkpoint: /path/to/model.pth
onnx_file: /path/to/model.onnx
opset_version: 12
verify: False
input_channel: 3
input_width: 224
input_height: 224
Here’s an example of the tao model classification_pyt export command:
command:
tao model classification_pyt export [-h] -e <experiment spec file>
[-r <results_dir>]
export.checkpoint=<model to export>
export.onnx_file=<onnx path>
Required Arguments
-e, --experiment_spec
: The path to an experiment spec file
Optional Arguments
-r, --results_dir: The directory where the export output is stored
export.checkpoint: The .tlt or .pth model to export
export.onnx_file: The path where the .etlt or .onnx model will be saved
Sample Usage
The following is a sample export command.
tao model classification_pyt export -e /path/to/spec.yaml export.checkpoint=/path/to/model.pth export.onnx_file=/path/to/model.onnx
For TensorRT engine generation, validation, and INT8 calibration, refer to the TAO Deploy documentation.
Refer to the Integrating a Classification (TF1/TF2/PyTorch) Model page for more information about deploying a classification model with DeepStream.