Image Classification PyT#
Image Classification PyT is a PyTorch-based image-classification model included in TAO. It supports the following tasks:
train
evaluate
inference
export
These tasks can be invoked from the TAO Launcher using the following convention on the command-line:
tao model classification_pyt <sub_task> <args_per_subtask>
where args_per_subtask is the set of command-line arguments required for a given subtask. Each
subtask is explained in detail in the following sections.
TAO has been updated to the latest MMPretrain version, which replaces the deprecated MMClassification.
Note
Image Classification (PyT) is based on MMPretrain, so most parameters are adopted from the MMPretrain 1.x format.
Preparing the Input Data Structure#
See the Data Annotation Format page for more information about the data format for image classification.
The train
classification experiment specification consists of three main components:
dataset
train
model
Dataset Input for Classification PyT#
Here is an example of dataset specification file for classification PyT with a FAN backbone:
dataset:
data:
samples_per_gpu: 128
workers_per_gpu: 8
train:
data_prefix: "/raid/ImageNet2012/ImageNet2012/train"
pipeline: # Augmentations alone
- type: RandomResizedCrop
scale: 224
- type: RandomFlip
prob: 0.5
direction: "horizontal"
- type: ColorJitter
brightness: 0.4
contrast: 0.4
saturation: 0.4
- type: RandomErasing
erase_prob: 0.3
val:
data_prefix: /raid/ImageNet2012/ImageNet2012/val
test:
data_prefix: /raid/ImageNet2012/ImageNet2012/val
The table below describes the configurable parameters in dataset.

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| sampler | dict config | None | The dataset sampler type | – |
| img_norm_cfg | dict config | mean: [123.675, 116.28, 103.53]; std: [58.395, 57.12, 57.375]; to_rgb: False | Contains the following configurable parameters: mean (float list, the mean to be subtracted from the image), std (float list, the standard deviation to divide the image by), and to_rgb (bool, a flag specifying whether to convert to RGB format) | mean > 0; std > 0; to_rgb: True/False |
| data | dict config | None | Parameters related to training data. Refer to data for more details. | – |
data#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| samples_per_gpu | int | None | The number of samples (batch size) per GPU | > 0 |
| img_norm_cfg | dict config | mean: [123.675, 116.28, 103.53]; std: [58.395, 57.12, 57.375]; to_rgb: False | Contains the following configurable parameters: mean (float list, the mean to be subtracted from the image), std (float list, the standard deviation to divide the image by), and to_rgb (bool, a flag specifying whether to convert to RGB format) | mean > 0; std > 0; to_rgb: True/False |
| train | dict config | None | The training dataset configuration: data_prefix (str, the parent folder containing one folder per class), ann_file (str, a text file where every line is an image name and the corresponding class ID; refer to the Data Annotation Format section), pipeline (Dict, the data-processing pipeline containing the pre-processing transforms; refer to the pipeline config), and classes (str, a text file containing the classes, one class per line) | ImageNet classes |
| test | dict config | None | The test dataset configuration, with the same fields as train | ImageNet classes |
| val | dict config | None | The validation dataset configuration, with the same fields as train | ImageNet classes |
Note
Refer to the MMPretrain 1.x format documentation for more details.
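For instance, an annotation file and an explicit class list can be supplied for the validation split. The following is a minimal sketch; all paths are placeholders:
dataset:
  data:
    val:
      data_prefix: /data/val
      ann_file: /data/val_ann.txt  # each line holds an image name and its class ID
      classes: /data/classes.txt   # one class name per line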
pipeline#
The following is an example pipeline config with different augmentations:
pipeline: # Augmentations alone
- type: RandomResizedCrop
scale: 224
backend: "pillow"
- type: RandomFlip
prob: 0.5
direction: "horizontal"
- type: ColorJitter
brightness: 0.4
contrast: 0.4
saturation: 0.4
- type: RandomErasing
erase_prob: 0.3
Some of the widely adopted augmentations and their parameters are listed below. For more information, refer to the MMPretrain documentation for transforms.
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| RandomResizedCrop | dict config | interpolation: bilinear | Contains the following configurable parameters: scale (int, the desired output scale of the crop) and interpolation (str, the interpolation method) | scale > 0 |
| RandomFlip | dict config | None | Contains the following configurable parameters: prob (float, the probability at which to flip the image) and direction (str, the flipping direction) | prob: 0-1; direction: horizontal, vertical |
| RandomCrop | dict config | None | Contains the following configurable parameters: crop_size (int/List, the desired output size of the crop), padding (int, optional padding on each image border; a sequence of length 4 pads the left, top, right, and bottom borders, while a sequence of length 2 pads left/right and top/bottom), pad_val (int, the pixel value for constant fill), and padding_mode (str, the padding type) | crop_size > 0; padding > 0; pad_val > 0; padding_mode: constant, edge, reflect, symmetric |
| ColorJitter | dict config | None | Contains the following configurable parameters: brightness (float, how much to jitter brightness), contrast (float, how much to jitter contrast), and saturation (float, how much to jitter saturation) | 0-1 |
| RandomErasing | dict config | erase_prob: 0.5; min_area_ratio: 0.02; max_area_ratio: 0.4; mode: const | Contains the following configurable parameters: erase_prob (float, the probability that the image will be randomly erased), min_area_ratio (float, the minimum erased area divided by the input image area), max_area_ratio (float, the maximum erased area divided by the input image area), and mode (str, the fill method for the erased area; const assigns all pixels the same value, rand assigns each pixel a random value in [0, 255]) | erase_prob: 0-1; min_area_ratio: 0-1; max_area_ratio: 0-1; mode: const/rand |
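For reference, the following is a minimal pipeline entry using RandomCrop; the values shown are illustrative rather than recommendations:
pipeline:
  - type: RandomCrop
    crop_size: 224
    padding: 4               # pad 4 pixels on every border before cropping
    pad_val: 0               # constant fill value for the padded pixels
    padding_mode: "constant"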
train#
Here is an example of a train specification file for Image Classification PyT:
train:
train_config:
runner:
max_epochs: 300
checkpoint_config:
interval: 1
logging:
interval: 5000
validate: True
evaluation:
interval: 10
custom_hooks:
- type: "EMAHook"
momentum: 0.00008
priority: "ABOVE_NORMAL"
lr_config:
policy: CosineAnnealing
T_max: 95
by_epoch: True
begin: 5
end: 100
optimizer:
type: AdamW
lr: 0.005
weight_decay: 0.05
The table below describes the configurable parameters in the train
specification.
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| exp_config | dict config | manual_seed: 47; MASTER_ADDR: "127.0.0.1"; MASTER_PORT: 631 | Contains the following configurable parameters: manual_seed (int, the random seed of the experiment), MASTER_ADDR (str, the host name of the master node), and MASTER_PORT (int, the port on the MASTER_ADDR) | manual_seed > 0 |
| train_config | dict config | None | Parameters related to training. For more information, refer to train_config. | – |
| results_dir | str | None | The path for saving the checkpoint and logs | str |
train_config#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| runner | dict config | max_epochs: 20 | Contains the following configurable parameter: max_epochs (int), the maximum number of epochs for which the training is conducted | > 0 |
| checkpoint_config | dict config | interval: 1 | Contains the following configurable parameter: interval (int), the number of epochs at which the checkpoint is saved. Note that, currently, only epoch-based training is supported. | > 0 |
| logging | dict config | interval: 10 | Contains the following configurable parameter: interval (int), the number of iterations at which the experiment logs are saved. The logs are saved in the logs directory in the output directory. | > 0 |
| optimizer | dict config | None | Contains the configurable parameters for different optimizers, as detailed in optimizer. | – |
| optimizer_config | dict config | None | Contains the following parameter: max_norm (float), the max norm of the gradients | >= 0.0 |
| evaluation | dict config | None | Contains the following configurable parameter: interval (int), the interval, in iterations, at which validation is performed during training | > 0 |
| validate | bool | False | A flag that enables validation during training | True/False |
| find_unused_parameters | bool | False | Sets this parameter in DDP. For more information, refer to DDP_PyT. | True/False |
| lr_config | dict | None | The learning-rate scheduler configuration. For more details, refer to lr_config. | – |
| load_from | str | None | The checkpoint path from which the end-to-end model weights, including the head, can be loaded | – |
| custom_hooks | dict | None | The custom training hooks configuration. For more details, refer to custom_hooks. | – |
| resume_training_checkpoint_path | str | None | The checkpoint path from which to resume the training | – |
optimizer#
The following optimizers are supported:
SGD
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| optimizer | dict config | type: None; lr: None; momentum: 0; weight_decay: 0 | Contains the following configurable parameters: type (str, "SGD"), lr (float, the learning rate), momentum (float, the momentum factor), and weight_decay (float, the weight decay, an L2 penalty) | – |
AdamW
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| optimizer | dict config | type: None; lr: 1e-3; weight_decay: 0.0; eps: 1e-8 | Contains the following configurable parameters: type (str, "AdamW"), lr (float, the learning rate), weight_decay (float, the weight decay, L2), and eps (float, a term added to the denominator to improve numerical stability) | – |
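For example, an SGD optimizer could be configured as follows (a minimal sketch with illustrative values):
train:
  train_config:
    optimizer:
      type: SGD
      lr: 0.01
      momentum: 0.9
      weight_decay: 0.0001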
lr_config#
The lr_config parameter defines the parameters for the learning-rate scheduler. Some of the widely adopted learning-rate schedulers and their parameters are listed below. For more information, refer to the MMPretrain documentation for LR schedulers.
CosineAnnealingLR
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| T_max | int | – | The maximum number of iterations | >= 0 |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |
MultiStepLR
| Parameter | Datatype | Typical Value | Description | Supported Values |
|---|---|---|---|---|
| lr | float | – | The base (maximum) learning rate | Usually less than 1.0 |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |
LinearLR
| Parameter | Datatype | Typical Value | Description | Supported Values |
|---|---|---|---|---|
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |
PolyLR
| Parameter | Datatype | Typical Value | Description | Supported Values |
|---|---|---|---|---|
| eta_min | float | 0 | The minimum learning rate at the end of scheduling | >= 0 |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |
StepLR
| Parameter | Datatype | Typical Value | Description | Supported Values |
|---|---|---|---|---|
| lr | float | – | The base (maximum) learning rate | Usually less than 1.0 |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |
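As an illustration, a MultiStepLR schedule might be configured as shown below. The milestones and gamma fields follow the MMEngine MultiStepLR signature and are assumptions here rather than values taken from this page:
lr_config:
  policy: MultiStepLR
  milestones: [30, 60, 90]  # epochs at which the learning rate is decayed
  gamma: 0.1                # multiplicative decay factor (MMEngine parameter)
  by_epoch: True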
custom_hooks#
The following example shows how a custom hook from the MMPretrain config is expressed in the TAO Hydra config for EMAHook:
MMPretrain config:
custom_hooks = [ dict(type='EMAHook', interval=100, priority='HIGH')]
Equivalent TAO Hydra config:
custom_hooks:
  - type: "EMAHook"
    momentum: 0.00008
    priority: "ABOVE_NORMAL"
For more detail on custom_hooks, refer to the MMPretrain documentation for custom hooks.
model#
Here is an example model-specification file for Image Classification PyT with a FAN backbone:
model:
backbone:
type: "fan_tiny_8_p4_hybrid"
custom_args:
drop_path_rate: 0.1
freeze: False
pretrained: <Path to pretrained weights>
head:
type: "TAOLinearClsHead"
num_classes: 1000
custom_args:
head_init_scale: 1
loss:
type: LabelSmoothLoss
label_smooth_val: 0.1
mode: 'original'
train_cfg:
augments:
- type: Mixup
alpha: 0.8
- type: CutMix
alpha: 1.0
The model
parameter primarily configures the backbone and head.
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| init_cfg | Dict | None | Contains the following config parameters: checkpoint (str, the path to the pre-trained model to be loaded) and prefix (str, the string to be removed from state_dict keys) | – |
| backbone | Dict | None | Contains the following configurable parameters: type (str, the name of the backbone to be used), freeze (bool, a flag specifying whether to freeze or unfreeze the backbone), and pretrained (str, the path to the pre-trained weights to be loaded). Refer to the Foundation Models section for foundation-model configuration. | FAN variants: fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid, fan_Xlarge_16_p4_hybrid, fan_base_18_p16_224, fan_tiny_12_p16_224, fan_small_12_p16_224, fan_large_24_p16_224. GCViT variants: gc_vit_xxtiny, gc_vit_xtiny, gc_vit_tiny, gc_vit_small, gc_vit_base, gc_vit_large. FasterViT variants: faster_vit_0_224, faster_vit_1_224, faster_vit_2_224, faster_vit_3_224, faster_vit_4_224, faster_vit_5_224, faster_vit_6_224, faster_vit_4_21k_224, faster_vit_4_21k_384, faster_vit_4_21k_512, faster_vit_4_21k_768. freeze: True/False |
| head | Dict | None | The config parameters for the classification head. For more details, refer to head. | – |
| train_cfg | Dict | None | Contains advanced augmentation parameters. For more details, refer to train_cfg. | – |
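For instance, init_cfg can be used to load pre-trained weights whose state_dict keys carry a prefix. This is a minimal sketch; the path and prefix are placeholders:
model:
  init_cfg:
    checkpoint: /path/to/pretrained.pth
    prefix: "backbone."  # string removed from state_dict keys while loading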
Foundation Models#
model:
backbone:
type: "ViT-B-32"
custom_args:
drop_path_rate: 0.1
freeze: False
pretrained: laion400m_e31
head:
type: LinearClsHead
num_classes: 1000
in_channels: 512
loss:
type: CrossEntropyLoss
loss_weight: 1.0
use_soft: False
topk: [1, 5]
The following is a subset of the supported architectures and their pre-training datasets. Note that in_channels under head must be updated to match the chosen backbone.
CLIP Image Backbones:
| Arch | Pretrained Dataset | in_channels |
|---|---|---|
| ViT-B-32 | laion400m_e31, laion400m_e32, laion2b_e16, laion2b_s34b_b79k, datacomp_m_s128m_b4k, openai | 512 |
| ViT-B-16 | laion400m_e31 | 512 |
| ViT-L-14 | laion400m_e31 | 768 |
| ViT-H-14 | laion2b_s32b_b79k | 1024 |
| ViT-g-14 | laion2b_s12b_b42k | 1024 |
EVA - CLIP Image Backbones:
| Arch | Pretrained Dataset | in_channels |
|---|---|---|
| – | merged2b_s4b_b131k | 768 |
| – | laion400m_e31 | 768 |
| – | laion400m_e31 | 1024 |
| – | laion2b_s32b_b79k | 1024 |
head#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| type | str | None | The type of classification head | TAOLinearClsHead, FANLinearClsHead, LogisticRegressionHead |
| num_classes | int | None | The number of training classes | >= 0 |
| loss | Dict | {"type": "CrossEntropyLoss"} | Refer to losses for the different types of loss and their parameters | – |
| topk | List | [1,] | The values of k for which top-k accuracy is computed | >= 0 |
| custom_args | Dict | None | Any custom parameters to be passed to the head (e.g. head_init_scale is used for TAOLinearClsHead) | – |
| lr_head | Dict | None | Parameters used for the Logistic Regression head (e.g. C is used for tuning the regularization strength) | – |
lr_head#
The Logistic Regression head is defined by the following parameters:

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| lr_head | Dict | None | Contains the following tunable parameters: C (the inverse of the regularization strength), max_iter (the maximum number of iterations taken for the solver to converge), class_weight (whether to support balanced class training), and solver (the algorithm to use in the optimization problem) | C >= 0.0; max_iter > 0; class_weight: 'balanced', None; solver: 'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga' |
The Logistic Regression head requires freezing the backbone weights. Note that the freeze parameter must be set to True under the backbone config, as shown below:
model:
backbone:
type: "ViT-B-32"
custom_args:
drop_path_rate: 0.1
freeze: True
pretrained: laion400m_e31
head:
type: "LogisticRegressionHead"
num_classes: 1000
lr_head:
C: 0.316
max_iter: 5000
train_cfg#
Mixup
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| alpha | float | None | The parameter of the Beta distribution used to generate the mixing ratio | 0-1 |
CutMix
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| alpha | float | None | The parameter of the Beta distribution used to generate the mixing ratio | 0-1 |
| cutmix_minmax | list[float] | None | The min/max area ratio of the patches | – |
| correct_lam | bool | True | Whether to apply lambda correction when the CutMix bounding box is clipped by the image borders | True/False |
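A train_cfg entry combining both augmentations could look like the following sketch; the alpha values are illustrative:
model:
  train_cfg:
    augments:
      - type: Mixup
        alpha: 0.8
      - type: CutMix
        alpha: 1.0
        correct_lam: True  # apply lambda correction at image borders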
loss#
Some important classification losses are shown below. Note that all losses supported in MMPretrain can be used by following the TAO Hydra config convention. For a list of MMPretrain losses, refer to the losses_mmpretrain documentation.
LabelSmoothLoss
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| label_smooth_val | float | None | The degree of label smoothing | 0-1 |
| use_sigmoid | bool | None | Specifies whether the prediction uses sigmoid instead of softmax | True/False |
| num_classes | int | None | The number of classes | >= 0 |
| mode | str | None | The label-smoothing mode | original |
| reduction | str | None | The method used to reduce the loss | mean, sum |
| loss_weight | float | 1.0 | The weight of the loss | >= 0 |
CrossEntropyLoss
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| use_sigmoid | bool | False | Specifies whether the prediction uses sigmoid instead of softmax | True/False |
| use_soft | bool | False | Specifies whether to use the soft version of CrossEntropyLoss | True/False |
| loss_weight | float | 1.0 | The weight of the loss | >= 0 |
| reduction | str | None | The method used to reduce the loss | mean, sum |
| class_weight | list[float] | None | The weight for each class | – |
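For example, a CrossEntropyLoss can be configured under the head as follows (illustrative values):
model:
  head:
    loss:
      type: CrossEntropyLoss
      use_soft: False
      loss_weight: 1.0
      reduction: "mean"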
Training the model#
Use the tao model classification_pyt train command to train a classification PyTorch model:
tao model classification_pyt train [-h] -e <experiment_spec_file>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.gpu_ids=<gpu indices>]
[train.num_gpus=<number of gpus>]
Required Arguments#
The only required argument is the path to the experiment spec:
-e, --experiment_spec: The experiment specification file to set up the training experiment
Optional Arguments#
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.train_config.optimizer.<optim_option>: The optimizer options.
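For example, the following command trains on two GPUs and overrides the optimizer learning rate from the command line; the spec path is a placeholder:
tao model classification_pyt train -e /path/to/experiment_spec.yaml \
    train.num_gpus=2 \
    train.train_config.optimizer.lr=0.001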
Note
For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 with gpu_ids = [0, 1]), they are modified to follow the setting with more GPUs (for example, num_gpus = 1 becomes num_gpus = 2).
Evaluating the Model#
The evaluate
config defines the hyperparameters of the evaluation process. The following is an example
config:
evaluate:
checkpoint: /path/to/model.pth
topk: 1
After the model has been trained using the experiment config file and by following the steps to
train a model, the next step is to evaluate this model on a test set to measure the
accuracy of the model. TAO includes the tao model classification_pyt evaluate
command to do this.
The classification app computes evaluation loss and Top-k accuracy.
After training, the model is stored in the output directory of your choice in results_dir.
tao model classification_pyt evaluate [-h] -e <experiment_spec>
evaluate.checkpoint=<model to be evaluated>
results_dir=<path to results dir>
[evaluate.<evaluate_option>=<evaluate_option_value>]
[evaluate.gpu_ids=<gpu indices>]
[evaluate.num_gpus=<number of gpus>]
Required Arguments#
-e, --experiment_spec: The experiment spec file to set up the evaluation experiment
evaluate.checkpoint: The .pth model to be evaluated
results_dir: The path where the results are stored
Optional Arguments#
evaluate.<evaluate_option>: The evaluate options.
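For example (all paths are placeholders):
tao model classification_pyt evaluate -e /path/to/experiment_spec.yaml \
    evaluate.checkpoint=/path/to/model.pth \
    results_dir=/path/to/results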
Running Inference on a Model#
For classification, tao model classification_pyt inference
saves a .csv
file containing the image paths
and the corresponding labels for multiple images. TensorRT Python inference can also be enabled.
inference:
checkpoint: /path/to/model.pth
tao model classification_pyt inference [-h] -e <experiment_spec_file>
inference.checkpoint=<model to be inferenced>
results_dir=<path to results dir>
[inference.<inference_option>=<inference_option_value>]
[inference.gpu_ids=<gpu indices>]
[inference.num_gpus=<number of gpus>]
Required Arguments#
-e, --experiment_spec: The experiment spec file to set up the inference experiment
inference.checkpoint: The .pth model to run inference with
results_dir: The path where the results are stored
Optional Arguments#
inference.<inference_option>: The inference options.
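For example (all paths are placeholders):
tao model classification_pyt inference -e /path/to/experiment_spec.yaml \
    inference.checkpoint=/path/to/model.pth \
    results_dir=/path/to/results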
Exporting the model#
Exporting the model decouples the training process from inference and allows conversion to
TensorRT engines outside the TAO environment. TensorRT engines are specific to each hardware
configuration and should be generated for each unique inference environment.
The exported model may be used universally across training and deployment hardware.
The exported model format is referred to as .onnx.
The export
parameter defines the hyperparameters of the export process.
export:
checkpoint: /path/to/model.pth
onnx_file: /path/to/model.onnx
opset_version: 12
verify: False
input_channel: 3
input_width: 224
input_height: 224
Here’s an example of the tao model classification_pyt export command:
tao model classification_pyt export [-h] -e <experiment spec file>
export.checkpoint=<model to export>
export.onnx_file=<onnx path>
[export.<export_option>=<export_option_value>]
Required Arguments#
-e, --experiment_spec: The path to an experiment spec file
export.checkpoint: The .pth model to export
export.onnx_file: The path where the .etlt or .onnx model is saved
Optional Arguments#
export.<export_option>: The export options.
TensorRT Engine Generation, Validation, and INT8 Calibration#
For TensorRT engine generation, validation, and INT8 calibration, refer to the TAO Deploy documentation.
Deploying to DeepStream#
Refer to the Integrating a Classification (TF1/TF2/PyTorch) Model page for more information about deploying a classification model with DeepStream.