Image Classification PyT

Image Classification PyT is a PyTorch-based image-classification model included in the TAO Toolkit. It supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command line:


tao model classification_pyt <sub_task> <args_per_subtask>

where args_per_subtask represents the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

TAO Toolkit has been updated to the latest MMPretrain version, which replaces the deprecated MMClassification.

Note

Image Classification (PyT) is based on MMPretrain. Hence, most parameters are adopted from the MMPretrain 1.x format.

See the Data Annotation Format page for more information about the data format for image classification.

The train classification experiment specification consists of three main components:

  • dataset

  • train

  • model

Here is an example of a dataset specification file for classification PyT with a FAN backbone:


dataset:
  data:
    samples_per_gpu: 128
    workers_per_gpu: 8
    train:
      data_prefix: "/raid/ImageNet2012/ImageNet2012/train"
      pipeline: # Augmentations alone
        - type: RandomResizedCrop
          size: 224
          backend: "pillow"
        - type: RandomFlip
          prob: 0.5
          direction: "horizontal"
        - type: ColorJitter
          brightness: 0.4
          contrast: 0.4
          saturation: 0.4
        - type: RandomErasing
          erase_prob: 0.3
    val:
      data_prefix: /raid/ImageNet2012/ImageNet2012/val
    test:
      data_prefix: /raid/ImageNet2012/ImageNet2012/val

The table below describes the configurable parameters in dataset.

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| sampler | dict config | None | The dataset sampler type | - |
| img_norm_cfg | dict config (mean: list of float, std: list of float, to_rgb: bool) | mean: [123.675, 116.28, 103.53]; std: [58.395, 57.12, 57.375]; to_rgb: False | Contains the following configurable parameters: mean: the mean to be subtracted from the image; std: the standard deviation to divide the image; to_rgb: a flag specifying whether to convert to RGB format | mean, std: > 0; to_rgb: True/False |
| data | dict config | None | Parameters related to data loading and the dataset splits. Refer to data for more details. | - |
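
As a sketch, the normalization defaults from the table above would be expressed in the spec as follows (the values are the documented defaults, and the nesting follows the dataset-level entry above; shown for illustration only):

dataset:
  img_norm_cfg:
    mean: [123.675, 116.28, 103.53]
    std: [58.395, 57.12, 57.375]
    to_rgb: False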

data

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| samples_per_gpu | int | None | The number of samples loaded per GPU per iteration (the per-GPU batch size) | > 0 |
| img_norm_cfg | dict config (mean: list of float, std: list of float, to_rgb: bool) | mean: [123.675, 116.28, 103.53]; std: [58.395, 57.12, 57.375]; to_rgb: False | Contains the following configurable parameters: mean: the mean to be subtracted from the image; std: the standard deviation to divide the image; to_rgb: a flag specifying whether to convert to RGB format | mean, std: > 0; to_rgb: True/False |
| train | dict config (data_prefix: str, ann_file: str, pipeline: Dict, classes: str) | None | Contains the training dataset configuration: data_prefix: the parent folder containing folders of different classes; ann_file: a text file where every line is an image name and the corresponding class ID (for more information, refer to the Data Annotation Format section); pipeline: the data-processing pipeline, which contains the pre-processing transforms (for more information, refer to the pipeline config); classes: a text file containing the classes (one class per line) | ImageNet classes |
| test | dict config (data_prefix: str, ann_file: str, pipeline: Dict, classes: str) | None | Contains the test dataset configuration: data_prefix: the parent folder containing folders of different classes; ann_file: a text file where every line is an image name and the corresponding class ID (for more information, refer to the Data Annotation Format section); pipeline: the data-processing pipeline, which contains the pre-processing transforms (for more information, refer to the pipeline config); classes: a text file containing the classes (one class per line) | ImageNet classes |
| val | dict config (data_prefix: str, ann_file: str, pipeline: Dict, classes: str) | None | Contains the validation dataset configuration: data_prefix: the parent folder containing folders of different classes; ann_file: a text file where every line is an image name and the corresponding class ID (for more information, refer to the Data Annotation Format section); pipeline: the data-processing pipeline, which contains the pre-processing transforms (for more information, refer to the pipeline config); classes: a text file containing the classes (one class per line) | ImageNet classes |
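
For reference, here is a hypothetical two-class example following the format described above (the file names, class names, and separator are illustrative; see the Data Annotation Format page for the authoritative layout). An ann_file lists one image name and class ID per line:

images/cat_0001.jpg 0
images/dog_0001.jpg 1

and the corresponding classes file lists one class name per line:

cat
dog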

Note

Refer to the MMPretrain 1.x format documentation for more details.

pipeline

The following is an example pipeline config with different augmentations:


pipeline: # Augmentations alone
  - type: RandomResizedCrop
    scale: 224
    backend: "pillow"
  - type: RandomFlip
    prob: 0.5
    direction: "horizontal"
  - type: ColorJitter
    brightness: 0.4
    contrast: 0.4
    saturation: 0.4
  - type: RandomErasing
    erase_prob: 0.3

Some of the widely adopted augmentations and their parameters are listed below. For more information, refer to the MMPretrain documentation for transforms.

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| RandomResizedCrop | dict config (scale: int, interpolation: str) | scale: None; interpolation: bilinear | Contains the following configurable parameters: scale: the desired output scale of the crop; interpolation: the interpolation method | scale > 0 |
| RandomFlip | dict config (prob: float, direction: str) | None | Contains the following configurable parameters: prob: the probability at which to flip the image; direction: the flipping direction | prob: 0-1; direction: horizontal, vertical |
| RandomCrop | dict config (crop_size: int/List, padding: int, pad_val: int, padding_mode: str) | None | Contains the following configurable parameters: crop_size: the desired output size of the crop; padding: optional padding on each image border (a sequence of length 4 pads the left, top, right, and bottom borders; a sequence of length 2 pads left/right and top/bottom); pad_val: the pixel value for constant fill; padding_mode: the padding type | crop_size, padding, pad_val: > 0; padding_mode: constant, edge, reflect, symmetric |
| ColorJitter | dict config (brightness: float, contrast: float, saturation: float) | None | Contains the following configurable parameters: brightness: how much to jitter brightness; contrast: how much to jitter contrast; saturation: how much to jitter saturation | 0-1 each |
| RandomErasing | dict config (erase_prob: float, min_area_ratio: float, max_area_ratio: float, mode: str) | erase_prob: 0.5; min_area_ratio: 0.02; max_area_ratio: 0.4; mode: const | Contains the following configurable parameters: erase_prob: the probability that the image will be randomly erased; min_area_ratio: the minimum erased area divided by the input image area; max_area_ratio: the maximum erased area divided by the input image area; mode: the fill method in the erased area (const: all pixels are assigned the same value; rand: each pixel is assigned a random value in [0, 255]) | erase_prob, min_area_ratio, max_area_ratio: 0-1; mode: const, rand |
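
For illustration, a pipeline that uses RandomCrop with the padding options described above might look like the following sketch (the values are arbitrary examples, not recommendations):

pipeline:
  - type: RandomCrop
    crop_size: 224
    padding: 4
    pad_val: 0
    padding_mode: "constant"
  - type: RandomFlip
    prob: 0.5
    direction: "horizontal"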

Here is an example of a train specification file for Image Classification PyT:


train:
  train_config:
    runner:
      max_epochs: 300
    checkpoint_config:
      interval: 1
    logging:
      interval: 5000
    validate: True
    evaluation:
      interval: 10
    custom_hooks:
      - type: "EMAHook"
        momentum: 0.00008
        priority: "ABOVE_NORMAL"
    lr_config:
      policy: CosineAnnealing
      T_max: 95
      by_epoch: True
      begin: 5
      end: 100
    optimizer:
      type: AdamW
      lr: 0.005
      weight_decay: 0.05

The table below describes the configurable parameters in the train specification.

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| exp_config | dict config (manual_seed: int, MASTER_ADDR: str, MASTER_PORT: int) | manual_seed: 47; MASTER_ADDR: "127.0.0.1"; MASTER_PORT: 631 | Contains the following configurable parameters: manual_seed: the random seed of the experiment; MASTER_ADDR: the host name of the master node; MASTER_PORT: the port on the MASTER_ADDR | manual_seed > 0 |
| train_config | dict config | None | Parameters related to training. For more information, refer to train_config. | - |
| results_dir | str | None | The path for saving checkpoints and logs | str |
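
For example, a hypothetical exp_config that pins the seed and the distributed-training endpoint (using the defaults from the table above) would look like:

train:
  exp_config:
    manual_seed: 47
    MASTER_ADDR: "127.0.0.1"
    MASTER_PORT: 631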

train_config

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| runner | dict config (max_epochs: int) | max_epochs: 20 | max_epochs: the maximum number of epochs for which the training should be conducted | > 0 |
| checkpoint_config | dict config (interval: int) | interval: 1 | interval: the number of epochs between checkpoint saves. Note that, currently, only epoch-based training is supported. | > 0 |
| logging | dict config (interval: int) | interval: 10 | interval: the number of iterations between experiment-log saves. The logs are saved in the logs directory inside the output directory. | > 0 |
| optimizer | dict config | None | Contains the configurable parameters for different optimizers, as detailed in optimizer | - |
| optimizer_config | dict config (max_norm: float) | None | max_norm: the max norm of the gradients | >= 0.0 |
| evaluation | dict config (interval: int) | None | interval: the interval, in iterations, at which validation is performed during training | > 0 |
| validate | bool | False | A flag that enables validation during training | True/False |
| find_unused_parameters | bool | False | Sets this parameter in DDP. For more information, refer to DDP_PyT. | True/False |
| lr_config | dict config | None | The learning-rate scheduler configuration. For more details, refer to lr_config. | - |
| load_from | str | None | The checkpoint path from which the end-to-end model weights, including the head, can be loaded | - |
| custom_hooks | dict config | None | The custom training hooks configuration. For more details, refer to custom_hooks. | - |
| resume_training_checkpoint_path | str | None | The checkpoint path from which to resume training | - |
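
As a sketch, a train_config fragment that resumes training from an earlier run (the checkpoint path is a placeholder) could be written as:

train:
  train_config:
    resume_training_checkpoint_path: /path/to/experiment/epoch_10.pth
    validate: True
    evaluation:
      interval: 1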

optimizer

The following optimizers are supported:

SGD

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| optimizer | dict config (type: str, lr: float, momentum: float, weight_decay: float) | type: None; lr: None; momentum: 0; weight_decay: 0 | Contains the following configurable parameters: type: "SGD"; lr: the learning rate; momentum: the momentum factor; weight_decay: the weight decay (L2 penalty) | - |
AdamW

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| optimizer | dict config (type: str, lr: float, weight_decay: float, eps: float) | type: None; lr: 1e-3; weight_decay: 0.0; eps: 1e-8 | Contains the following configurable parameters: type: "AdamW"; lr: the learning rate; weight_decay: the weight decay (L2); eps: a term added to the denominator to improve numerical stability | - |

lr_config

The lr_config parameter defines the parameters for the learning-rate scheduler. Some of the widely adopted learning-rate schedulers and their parameters are listed below. For more information, refer to the MMPretrain documentation for lr schedulers.

CosineAnnealingLR

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| T_max | int | None | The maximum number of iterations | > 0 |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |

MultiStepLR

| Parameter | Datatype | Typical Value | Description | Supported Values |
|-----------|----------|---------------|-------------|------------------|
| gamma | float | - | The multiplicative factor of learning-rate decay | Usually less than 1.0 |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |

LinearLR

| Parameter | Datatype | Typical Value | Description | Supported Values |
|-----------|----------|---------------|-------------|------------------|
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |

PolyLR

| Parameter | Datatype | Typical Value | Description | Supported Values |
|-----------|----------|---------------|-------------|------------------|
| eta_min | float | 0 | The minimum learning rate at the end of scheduling | - |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |

StepLR

| Parameter | Datatype | Typical Value | Description | Supported Values |
|-----------|----------|---------------|-------------|------------------|
| gamma | float | - | The multiplicative factor of learning-rate decay | Usually less than 1.0 |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |
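
As another sketch, a LinearLR schedule restricted to the first five epochs (following the same policy convention as the CosineAnnealing example earlier; the values are illustrative) could be configured as:

train:
  train_config:
    lr_config:
      policy: LinearLR
      by_epoch: True
      begin: 0
      end: 5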

custom_hooks

The following is an example of how a custom hook is translated from the MMPretrain config to the TAO Hydra config, using EMAHook:

  • MMPretrain config:


custom_hooks = [dict(type='EMAHook', interval=100, priority='HIGH')]

  • Equivalent TAO Hydra config:


custom_hooks:
  - type: "EMAHook"
    momentum: 0.00008
    priority: "ABOVE_NORMAL"

For more detail on custom_hooks, refer to the MMPretrain documentation for custom hooks.

Here is an example model-specification file for Image Classification PyT with a FAN backbone:


model:
  backbone:
    type: "fan_tiny_8_p4_hybrid"
    custom_args:
      drop_path_rate: 0.1
    freeze: False
    pretrained: <Path to pretrained weights>
  head:
    type: "TAOLinearClsHead"
    num_classes: 1000
    custom_args:
      head_init_scale: 1
    loss:
      type: LabelSmoothLoss
      label_smooth_val: 0.1
      mode: 'original'
  train_cfg:
    augments:
      - type: BatchMixup
        alpha: 0.8
        num_classes: 1000
        prob: 0.5
      - type: BatchCutMix
        alpha: 1.0
        num_classes: 1000
        prob: 0.5

The model parameter primarily configures the backbone and head.

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| init_cfg | Dict (checkpoint: str, prefix: str) | None | Contains the following config parameters: checkpoint: the path to the pre-trained model to be loaded; prefix: the string to be removed from state_dict keys | - |
| backbone | Dict (type: string, freeze: bool, pretrained: str) | None | Contains the following configurable parameters: type: the name of the backbone to be used; freeze: a flag specifying whether to freeze or unfreeze the backbone; pretrained: the path to the pre-trained weights to be loaded. Refer to the Foundation Models section for Foundation Models configuration. | type: see the backbone variants below; freeze: True/False |
| head | Dict | None | The config parameters for the classification head | - |
| train_cfg | Dict | None | Contains advanced augmentation parameters | - |

The supported backbone type values are as follows:

  • FAN variants: fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid, fan_Xlarge_16_p4_hybrid, fan_base_18_p16_224, fan_tiny_12_p16_224, fan_small_12_p16_224, fan_large_24_p16_224

  • GCViT variants: gc_vit_xxtiny, gc_vit_xtiny, gc_vit_tiny, gc_vit_small, gc_vit_base, gc_vit_large

  • FasterViT variants: faster_vit_0_224, faster_vit_1_224, faster_vit_2_224, faster_vit_3_224, faster_vit_4_224, faster_vit_5_224, faster_vit_6_224, faster_vit_4_21k_224, faster_vit_4_21k_384, faster_vit_4_21k_512, faster_vit_4_21k_768

Foundation Models


model:
  backbone:
    type: "ViT-B-32"
    custom_args:
      drop_path_rate: 0.1
    freeze: False
    pretrained: laion400m_e31
  head:
    type: LinearClsHead
    num_classes: 1000
    in_channels: 512
    loss:
      type: CrossEntropyLoss
      loss_weight: 1.0
      use_soft: False
    topk: [1, 5]

The following is a subset of the supported architectures and their pre-training datasets. Note that in_channels should be updated under head to match the chosen architecture:

  • CLIP Image Backbones:

| Arch | Pretrained Dataset | in_channels |
|------|--------------------|-------------|
| ViT-B-32 | laion400m_e31, laion400m_e32, laion2b_e16, laion2b_s34b_b79k, datacomp_m_s128m_b4k, openai | 512 |
| ViT-B-16 | laion400m_e31 | 512 |
| ViT-L-14 | laion400m_e31 | 768 |
| ViT-H-14 | laion2b_s32b_b79k | 1024 |
| ViT-g-14 | laion2b_s12b_b42k | 1024 |

  • EVA-CLIP Image Backbones:

| Arch | Pretrained Dataset | in_channels |
|------|--------------------|-------------|
| EVA02-L-14 | merged2b_s4b_b131k | 768 |
| EVA02-L-14-336 | laion400m_e31 | 768 |
| EVA02-E-14 | laion400m_e31 | 1024 |
| EVA02-E-14-plus | laion2b_s32b_b79k | 1024 |
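
For example, switching the backbone to ViT-L-14 would require raising in_channels to 768 per the table above (a sketch; the other fields follow the earlier Foundation Models example):

model:
  backbone:
    type: "ViT-L-14"
    freeze: False
    pretrained: laion400m_e31
  head:
    type: LinearClsHead
    num_classes: 1000
    in_channels: 768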
head

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| type | string | None | The type of the classification head | TAOLinearClsHead, FANLinearClsHead, LogisticRegressionHead |
| num_classes | int | None | The number of training classes | >= 0 |
| loss | Dict | {"type": "CrossEntropyLoss"} | Refer to loss for the different types of loss and their parameters | - |
| topk | List | [1,] | The k values at which Top-k accuracy is computed | >= 1 |
| custom_args | Dict | None | Any custom parameters to be passed to the head (e.g. head_init_scale is used for TAOLinearClsHead) | - |
| lr_head | Dict | None | Parameters used for the Logistic Regression head (e.g. C is used for tuning the regularization strength) | - |

lr_head

The Logistic Regression head is defined by the following parameters:

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| lr_head | Dict (C: float, max_iter: int, class_weight: str, solver: str) | None | Contains the following tunable parameters: C: the inverse of the regularization strength; max_iter: the maximum number of iterations taken for the solver to converge; class_weight: whether to support balanced-class training; solver: the algorithm to use in the optimization problem | C >= 0.0; max_iter > 0; class_weight: 'balanced', None; solver: 'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga' |

The Logistic Regression head requires freezing the backbone weights, so the freeze parameter under the backbone config must be set to True, as shown below:


model:
  backbone:
    type: "ViT-B-32"
    custom_args:
      drop_path_rate: 0.1
    freeze: True
    pretrained: laion400m_e31
  head:
    type: "LogisticRegressionHead"
    num_classes: 1000
    lr_head:
      C: 0.316
      max_iter: 5000

train_cfg

BatchMixup

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| alpha | float | None | The parameter of the Beta distribution used to generate the mixing ratio | 0-1 |
| prob | float | None | The probability at which to apply the augmentation | 0-1 |
| num_classes | int | None | The number of classes | >= 0 |

BatchCutMix

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| alpha | float | None | The parameter of the Beta distribution used to generate the mixing ratio | 0-1 |
| prob | float | None | The probability at which to apply the augmentation | 0-1 |
| num_classes | int | None | The number of classes | >= 0 |

loss

Some important classification losses are shown below. Note that all losses supported in MMPretrain can be used by following the Hydra config convention for TAO Toolkit. For a list of MMPretrain losses, refer to the losses_mmpretrain documentation.

LabelSmoothLoss

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| label_smooth_val | float | None | The degree of label smoothing | 0-1 |
| use_sigmoid | bool | None | Whether the prediction uses sigmoid instead of softmax | True/False |
| num_classes | int | None | The number of classes | >= 0 |
| mode | string | None | The label-smoothing mode (per the MMPretrain LabelSmoothLoss options) | original, classy_vision, multi_label |
| reduction | str | None | The method used to reduce the loss | mean, sum |
| loss_weight | float | 1.0 | The weight of the loss | >= 0 |

CrossEntropyLoss

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| use_sigmoid | bool | False | Whether the prediction uses sigmoid instead of softmax | True/False |
| use_soft | bool | False | Whether to use the soft version of CrossEntropyLoss | True/False |
| loss_weight | float | 1.0 | The weight of the loss | >= 0 |
| reduction | str | None | The method used to reduce the loss | mean, sum |
| class_weight | list[float] | None | The weight assigned to each class | >= 0 |
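
As an illustration, a weighted CrossEntropyLoss for a hypothetical two-class problem (the weights are placeholders) might be sketched under the head as:

head:
  type: "TAOLinearClsHead"
  num_classes: 2
  loss:
    type: CrossEntropyLoss
    loss_weight: 1.0
    reduction: "mean"
    class_weight: [1.0, 2.0]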

Training the Model

Use the tao model classification_pyt train command to train a classification PyTorch model:


tao model classification_pyt train [-h] -e <spec file> -r <result directory> [-g <num GPUs>]

Required Arguments

  • -r, --results_dir: The path to a folder where the experiment outputs should be written

  • -e, --experiment_spec_file: The path to the experiment spec file

Optional Arguments

  • -g, --gpus: The number of GPUs to use for training. The default value is 1.

  • -h, --help: Print the help message.

Sample Usage

Here’s an example of using the tao model classification_pyt train command:


tao model classification_pyt train -e /workspace/cats_dogs/spec/train_cats_dogs.yaml -r /workspace/output

Evaluating the Model

The evaluate config defines the hyperparameters of the evaluation process. The following is an example config:


evaluate:
  checkpoint: /path/to/model.pth
  topk: 1

After the model has been trained using the experiment config file, the next step is to evaluate it on a test set to measure its accuracy. The TAO Toolkit provides the tao model classification_pyt evaluate command for this purpose.

The classification app computes evaluation loss and Top-k accuracy.

After training, the model is stored in the output directory of your choice in results_dir.


evaluate:
  checkpoint: /path/to/model.pth


tao model classification_pyt evaluate [-h] -e <experiment_spec_file> evaluate.checkpoint=<model to be evaluated> results_dir=<path to results dir> [-g <num gpus>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file

Optional Arguments

  • -h, --help: Show this help message and exit.

  • -g, --gpus: The number of GPUs for conducting evaluation

If you followed the example in training a classification model, run the evaluation:


tao model classification_pyt evaluate -e /path/to/classification_eval.yaml

TAO evaluates the classification model and reports the Top-K accuracy metric.

Running Inference

For classification, tao model classification_pyt inference saves a .csv file containing the image paths and the corresponding predicted labels for multiple images. TensorRT Python inference can also be enabled.
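
Conceptually, each row of the result file pairs an image path with its predicted label, along these lines (a purely hypothetical illustration; the actual columns and paths may differ):

/data/test/image_001.jpg,cat
/data/test/image_002.jpg,dog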


inference:
  checkpoint: /path/to/model.pth


tao model classification_pyt inference [-h] -e <experiment_spec_file> inference.checkpoint=<model to be inferenced> results_dir=<path to results dir> [-g <num gpus>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file

Optional Arguments

  • -h, --help: Show this help message and exit.

  • -g, --gpus: The number of GPUs to use for inference

Exporting the Model

Exporting the model decouples the training process from inference and allows conversion to TensorRT engines outside the TAO environment. TensorRT engines are specific to each hardware configuration and should be generated for each unique inference environment. The exported .onnx model itself, however, can be used universally across training and deployment hardware.

The export parameter defines the hyperparameters of the export process.


export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  verify: False
  input_channel: 3
  input_width: 224
  input_height: 224

Here’s an example of the tao model classification_pyt export command:


tao model classification_pyt export [-h] -e <experiment spec file> [-r <results_dir>] export.checkpoint=<model to export> export.onnx_file=<onnx path>

Required Arguments

  • -e, --experiment_spec: The path to an experiment spec file

Optional Arguments

  • -r, --results_dir: The directory where the export results are stored

  • export.checkpoint: The .tlt or .pth model to export

  • export.onnx_file: The path where the .etlt or .onnx model will be saved

Sample Usage

The following is a sample export command.


tao model classification_pyt export -e /path/to/spec.yaml export.checkpoint=/path/to/model.pth export.onnx_file=/path/to/model.onnx

For TensorRT engine generation, validation, and INT8 calibration, refer to the TAO Deploy documentation.

Refer to the Integrating a Classification (TF1/TF2/PyTorch) Model page for more information about deploying a classification model with DeepStream.
