Image Classification PyT#

Image Classification PyT is a PyTorch-based image-classification model included in TAO. It supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Launcher using the following convention on the command-line:

tao model classification_pyt <sub_task> <args_per_subtask>

where args_per_subtask denotes the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
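For example, the train task can be invoked as follows (the spec-file path is hypothetical):

tao model classification_pyt train -e /workspace/specs/classification_spec.yaml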

TAO has been updated to use the latest MMPretrain version in place of the deprecated MMClassification.

Note

Image Classification (PyT) is based on MMPretrain. Hence, most parameters are adopted from the MMPretrain 1.x format.

Preparing the Input Data Structure#

See the Data Annotation Format page for more information about the data format for image classification.

The train classification experiment specification consists of three main components:

  • dataset

  • train

  • model

Dataset Input for Classification PyT#

Here is an example of a dataset specification file for classification PyT with a FAN backbone:

dataset:
  data:
    samples_per_gpu: 128
    workers_per_gpu: 8
    train:
      data_prefix: "/raid/ImageNet2012/ImageNet2012/train"
      pipeline: # Augmentations alone
        - type: RandomResizedCrop
          scale: 224
        - type: RandomFlip
          prob: 0.5
          direction: "horizontal"
        - type: ColorJitter
          brightness: 0.4
          contrast: 0.4
          saturation: 0.4
        - type: RandomErasing
          erase_prob: 0.3
    val:
      data_prefix: /raid/ImageNet2012/ImageNet2012/val
    test:
      data_prefix: /raid/ImageNet2012/ImageNet2012/val

The table below describes the configurable parameters in dataset.

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| sampler | dict config | None | The dataset sampler type | - |
| img_norm_cfg | dict config | mean: [123.675, 116.28, 103.53]; std: [58.395, 57.12, 57.375]; to_rgb: False | Contains the following configurable parameters: mean (float list): the mean to be subtracted from the image; std (float list): the standard deviation to divide the image by; to_rgb (bool): a flag specifying whether to convert to RGB format | mean, std > 0; to_rgb: True/False |
| data | dict config | None | Parameters related to data loading. Refer to data for more details. | - |

data#

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| samples_per_gpu | int | None | The number of samples (batch size) per GPU | > 0 |
| img_norm_config | Dict | mean: [123.675, 116.28, 103.53]; std: [58.395, 57.12, 57.375]; to_rgb: False | Contains the following configurable parameters: mean: the mean to be subtracted from the image; std: the standard deviation to divide the image by; to_rgb: a flag specifying whether to convert to RGB format | mean, std > 0; to_rgb: True/False |
| train | dict config | None | Contains the training dataset configuration: data_prefix (str): the parent folder containing folders of different classes; ann_file (str): a text file where every line is an image name and the corresponding class ID; pipeline (Dict): the data-processing pipeline, which contains the pre-processing transforms (refer to the pipeline config); classes (str): a text file containing the classes, one class per line (for example, the ImageNet classes) | - |
| test | dict config | None | Contains the test dataset configuration; it takes the same fields as train | - |
| val | dict config | None | Contains the validation dataset configuration; it takes the same fields as train | - |

Note

Refer to the MMPretrain 1.x format documentation for more details.
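As a sketch of the fields described in the table above, a train split that uses an annotation file and a classes file might look like the following (all paths are hypothetical):

dataset:
  data:
    train:
      data_prefix: /data/train            # parent folder with one sub-folder per class
      ann_file: /data/train_labels.txt    # hypothetical file: one "image_name class_id" entry per line
      classes: /data/classes.txt          # hypothetical file: one class name per line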

pipeline#

The following is an example pipeline config with different augmentations:

pipeline: # Augmentations alone
  - type: RandomResizedCrop
    scale: 224
    backend: "pillow"
  - type: RandomFlip
    prob: 0.5
    direction: "horizontal"
  - type: ColorJitter
    brightness: 0.4
    contrast: 0.4
    saturation: 0.4
  - type: RandomErasing
    erase_prob: 0.3

Some widely adopted augmentations and their parameters are listed below. For more information, refer to the MMPretrain documentation for transforms.

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| RandomResizedCrop | dict config | interpolation: bilinear | Contains the following configurable parameters: scale (int): the desired output scale of the crop; interpolation (str): the interpolation method | scale > 0 |
| RandomFlip | dict config | None | Contains the following configurable parameters: prob (float): the probability at which to flip the image; direction (str): the flipping direction | prob: 0-1; direction: horizontal, vertical |
| RandomCrop | dict config | None | Contains the following configurable parameters: crop_size (int/List): the desired output size of the crop; padding (int/sequence): optional padding on each image border (a sequence of length 4 pads the left, top, right, and bottom borders; a sequence of length 2 pads left/right and top/bottom); pad_val (int): the pixel fill value for constant fill; padding_mode (str): the padding type | crop_size, padding, pad_val > 0; padding_mode: constant, edge, reflect, symmetric |
| ColorJitter | dict config | None | Contains the following configurable parameters: brightness (float): how much to jitter brightness; contrast (float): how much to jitter contrast; saturation (float): how much to jitter saturation | 0-1 each |
| RandomErasing | dict config | erase_prob: 0.5; min_area_ratio: 0.02; max_area_ratio: 0.4; mode: const | Contains the following configurable parameters: erase_prob (float): the probability that the image is randomly erased; min_area_ratio (float): the minimum erased area divided by the input image area; max_area_ratio (float): the maximum erased area divided by the input image area; mode (str): the fill method for the erased area (const: all pixels are assigned the same value; rand: each pixel is assigned a random value in [0, 255]) | erase_prob, min_area_ratio, max_area_ratio: 0-1; mode: const/rand |
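Based on the table above, a pipeline entry using RandomCrop might look like the following sketch (the values are illustrative, not taken from the source):

pipeline:
  - type: RandomCrop
    crop_size: 224              # desired output size of the crop
    padding: [4, 4, 4, 4]       # pad the left, top, right, and bottom borders
    pad_val: 0                  # pixel fill value for constant fill
    padding_mode: "constant"    # one of constant, edge, reflect, symmetric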


train#

Here is an example of a train specification file for Image Classification PyT:

train:
  train_config:
    runner:
      max_epochs: 300
    checkpoint_config:
      interval: 1
    logging:
      interval: 5000
    validate: True
    evaluation:
      interval: 10
    custom_hooks:
      - type: "EMAHook"
        momentum: 0.00008
        priority: "ABOVE_NORMAL"
    lr_config:
      policy: CosineAnnealing
      T_max: 95
      by_epoch: True
      begin: 5
      end: 100
    optimizer:
        type: AdamW
        lr: 0.005
        weight_decay: 0.05

The table below describes the configurable parameters in the train specification.

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| exp_config | dict config | manual_seed: 47; MASTER_ADDR: "127.0.0.1"; MASTER_PORT: 631 | Contains the following configurable parameters: manual_seed (int): the random seed of the experiment; MASTER_ADDR (str): the host name of the master node; MASTER_PORT (int): the port on the MASTER_ADDR | manual_seed > 0 |
| train_config | dict config | None | Parameters related to training. For more information, refer to train_config | - |
| results_dir | str | None | The path for saving the checkpoints and logs | - |

train_config#

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| runner | dict config | max_epochs: 20 | Contains the following configurable parameters: max_epochs (int): the maximum number of epochs for which the training should be conducted | - |
| checkpoint_config | dict config | interval: 1 | Contains the following configurable parameters: interval (int): the number of steps at which the checkpoint is saved. Note that, currently, only epoch-based training is supported. | interval > 0 |
| logging | dict config | interval: 10 | Contains the following configurable parameters: interval (int): the number of iterations at which the experiment logs are saved. The logs are saved in the logs directory in the output directory. | interval > 0 |
| optimizer | dict config | None | Contains the configurable parameters for different optimizers, as detailed in optimizer | - |
| optimizer_config | dict config | None | Contains the following parameters: max_norm (float): the max norm of the gradients | max_norm >= 0.0 |
| evaluation | dict config | None | Contains the following configurable parameters: interval (int): the number of iterations at which validation is performed during training | - |
| validate | bool | False | A flag that enables validation during training | True/False |
| find_unused_parameters | bool | False | Sets this parameter in DDP. For more information, refer to DDP_PyT. | True/False |
| lr_config | dict | None | The learning-rate scheduler configuration. For more details, refer to lr_config | - |
| load_from | str | None | The checkpoint path from which the end-to-end model weights, including the head, can be loaded | - |
| custom_hooks | dict | None | The custom training hooks configuration. For more details, refer to custom_hooks | - |
| resume_training_checkpoint_path | str | None | The checkpoint path from which to resume training | - |
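For instance, here is a minimal sketch of resuming an interrupted run through train_config (the checkpoint path is hypothetical):

train:
  train_config:
    resume_training_checkpoint_path: /results/train/epoch_10.pth  # hypothetical checkpoint
    runner:
      max_epochs: 300
    validate: True
    evaluation:
      interval: 10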

optimizer#

The following optimizers are supported:

SGD

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| optimizer | dict config | lr: None; momentum: 0; weight_decay: 0 | Contains the following configurable parameters: type (str): "SGD"; lr (float): the learning rate; momentum (float): the momentum factor; weight_decay (float): the weight decay (L2 penalty) | - |
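A sketch of an SGD configuration under train_config follows (the values are illustrative):

train:
  train_config:
    optimizer:
      type: "SGD"
      lr: 0.01              # illustrative learning rate
      momentum: 0.9         # illustrative momentum factor
      weight_decay: 0.0001  # illustrative weight decay (L2 penalty)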





AdamW

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| optimizer | dict config | lr: 1e-3; weight_decay: 0.0; eps: 1e-8 | Contains the following configurable parameters: type (str): "AdamW"; lr (float): the learning rate; weight_decay (float): the weight decay (L2); eps (float): a term added to the denominator to improve numerical stability | - |
lr_config#

The lr_config parameter defines the parameters for the learning-rate scheduler. Some widely adopted learning-rate schedulers and their parameters are listed below. For more information, refer to the MMPretrain documentation for LR schedulers.

CosineAnnealingLR

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| T_max | int | - | The maximum number of iterations | >= 0 |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |

MultiStepLR

| Parameter | Datatype | Typical Value | Description | Supported Values |
|-----------|----------|---------------|-------------|------------------|
| gamma | float | - | The multiplicative factor of learning-rate decay | Usually less than 1.0 |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |

LinearLR

| Parameter | Datatype | Typical Value | Description | Supported Values |
|-----------|----------|---------------|-------------|------------------|
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |
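LinearLR is commonly used as a warmup schedule. A sketch of such a configuration follows; the policy name is an assumption based on the CosineAnnealing example above, since the source does not show a LinearLR config:

lr_config:
  policy: LinearLR   # assumed policy name, mirroring the CosineAnnealing example
  by_epoch: True
  begin: 0           # start updating the learning rate immediately
  end: 5             # stop updating after 5 epochs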

PolyLR

| Parameter | Datatype | Typical Value | Description | Supported Values |
|-----------|----------|---------------|-------------|------------------|
| eta_min | float | 0 | The minimum learning rate at the end of scheduling | - |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |

StepLR

| Parameter | Datatype | Typical Value | Description | Supported Values |
|-----------|----------|---------------|-------------|------------------|
| gamma | float | - | The multiplicative factor of learning-rate decay | Usually less than 1.0 |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |

custom_hooks#

The following is an example of how a custom hook from MMPretrain is expressed in the TAO Hydra config, shown here for EMAHook:

  • MMPretrain config:

    custom_hooks = [
      dict(type='EMAHook', interval=100, priority='HIGH')]
    
  • Equivalent TAO Hydra config:

    custom_hooks:
      - type: "EMAHook"
        momentum: 0.00008
        priority: "ABOVE_NORMAL"
    

For more detail on custom_hooks, refer to the MMPretrain documentation for custom hooks.

model#

Here is an example model-specification file for Image Classification PyT with a FAN backbone:

model:
  backbone:
    type: "fan_tiny_8_p4_hybrid"
    custom_args:
      drop_path_rate: 0.1
    freeze: False
    pretrained: <Path to pretrained weights>
  head:
    type: "TAOLinearClsHead"
    num_classes: 1000
    custom_args:
      head_init_scale: 1
    loss:
      type: LabelSmoothLoss
      label_smooth_val: 0.1
      mode: 'original'
  train_cfg:
    augments:
      - type: Mixup
        alpha: 0.8
      - type: CutMix
        alpha: 1.0

The model parameter primarily configures the backbone and head.

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| init_cfg | Dict | None | Contains the following config parameters: checkpoint (str): the path to the pre-trained model to be loaded; prefix (str): the string to be removed from state_dict keys | - |
| backbone | Dict | None | Contains the following configurable parameters: type (string): the name of the backbone to be used; freeze (bool): a flag specifying whether to freeze or unfreeze the backbone; pretrained (str): the path to the pre-trained weights to be loaded. Refer to the Foundation Models section for foundation-model configuration. | type: FAN variants: fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid, fan_Xlarge_16_p4_hybrid, fan_base_18_p16_224, fan_tiny_12_p16_224, fan_small_12_p16_224, fan_large_24_p16_224; GCViT variants: gc_vit_xxtiny, gc_vit_xtiny, gc_vit_tiny, gc_vit_small, gc_vit_base, gc_vit_large; FasterViT variants: faster_vit_0_224, faster_vit_1_224, faster_vit_2_224, faster_vit_3_224, faster_vit_4_224, faster_vit_5_224, faster_vit_6_224, faster_vit_4_21k_224, faster_vit_4_21k_384, faster_vit_4_21k_512, faster_vit_4_21k_768; freeze: False/True |
| head | Dict | None | The config parameters for the classification head | - |
| train_cfg | Dict | None | Contains advanced augmentation parameters | - |

Foundation Models#

model:
  backbone:
    type: "ViT-B-32"
    custom_args:
      drop_path_rate: 0.1
    freeze: False
    pretrained: laion400m_e31
  head:
    type: LinearClsHead
    num_classes: 1000
    in_channels: 512
    loss:
      type: CrossEntropyLoss
      loss_weight: 1.0
      use_soft: False
    topk: [1, 5]

The following is a subset of the supported architectures and their pre-training datasets. Note that in_channels must be updated under head to match the chosen architecture:

  • CLIP Image Backbones:

| Arch | Pretrained Dataset | in_channels |
|------|--------------------|-------------|
| ViT-B-32 | laion400m_e31, laion400m_e32, laion2b_e16, laion2b_s34b_b79k, datacomp_m_s128m_b4k, openai | 512 |
| ViT-B-16 | laion400m_e31 | 512 |
| ViT-L-14 | laion400m_e31 | 768 |
| ViT-H-14 | laion2b_s32b_b79k | 1024 |
| ViT-g-14 | laion2b_s12b_b42k | 1024 |

  • EVA - CLIP Image Backbones:

| Arch | Pretrained Dataset | in_channels |
|------|--------------------|-------------|
| EVA02-L-14 | merged2b_s4b_b131k | 768 |
| EVA02-L-14-336 | laion400m_e31 | 768 |
| EVA02-E-14 | laion400m_e31 | 1024 |
| EVA02-E-14-plus | laion2b_s32b_b79k | 1024 |
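For example, per the tables above, switching the backbone to ViT-L-14 requires raising in_channels to 768. A minimal sketch follows; the other fields mirror the Foundation Models example above:

model:
  backbone:
    type: "ViT-L-14"
    pretrained: laion400m_e31
  head:
    type: LinearClsHead
    num_classes: 1000
    in_channels: 768   # must match the ViT-L-14 output width listed above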

lr_head#

The Logistic Regression head is defined by the following parameters:

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| lr_head | Dict | None | Contains the following tunable parameters: C: the inverse of the regularization strength; max_iter: the maximum number of iterations for the solvers to converge; class_weight: whether to support balanced class training; solver: the algorithm to use in the optimization problem | C >= 0.0; max_iter > 0; class_weight: 'balanced', None; solver: 'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga' |

Using the Logistic Regression head requires freezing the backbone weights. Note that the freeze parameter must be set to True under the backbone config, as shown below:

model:
  backbone:
    type: "ViT-B-32"
    custom_args:
      drop_path_rate: 0.1
    freeze: True
    pretrained: laion400m_e31
  head:
    type: "LogisticRegressionHead"
    num_classes: 1000
    lr_head:
      C: 0.316
      max_iter: 5000

train_cfg#

Mixup

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| alpha | float | None | The parameter of the Beta distribution used to generate the mixing ratio | 0-1 |

CutMix

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| alpha | float | None | The parameter of the Beta distribution used to generate the mixing ratio | 0-1 |
| cutmix_minmax | list[float] | None | The min/max area ratio of the patches | - |
| correct_lam | bool | True | Whether to apply lambda correction when the CutMix bounding box is clipped by the image borders | True/False |
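A sketch combining the CutMix parameters above under model.train_cfg follows (the values are illustrative):

model:
  train_cfg:
    augments:
      - type: CutMix
        alpha: 1.0
        cutmix_minmax: [0.2, 0.8]  # illustrative min/max area ratio of the patches
        correct_lam: True          # correct lambda when the box is clipped at borders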

loss#

Some important classification losses are shown below. Note that all losses supported in MMPretrain can be used by following the TAO Hydra config format. For a list of MMPretrain losses, refer to the losses_mmpretrain documentation.

LabelSmoothLoss

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| label_smooth_val | float | None | The degree of label smoothing | 0-1 |
| use_sigmoid | bool | None | Whether the prediction uses sigmoid instead of softmax | True/False |
| num_classes | int | None | The number of classes | >= 0 |
| mode | str | None | The label-smoothing mode (the model example above uses 'original') | - |
| reduction | str | None | The method used to reduce the loss | mean, sum |
| loss_weight | float | 1.0 | The weight of the loss | >= 0 |

CrossEntropyLoss

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| use_sigmoid | bool | False | Whether the prediction uses sigmoid instead of softmax | True/False |
| use_soft | bool | False | Whether to use the soft version of CrossEntropyLoss | True/False |
| loss_weight | float | 1.0 | The weight of the loss | >= 0 |
| reduction | str | None | The method used to reduce the loss | mean, sum |
| class_weight | list[float] | None | The weight for each class | 0-1 |
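As a sketch, CrossEntropyLoss with per-class weights might be configured under the head as follows (a hypothetical 3-class case):

head:
  loss:
    type: CrossEntropyLoss
    loss_weight: 1.0
    reduction: mean
    class_weight: [0.2, 0.3, 0.5]  # hypothetical per-class weights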

Training the model#

Use the tao model classification_pyt train command to train a classification PyTorch model:

tao model classification_pyt train [-h] -e <experiment_spec_file>
                             [results_dir=<global_results_dir>]
                             [model.<model_option>=<model_option_value>]
                             [dataset.<dataset_option>=<dataset_option_value>]
                             [train.<train_option>=<train_option_value>]
                             [train.gpu_ids=<gpu indices>]
                             [train.num_gpus=<number of gpus>]

Required Arguments#

The only required argument is the path to the experiment spec:

  • -e, --experiment_spec: The experiment specification file to set up the training experiment

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.

Note

For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 with gpu_ids = [0, 1]), they are modified to follow the setting with more GPUs (for example, num_gpus = 1 becomes num_gpus = 2).
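For example, the following invocation overrides the GPU count and the maximum number of epochs from the command line (the spec-file path is hypothetical):

tao model classification_pyt train -e /workspace/specs/classification_spec.yaml \
                             train.num_gpus=2 \
                             train.train_config.runner.max_epochs=100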

Evaluating the Model#

The evaluate config defines the hyperparameters of the evaluation process. The following is an example config:

evaluate:
  checkpoint: /path/to/model.pth
  topk: 1

After the model has been trained using the experiment config file, the next step is to evaluate it on a test set to measure model accuracy. TAO includes the tao model classification_pyt evaluate command for this purpose.

The classification app computes evaluation loss and Top-k accuracy.

After training, the model is stored in the output directory of your choice in results_dir. Use the following command to evaluate the model:

tao model classification_pyt evaluate [-h] -e <experiment_spec>
                             evaluate.checkpoint=<model to be evaluated>
                             results_dir=<path to results dir>
                             [evaluate.<evaluate_option>=<evaluate_option_value>]
                             [evaluate.gpu_ids=<gpu indices>]
                             [evaluate.num_gpus=<number of gpus>]

Required Arguments#

  • -e, --experiment_spec: The experiment spec file to set up the evaluation experiment

  • evaluate.checkpoint: The .pth model to be evaluated.

  • results_dir: The path where the results will be stored

Optional Arguments#

You can set optional arguments, such as evaluate.gpu_ids and evaluate.num_gpus, to override the option values in the experiment spec file.

Running Inference on a Model#

For classification, the tao model classification_pyt inference command saves a .csv file containing the image paths and the corresponding labels for multiple images. TensorRT Python inference can also be enabled. The inference config defines the checkpoint to run inference with:

inference:
  checkpoint: /path/to/model.pth

Use the following command to run inference:
tao model classification_pyt inference [-h] -e <experiment_spec_file>
                             inference.checkpoint=<model to be inferenced>
                             results_dir=<path to results dir>
                             [inference.<inference_option>=<inference_option_value>]
                             [inference.gpu_ids=<gpu indices>]
                             [inference.num_gpus=<number of gpus>]

Required Arguments#

  • -e, --experiment_spec: The experiment spec file to set up the inference experiment

  • inference.checkpoint: The .pth model on which to run inference.

  • results_dir: The path where the results will be stored

Optional Arguments#

You can set optional arguments, such as inference.gpu_ids and inference.num_gpus, to override the option values in the experiment spec file.

Exporting the model#

Exporting the model decouples the training process from inference and allows conversion to TensorRT engines outside the TAO environment. TensorRT engines are specific to each hardware configuration and should be generated for each unique inference environment. The exported model itself, however, can be used universally across training and deployment hardware; its format is .onnx.

The export parameter defines the hyperparameters of the export process.

export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  verify: False
  input_channel: 3
  input_width: 224
  input_height: 224

Here’s an example of the tao model classification_pyt export command:

tao model classification_pyt export [-h] -e <experiment spec file>
                             export.checkpoint=<model to export>
                             export.onnx_file=<onnx path>
                             [export.<export_option>=<export_option_value>]

Required Arguments#

  • -e, --experiment_spec: The path to an experiment spec file

  • export.checkpoint: The .pth model to export.

  • export.onnx_file: The path where the .etlt or .onnx model is saved.

Optional Arguments#

You can set optional arguments, such as export.opset_version, export.verify, export.input_channel, export.input_width, and export.input_height, to override the option values in the experiment spec file.

TensorRT Engine Generation, Validation, and INT8 Calibration#

For TensorRT engine generation, validation, and INT8 calibration, refer to the TAO Deploy documentation.

Deploying to DeepStream#

Refer to the Integrating a Classification (TF1/TF2/PyTorch) Model page for more information about deploying a classification model with DeepStream.