Image Classification PyT

Image Classification PyT is a PyTorch-based image-classification model included in the TAO Toolkit. It supports the following tasks:

  • train

  • evaluate

  • inference

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command line:


tao model classification_pyt <sub_task> <args_per_subtask>

where args_per_subtask represents the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

TAO Toolkit has been updated to the latest MMPretrain version, which replaces the deprecated MMClassification.

Note

Image Classification (PyT) is based on MMPretrain. Hence, most parameters are adopted from the MMPretrain 1.x format.

See the Data Annotation Format page for more information about the data format for image classification.

The train classification experiment specification consists of three main components:

  • dataset

  • train

  • model

Here is an example of a dataset specification file for classification PyT with a FAN backbone:


dataset:
  data:
    samples_per_gpu: 128
    workers_per_gpu: 8
    train:
      data_prefix: "/raid/ImageNet2012/ImageNet2012/train"
      pipeline: # Augmentations alone
        - type: RandomResizedCrop
          size: 224
          backend: "pillow"
        - type: RandomFlip
          prob: 0.5
          direction: "horizontal"
        - type: ColorJitter
          brightness: 0.4
          contrast: 0.4
          saturation: 0.4
        - type: RandomErasing
          erase_prob: 0.3
    val:
      data_prefix: /raid/ImageNet2012/ImageNet2012/val
    test:
      data_prefix: /raid/ImageNet2012/ImageNet2012/val

The table below describes the configurable parameters in dataset.

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| sampler | dict config | None | The dataset sampler type | - |
| img_norm_cfg | dict config (mean: list of float, std: list of float, to_rgb: bool) | mean: [123.675, 116.28, 103.53]; std: [58.395, 57.12, 57.375]; to_rgb: False | Contains the following configurable parameters: mean: the mean to be subtracted from the image; std: the standard deviation to divide the image; to_rgb: a flag specifying whether to convert to RGB format | mean, std: > 0; to_rgb: True/False |
| data | dict config | None | Parameters related to data loading and the dataset splits. Refer to data for more details. | - |
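
As a sketch, the normalization defaults from the table above would be expressed in the spec as follows (the values are the documented defaults, and the nesting follows the dataset-level entry above; shown for illustration only):

dataset:
  img_norm_cfg:
    mean: [123.675, 116.28, 103.53]
    std: [58.395, 57.12, 57.375]
    to_rgb: False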

data

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| samples_per_gpu | int | None | The number of samples loaded per GPU per iteration (the per-GPU batch size) | > 0 |
| img_norm_cfg | dict config (mean: list of float, std: list of float, to_rgb: bool) | mean: [123.675, 116.28, 103.53]; std: [58.395, 57.12, 57.375]; to_rgb: False | Contains the following configurable parameters: mean: the mean to be subtracted from the image; std: the standard deviation to divide the image; to_rgb: a flag specifying whether to convert to RGB format | mean, std: > 0; to_rgb: True/False |
| train | dict config (data_prefix: str, ann_file: str, pipeline: Dict, classes: str) | None | Contains the training dataset configuration: data_prefix: the parent folder containing folders of different classes; ann_file: a text file where every line is an image name and the corresponding class ID (for more information, refer to the Data Annotation Format section); pipeline: the data-processing pipeline, which contains the pre-processing transforms (for more information, refer to the pipeline config); classes: a text file containing the classes (one class per line) | ImageNet classes |
| test | dict config (data_prefix: str, ann_file: str, pipeline: Dict, classes: str) | None | Contains the test dataset configuration: data_prefix: the parent folder containing folders of different classes; ann_file: a text file where every line is an image name and the corresponding class ID (for more information, refer to the Data Annotation Format section); pipeline: the data-processing pipeline, which contains the pre-processing transforms (for more information, refer to the pipeline config); classes: a text file containing the classes (one class per line) | ImageNet classes |
| val | dict config (data_prefix: str, ann_file: str, pipeline: Dict, classes: str) | None | Contains the validation dataset configuration: data_prefix: the parent folder containing folders of different classes; ann_file: a text file where every line is an image name and the corresponding class ID (for more information, refer to the Data Annotation Format section); pipeline: the data-processing pipeline, which contains the pre-processing transforms (for more information, refer to the pipeline config); classes: a text file containing the classes (one class per line) | ImageNet classes |
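
For reference, here is a hypothetical two-class example following the format described above (the file names, class names, and separator are illustrative; see the Data Annotation Format page for the authoritative layout). An ann_file lists one image name and class ID per line:

images/cat_0001.jpg 0
images/dog_0001.jpg 1

and the corresponding classes file lists one class name per line:

cat
dog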

Note

Refer to the MMPretrain 1.x format documentation for more details.

pipeline

The following is an example pipeline config with different augmentations:


pipeline: # Augmentations alone
  - type: RandomResizedCrop
    scale: 224
    backend: "pillow"
  - type: RandomFlip
    prob: 0.5
    direction: "horizontal"
  - type: ColorJitter
    brightness: 0.4
    contrast: 0.4
    saturation: 0.4
  - type: RandomErasing
    erase_prob: 0.3

Some of the widely adopted augmentations and their parameters are listed below. For more information, refer to the MMPretrain documentation for transforms.

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| RandomResizedCrop | dict config (scale: int, interpolation: str) | scale: None; interpolation: bilinear | Contains the following configurable parameters: scale: the desired output scale of the crop; interpolation: the interpolation method | scale > 0 |
| RandomFlip | dict config (prob: float, direction: str) | None | Contains the following configurable parameters: prob: the probability at which to flip the image; direction: the flipping direction | prob: 0-1; direction: horizontal, vertical |
| RandomCrop | dict config (crop_size: int/List, padding: int, pad_val: int, padding_mode: str) | None | Contains the following configurable parameters: crop_size: the desired output size of the crop; padding: optional padding on each image border (a sequence of length 4 pads the left, top, right, and bottom borders; a sequence of length 2 pads left/right and top/bottom); pad_val: the pixel value for constant fill; padding_mode: the padding type | crop_size, padding, pad_val: > 0; padding_mode: constant, edge, reflect, symmetric |
| ColorJitter | dict config (brightness: float, contrast: float, saturation: float) | None | Contains the following configurable parameters: brightness: how much to jitter brightness; contrast: how much to jitter contrast; saturation: how much to jitter saturation | 0-1 each |
| RandomErasing | dict config (erase_prob: float, min_area_ratio: float, max_area_ratio: float, mode: str) | erase_prob: 0.5; min_area_ratio: 0.02; max_area_ratio: 0.4; mode: const | Contains the following configurable parameters: erase_prob: the probability that the image will be randomly erased; min_area_ratio: the minimum erased area divided by the input image area; max_area_ratio: the maximum erased area divided by the input image area; mode: the fill method in the erased area (const: all pixels are assigned the same value; rand: each pixel is assigned a random value in [0, 255]) | erase_prob, min_area_ratio, max_area_ratio: 0-1; mode: const, rand |
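
For illustration, a pipeline that uses RandomCrop with the padding options described above might look like the following sketch (the values are arbitrary examples, not recommendations):

pipeline:
  - type: RandomCrop
    crop_size: 224
    padding: 4
    pad_val: 0
    padding_mode: "constant"
  - type: RandomFlip
    prob: 0.5
    direction: "horizontal"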

Here is an example of a train specification file for Image Classification PyT:


train:
  train_config:
    runner:
      max_epochs: 300
    checkpoint_config:
      interval: 1
    logging:
      interval: 5000
    validate: True
    evaluation:
      interval: 10
    custom_hooks:
      - type: "EMAHook"
        momentum: 0.00008
        priority: "ABOVE_NORMAL"
    lr_config:
      policy: CosineAnnealing
      T_max: 95
      by_epoch: True
      begin: 5
      end: 100
    optimizer:
      type: AdamW
      lr: 0.005
      weight_decay: 0.05

The table below describes the configurable parameters in the train specification.

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| exp_config | dict config (manual_seed: int, MASTER_ADDR: str, MASTER_PORT: int) | manual_seed: 47; MASTER_ADDR: "127.0.0.1"; MASTER_PORT: 631 | Contains the following configurable parameters: manual_seed: the random seed of the experiment; MASTER_ADDR: the host name of the master node; MASTER_PORT: the port on the MASTER_ADDR | manual_seed > 0 |
| train_config | dict config | None | Parameters related to training. For more information, refer to train_config. | - |
| results_dir | str | None | The path for saving checkpoints and logs | str |
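
For example, a hypothetical exp_config that pins the seed and the distributed-training endpoint (using the defaults from the table above) would look like:

train:
  exp_config:
    manual_seed: 47
    MASTER_ADDR: "127.0.0.1"
    MASTER_PORT: 631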

train_config

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| runner | dict config (max_epochs: int) | max_epochs: 20 | max_epochs: the maximum number of epochs for which the training should be conducted | > 0 |
| checkpoint_config | dict config (interval: int) | interval: 1 | interval: the number of epochs between checkpoint saves. Note that, currently, only epoch-based training is supported. | > 0 |
| logging | dict config (interval: int) | interval: 10 | interval: the number of iterations between experiment-log saves. The logs are saved in the logs directory inside the output directory. | > 0 |
| optimizer | dict config | None | Contains the configurable parameters for different optimizers, as detailed in optimizer | - |
| optimizer_config | dict config (max_norm: float) | None | max_norm: the max norm of the gradients | >= 0.0 |
| evaluation | dict config (interval: int) | None | interval: the interval, in iterations, at which validation is performed during training | > 0 |
| validate | bool | False | A flag that enables validation during training | True/False |
| find_unused_parameters | bool | False | Sets this parameter in DDP. For more information, refer to DDP_PyT. | True/False |
| lr_config | dict config | None | The learning-rate scheduler configuration. For more details, refer to lr_config. | - |
| load_from | str | None | The checkpoint path from which the end-to-end model weights, including the head, can be loaded | - |
| custom_hooks | dict config | None | The custom training hooks configuration. For more details, refer to custom_hooks. | - |
| resume_training_checkpoint_path | str | None | The checkpoint path from which to resume training | - |
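
As a sketch, a train_config fragment that resumes training from an earlier run (the checkpoint path is a placeholder) could be written as:

train:
  train_config:
    resume_training_checkpoint_path: /path/to/experiment/epoch_10.pth
    validate: True
    evaluation:
      interval: 1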

optimizer

The following optimizers are supported:

SGD

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| optimizer | dict config (type: str, lr: float, momentum: float, weight_decay: float) | type: None; lr: None; momentum: 0; weight_decay: 0 | Contains the following configurable parameters: type: "SGD"; lr: the learning rate; momentum: the momentum factor; weight_decay: the weight decay (L2 penalty) | - |
AdamW

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| optimizer | dict config (type: str, lr: float, weight_decay: float, eps: float) | type: None; lr: 1e-3; weight_decay: 0.0; eps: 1e-8 | Contains the following configurable parameters: type: "AdamW"; lr: the learning rate; weight_decay: the weight decay (L2); eps: a term added to the denominator to improve numerical stability | - |

lr_config

The lr_config parameter defines the parameters for the learning-rate scheduler. Some of the widely adopted learning-rate schedulers and their parameters are listed below. For more information, refer to the MMPretrain documentation for lr schedulers.

CosineAnnealingLR

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| T_max | int | None | The maximum number of iterations | > 0 |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |

MultiStepLR

| Parameter | Datatype | Typical Value | Description | Supported Values |
|-----------|----------|---------------|-------------|------------------|
| gamma | float | - | The multiplicative factor of learning-rate decay | Usually less than 1.0 |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |

LinearLR

| Parameter | Datatype | Typical Value | Description | Supported Values |
|-----------|----------|---------------|-------------|------------------|
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |

PolyLR

| Parameter | Datatype | Typical Value | Description | Supported Values |
|-----------|----------|---------------|-------------|------------------|
| eta_min | float | 0 | The minimum learning rate at the end of scheduling | - |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |

StepLR

| Parameter | Datatype | Typical Value | Description | Supported Values |
|-----------|----------|---------------|-------------|------------------|
| gamma | float | - | The multiplicative factor of learning-rate decay | Usually less than 1.0 |
| by_epoch | bool | True | Whether the scheduled learning rate is updated by epochs | True/False |
| begin | int | 0 | The step at which to start updating the learning rate | In the interval (0, INF) |
| end | int | INF | The step at which to stop updating the learning rate | In the interval (0, INF) |
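
As another sketch, a LinearLR schedule restricted to the first five epochs (following the same policy convention as the CosineAnnealing example earlier; the values are illustrative) could be configured as:

train:
  train_config:
    lr_config:
      policy: LinearLR
      by_epoch: True
      begin: 0
      end: 5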

custom_hooks

The following is an example of how a custom hook is translated from the MMPretrain config to the TAO Hydra config, using EMAHook:

  • MMPretrain config:


custom_hooks = [dict(type='EMAHook', interval=100, priority='HIGH')]

  • Equivalent TAO Hydra config:


custom_hooks:
  - type: "EMAHook"
    momentum: 0.00008
    priority: "ABOVE_NORMAL"

For more detail on custom_hooks, refer to the MMPretrain documentation for custom hooks.

Here is an example model-specification file for Image Classification PyT with a FAN backbone:


model:
  backbone:
    type: "fan_tiny_8_p4_hybrid"
    custom_args:
      drop_path_rate: 0.1
    freeze: False
    pretrained: <Path to pretrained weights>
  head:
    type: "TAOLinearClsHead"
    num_classes: 1000
    custom_args:
      head_init_scale: 1
    loss:
      type: LabelSmoothLoss
      label_smooth_val: 0.1
      mode: 'original'
  train_cfg:
    augments:
      - type: BatchMixup
        alpha: 0.8
        num_classes: 1000
        prob: 0.5
      - type: BatchCutMix
        alpha: 1.0
        num_classes: 1000
        prob: 0.5

The model parameter primarily configures the backbone and head.

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| init_cfg | Dict (checkpoint: str, prefix: str) | None | Contains the following config parameters: checkpoint: the path to the pre-trained model to be loaded; prefix: the string to be removed from state_dict keys | - |
| backbone | Dict (type: string, freeze: bool, pretrained: str) | None | Contains the following configurable parameters: type: the name of the backbone to be used; freeze: a flag specifying whether to freeze or unfreeze the backbone; pretrained: the path to the pre-trained weights to be loaded. Refer to the Foundation Models section for Foundation Models configuration. | type: see the backbone variants below; freeze: True/False |
| head | Dict | None | The config parameters for the classification head | - |
| train_cfg | Dict | None | Contains advanced augmentation parameters | - |

The supported backbone type values are as follows:

  • FAN variants: fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid, fan_Xlarge_16_p4_hybrid, fan_base_18_p16_224, fan_tiny_12_p16_224, fan_small_12_p16_224, fan_large_24_p16_224

  • GCViT variants: gc_vit_xxtiny, gc_vit_xtiny, gc_vit_tiny, gc_vit_small, gc_vit_base, gc_vit_large

  • FasterViT variants: faster_vit_0_224, faster_vit_1_224, faster_vit_2_224, faster_vit_3_224, faster_vit_4_224, faster_vit_5_224, faster_vit_6_224, faster_vit_4_21k_224, faster_vit_4_21k_384, faster_vit_4_21k_512, faster_vit_4_21k_768

Foundation Models


model:
  backbone:
    type: "ViT-B-32"
    custom_args:
      drop_path_rate: 0.1
    freeze: False
    pretrained: laion400m_e31
  head:
    type: LinearClsHead
    num_classes: 1000
    in_channels: 512
    loss:
      type: CrossEntropyLoss
      loss_weight: 1.0
      use_soft: False
    topk: [1, 5]

The following is a subset of the supported architectures and their pre-training datasets. Note that in_channels should be updated under head to match the chosen architecture:

  • CLIP Image Backbones:

| Arch | Pretrained Dataset | in_channels |
|------|--------------------|-------------|
| ViT-B-32 | laion400m_e31, laion400m_e32, laion2b_e16, laion2b_s34b_b79k, datacomp_m_s128m_b4k, openai | 512 |
| ViT-B-16 | laion400m_e31 | 512 |
| ViT-L-14 | laion400m_e31 | 768 |
| ViT-H-14 | laion2b_s32b_b79k | 1024 |
| ViT-g-14 | laion2b_s12b_b42k | 1024 |

  • EVA-CLIP Image Backbones:

| Arch | Pretrained Dataset | in_channels |
|------|--------------------|-------------|
| EVA02-L-14 | merged2b_s4b_b131k | 768 |
| EVA02-L-14-336 | laion400m_e31 | 768 |
| EVA02-E-14 | laion400m_e31 | 1024 |
| EVA02-E-14-plus | laion2b_s32b_b79k | 1024 |
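
For example, switching the backbone to ViT-L-14 would require raising in_channels to 768 per the table above (a sketch; the other fields follow the earlier Foundation Models example):

model:
  backbone:
    type: "ViT-L-14"
    freeze: False
    pretrained: laion400m_e31
  head:
    type: LinearClsHead
    num_classes: 1000
    in_channels: 768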
head

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| type | string | None | The type of the classification head | TAOLinearClsHead, FANLinearClsHead, LogisticRegressionHead |
| num_classes | int | None | The number of training classes | >= 0 |
| loss | Dict | {"type": "CrossEntropyLoss"} | Refer to loss for the different types of loss and their parameters | - |
| topk | List | [1,] | The k values at which Top-k accuracy is computed | >= 1 |
| custom_args | Dict | None | Any custom parameters to be passed to the head (e.g. head_init_scale is used for TAOLinearClsHead) | - |
| lr_head | Dict | None | Parameters used for the Logistic Regression head (e.g. C is used for tuning the regularization strength) | - |

lr_head

The Logistic Regression head is defined by the following parameters:

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| lr_head | Dict (C: float, max_iter: int, class_weight: str, solver: str) | None | Contains the following tunable parameters: C: the inverse of the regularization strength; max_iter: the maximum number of iterations taken for the solver to converge; class_weight: whether to support balanced-class training; solver: the algorithm to use in the optimization problem | C >= 0.0; max_iter > 0; class_weight: 'balanced', None; solver: 'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga' |

The Logistic Regression head requires freezing the backbone weights, so the freeze parameter under the backbone config must be set to True, as shown below:


model:
  backbone:
    type: "ViT-B-32"
    custom_args:
      drop_path_rate: 0.1
    freeze: True
    pretrained: laion400m_e31
  head:
    type: "LogisticRegressionHead"
    num_classes: 1000
    lr_head:
      C: 0.316
      max_iter: 5000

train_cfg

BatchMixup

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| alpha | float | None | The parameter of the Beta distribution used to generate the mixing ratio | 0-1 |
| prob | float | None | The probability at which to apply the augmentation | 0-1 |
| num_classes | int | None | The number of classes | >= 0 |

BatchCutMix

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| alpha | float | None | The parameter of the Beta distribution used to generate the mixing ratio | 0-1 |
| prob | float | None | The probability at which to apply the augmentation | 0-1 |
| num_classes | int | None | The number of classes | >= 0 |

loss

Some important classification losses are shown below. Note that all losses supported in MMPretrain can be used by following the Hydra config convention for TAO Toolkit. For a list of MMPretrain losses, refer to the losses_mmpretrain documentation.

LabelSmoothLoss

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| label_smooth_val | float | None | The degree of label smoothing | 0-1 |
| use_sigmoid | bool | None | Whether the prediction uses sigmoid instead of softmax | True/False |
| num_classes | int | None | The number of classes | >= 0 |
| mode | string | None | The label-smoothing mode (per the MMPretrain LabelSmoothLoss options) | original, classy_vision, multi_label |
| reduction | str | None | The method used to reduce the loss | mean, sum |
| loss_weight | float | 1.0 | The weight of the loss | >= 0 |

CrossEntropyLoss

| Parameter | Datatype | Default | Description | Supported Values |
|-----------|----------|---------|-------------|------------------|
| use_sigmoid | bool | False | Whether the prediction uses sigmoid instead of softmax | True/False |
| use_soft | bool | False | Whether to use the soft version of CrossEntropyLoss | True/False |
| loss_weight | float | 1.0 | The weight of the loss | >= 0 |
| reduction | str | None | The method used to reduce the loss | mean, sum |
| class_weight | list[float] | None | The weight assigned to each class | >= 0 |
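
As an illustration, a weighted CrossEntropyLoss for a hypothetical two-class problem (the weights are placeholders) might be sketched under the head as:

head:
  type: "TAOLinearClsHead"
  num_classes: 2
  loss:
    type: CrossEntropyLoss
    loss_weight: 1.0
    reduction: "mean"
    class_weight: [1.0, 2.0]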

Training the Model

Use the tao model classification_pyt train command to train a classification PyTorch model:


tao model classification_pyt train [-h] -e <spec file> -r <result directory> [-g <num GPUs>]

Required Arguments

  • -r, --results_dir: The path to a folder where the experiment outputs should be written

  • -e, --experiment_spec_file: The path to the experiment spec file

Optional Arguments

  • -g, --gpus: The number of GPUs to use for training. The default value is 1.

  • -h, --help: Print the help message.

Sample Usage

Here’s an example of using the tao model classification_pyt train command:


tao model classification_pyt train -e /workspace/cats_dogs/spec/train_cats_dogs.yaml -r /workspace/output

Evaluating the Model

The evaluate config defines the hyperparameters of the evaluation process. The following is an example config:


evaluate:
  checkpoint: /path/to/model.pth
  topk: 1

After the model has been trained using the experiment config file, the next step is to evaluate it on a test set to measure its accuracy. The TAO Toolkit provides the tao model classification_pyt evaluate command for this purpose.

The classification app computes evaluation loss and Top-k accuracy.

After training, the model is stored in the output directory of your choice in results_dir.


evaluate:
  checkpoint: /path/to/model.pth


tao model classification_pyt evaluate [-h] -e <experiment_spec_file> evaluate.checkpoint=<model to be evaluated> results_dir=<path to results dir> [-g <num gpus>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file

Optional Arguments

  • -h, --help: Show this help message and exit.

  • -g, --gpus: The number of GPUs for conducting evaluation

If you followed the example in training a classification model, run the evaluation:


tao model classification_pyt evaluate -e /path/to/classification_eval.yaml

TAO evaluates the classification model and reports the Top-K accuracy metric.

Running Inference

For classification, tao model classification_pyt inference saves a .csv file containing the image paths and the corresponding predicted labels for multiple images. TensorRT Python inference can also be enabled.
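
Conceptually, each row of the result file pairs an image path with its predicted label, along these lines (a purely hypothetical illustration; the actual columns and paths may differ):

/data/test/image_001.jpg,cat
/data/test/image_002.jpg,dog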


inference:
  checkpoint: /path/to/model.pth


tao model classification_pyt inference [-h] -e <experiment_spec_file> inference.checkpoint=<model to be inferenced> results_dir=<path to results dir> [-g <num gpus>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file

Optional Arguments

  • -h, --help: Show this help message and exit.

  • -g, --gpus: The number of GPUs to use for inference

Exporting the Model

Exporting the model decouples the training process from inference and allows conversion to TensorRT engines outside the TAO environment. TensorRT engines are specific to each hardware configuration and should be generated for each unique inference environment. The exported .onnx model itself, however, can be used universally across training and deployment hardware.

The export parameter defines the hyperparameters of the export process.


export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  verify: False
  input_channel: 3
  input_width: 224
  input_height: 224

Here’s an example of the tao model classification_pyt export command:


tao model classification_pyt export [-h] -e <experiment spec file> [-r <results_dir>] export.checkpoint=<model to export> export.onnx_file=<onnx path>

Required Arguments

  • -e, --experiment_spec: The path to an experiment spec file

Optional Arguments

  • -r, --results_dir: The directory where the export results are stored

  • export.checkpoint: The .tlt or .pth model to export

  • export.onnx_file: The path where the .etlt or .onnx model will be saved

Sample Usage

The following is a sample export command.


tao model classification_pyt export -e /path/to/spec.yaml export.checkpoint=/path/to/model.pth export.onnx_file=/path/to/model.onnx

For TensorRT engine generation, validation, and INT8 calibration, refer to the TAO Deploy documentation.

Refer to the Integrating a Classification (TF1/TF2/PyTorch) Model page for more information about deploying a classification model with DeepStream.
