Image Classification PyT

Image Classification PyT is a PyTorch-based image-classification model included in the TAO Toolkit. It supports the following tasks:

train
evaluate
inference
export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:

tao model classification_pyt <sub_task> <args_per_subtask>

Here, args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

Note

Image Classification (PyT) is based on MMClassification, so most parameters follow the MMClassification 0.x format. MMLab has deprecated that version and moved it to MMPretrain. TAO Toolkit will be updated to the MMPretrain version in a future release.

Preparing the Input Data Structure

See the Data Annotation Format page for more information about the data format for image classification.

The train classification experiment specification consists of three main components:

dataset
train
model

Dataset Input for Classification PyT

Here is an example dataset specification file for Classification PyT with a FAN backbone:

dataset:
  data:
    samples_per_gpu: 128
    workers_per_gpu: 8
    train:
      data_prefix: "/raid/ImageNet2012/ImageNet2012/train"
      pipeline: # Augmentations alone
        - type: RandomResizedCrop
          size: 224
          backend: "pillow"
        - type: RandomFlip
          flip_prob: 0.5
          direction: "horizontal"
        - type: ColorJitter
          brightness: 0.4
          contrast: 0.4
          saturation: 0.4
        - type: RandomErasing
          erase_prob: 0.3
    val:
      data_prefix: /raid/ImageNet2012/ImageNet2012/val
    test:
      data_prefix: /raid/ImageNet2012/ImageNet2012/val

The table below describes the configurable parameters in dataset.

Parameter	Datatype	Default	Description	Supported Values
`sampler`	dict config	None	The dataset sampler type
`img_norm_cfg`	dict config float float bool	None [123.675, 116.28, 103.53] [58.395, 57.12, 57.375] False	Contains the following configurable parameters: * `mean`: The mean to be subtracted from the image * `std`: The standard deviation by which to divide the image * `to_rgb`: A flag specifying whether to convert to RGB format	> 0 - -
`data`	dict config	None	Parameters related to training. Refer to data for more details.
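
As a rough illustration of what `mean` and `std` do, the sketch below normalizes a single pixel with the default values from the table above. The helper name `normalize_pixel` is hypothetical and is not part of TAO or MMClassification; it only shows the arithmetic, assuming per-channel values in the 0-255 range and RGB channel order.

```python
# Default per-channel statistics taken from the table above (RGB order assumed).
MEAN = [123.675, 116.28, 103.53]
STD = [58.395, 57.12, 57.375]


def normalize_pixel(pixel, mean=MEAN, std=STD):
    """Subtract the per-channel mean and divide by the per-channel std."""
    return [(value - m) / s for value, m, s in zip(pixel, mean, std)]


# A pixel equal to the mean maps to zero in every channel.
centered = normalize_pixel([123.675, 116.28, 103.53])

# A pixel one standard deviation above the mean maps to roughly 1.0 per channel.
one_sigma = normalize_pixel([m + s for m, s in zip(MEAN, STD)])
```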

data

Parameter	Datatype	Default	Description	Supported Values
`samples_per_gpu`	int	None	The number of samples (batch size) per GPU
`img_norm_config`	str str bool	Dict [123.675, 116.28, 103.53] [58.395, 57.12, 57.375] False	Contains the following configurable parameters: * `mean`: The mean to be subtracted from the image * `std`: The standard deviation to divide the image * `to_rgb`: A flag specifying whether to convert to RGB format	> 0 - True/ False
`train`	dict config str str Dict str	None	Contains the training dataset configuration: * `data_prefix`: The parent folder containing folders of different classes * `ann_file`: A text file where every line is an image name and corresponding class ID. For more information, refer to the Data Annotation Format section. * `pipeline`: The data processing pipeline, which contains the pre-processing transforms. For more information, refer to the pipeline config * `classes`: A text file containing the classes (one class per line)	Imagenet Classes
`test`	dict config str str Dict str	None	Contains the test dataset configuration: * `data_prefix`: The parent folder containing folders of different classes * `ann_file`: A text file where every line is an image name and corresponding class ID. For more information, refer to the Data Annotation Format section. * `pipeline`: The data processing pipeline, which contains the pre-processing transforms. For more information, refer to the pipeline config * `classes`: A text file containing the classes (one class per line)	Imagenet Classes
`val`	dict config str str Dict str	None	Contains the validation dataset configuration: * `data_prefix`: The parent folder containing folders of different classes * `ann_file`: A text file where every line is an image name and corresponding class ID. For more information, refer to the Data Annotation Format section. * `pipeline`: The data processing pipeline, which contains the pre-processing transforms. For more information, refer to the pipeline config * `classes`: A text file containing the classes (one class per line)	Imagenet Classes

Note

Refer to the MMClassification 0.x format documentation for more details.
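
The `ann_file` and `classes` files described above can be generated from a folder-per-class layout. The following is a minimal sketch under stated assumptions: the helper `build_annotations` is hypothetical, and the line format (relative image path, a space, then the class ID) is an assumption that should be checked against the Data Annotation Format page before use.

```python
import os
import tempfile


def build_annotations(data_prefix):
    """Return (classes, lines) for a folder that holds one subfolder per class.

    Assumed layout (hypothetical): <data_prefix>/<class_name>/<image file>.
    Each annotation line is "<relative image path> <class ID>".
    """
    classes = sorted(
        name for name in os.listdir(data_prefix)
        if os.path.isdir(os.path.join(data_prefix, name))
    )
    lines = []
    for class_id, class_name in enumerate(classes):
        class_dir = os.path.join(data_prefix, class_name)
        for image_name in sorted(os.listdir(class_dir)):
            lines.append(f"{class_name}/{image_name} {class_id}")
    return classes, lines


# Tiny demonstration on a throwaway directory with two classes.
with tempfile.TemporaryDirectory() as root:
    for class_name, image_name in [("cat", "a.jpg"), ("dog", "b.jpg")]:
        os.makedirs(os.path.join(root, class_name), exist_ok=True)
        open(os.path.join(root, class_name, image_name), "w").close()
    classes, lines = build_annotations(root)
```

Writing `classes` one per line gives a file suitable for the `classes` parameter, and writing `lines` gives a candidate `ann_file`.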

pipeline

The following is an example pipeline config with different augmentations:

pipeline: # Augmentations alone
  - type: RandomResizedCrop
    size: 224
    backend: "pillow"
  - type: RandomFlip
    flip_prob: 0.5
    direction: "horizontal"
  - type: ColorJitter
    brightness: 0.4
    contrast: 0.4
    saturation: 0.4
  - type: RandomErasing
    erase_prob: 0.3

Some widely adopted augmentations and their parameters are listed below. For more information, refer to the MMClassification documentation for transforms.

Parameter	Datatype	Default	Description	Supported Values
`RandomResizedCrop`	dict config int int str	None int int bilinear	Contains the following configurable parameters: * `size`: The desired output size of the crop * `crop_padding`: The crop-padding parameter in EfficientNet format * `interpolation`: The interpolation method	> 0 -
`RandomFlip`	dict config float str	None	Contains the following configurable parameters: * `flip_prob`: The probability at which to flip the image * `direction`: The flipping direction	0-1 horizontal, vertical
`RandomCrop`	dict config int/ List int int str	None	Contains the following configurable parameters: * `size`: The desired output size of the crop * `padding`: Optional padding on each image border. If a sequence of length 4 is provided, it is used to pad the left, top, right, and bottom borders. If a sequence of length 2 is provided, it is used to pad left/right and top/bottom. * `pad_val`: The pixel value for constant fill * `padding_mode`: The padding type	> 0 > 0 > 0 constant, edge, reflect, symmetric
`ColorJitter`	dict config float float float	None	The ColorJitter augmentation contains the following parameters: * `brightness`: How much to jitter brightness * `contrast`: How much to jitter contrast * `saturation`: How much to jitter saturation	0-1 0-1 0-1
`RandomErasing`	dict config float float float str	None 0.5 0.02 0.4 const	The RandomErasing augmentation contains the following parameters: * `erase_prob`: The probability that the image will be randomly erased * `min_area_ratio`: The minimum erased area divided by the input image area * `max_area_ratio`: The maximum erased area divided by the input image area * `mode`: The fill method in the erased area: `const` (all pixels are assigned the same value) or `rand` (each pixel is assigned a random value in [0, 255])	0-1 0-1 0-1 const/ rand
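
To make the RandomErasing area ratios concrete, the sketch below converts `min_area_ratio` and `max_area_ratio` into pixel counts for one image. The helper `erased_area_bounds` is hypothetical and only restates the definitions in the table; it is not how MMClassification samples the erased rectangle.

```python
def erased_area_bounds(height, width, min_area_ratio=0.02, max_area_ratio=0.4):
    """Return the (smallest, largest) erased area in pixels for one image."""
    image_area = height * width
    return min_area_ratio * image_area, max_area_ratio * image_area


# For a 224 x 224 crop and the default ratios from the table.
low, high = erased_area_bounds(224, 224)
```

With the defaults, the erased region of a 224 x 224 image covers between roughly 1,000 and 20,000 pixels whenever erasing is applied.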

train

Here is an example of a train specification file for Image Classification PyT:

train:
  train_config:
    runner:
      max_epochs: 300
    checkpoint_config:
      interval: 1
    logging:
      interval: 5000
    validate: True
    evaluation:
      interval: 10
    custom_hooks:
      - type: "EMAHook"
        momentum: 0.00008
        priority: "ABOVE_NORMAL"
    lr_config:
      policy: CosineAnnealingCooldown
      min_lr: 5.0e-06
      cool_down_time: 10
      warmup: 'linear'
      warmup_iters: 20
      warmup_by_epoch: True
    optimizer:
        type: AdamW
        lr: 0.005
        weight_decay: 0.05

The table below describes the configurable parameters in the train specification.

Parameter	Datatype	Default	Description	Supported Values
`exp_config`	dict config int str int	None 47 “127.0.0.1” 631	Contains the following configurable parameters: * `manual_seed`: The random seed of experiment * `MASTER_ADDR`: The host name of the Master Node * `MASTER_PORT`: The port on the MASTER_ADDR	> 0 - -
`train_config`	dict config	None	Parameters related to training. For more information, refer to train_config
`results_dir`	str	None	The path for saving the checkpoint and logs	str

train_config

Parameter	Datatype	Default	Description	Supported Values
`runner`	dict config int	None 20	Contains the following configurable parameters: * `max_epochs`: The maximum number of epochs for which the training should be conducted
`checkpoint_config`	dict config int	None 1	Contains the following configurable parameters: * `interval`: The interval, in epochs, at which a checkpoint is saved. Currently, only epoch-based training is supported.	>0
`logging`	dict config int	None 10	Contains the following configurable parameters: * `interval`: The number of iterations at which the experiment logs need to be saved. The logs are saved in the `logs` directory in the output directory.	>0
`optimizer`	dict config	None	Contains the configurable parameters for different optimizers, as detailed in optimizer.
`optimizer_config`	dict config float	None None	Contains the following parameters: * `max_norm`: The max norm of the gradients	>=0.0
`evaluation`	dict config int	None int	Contains the following configurable parameters: * `interval` : The interval number of iterations at which validation should be performed during training
`validate`	bool	False	A flag that enables validation during training
`find_unused_parameters`	bool	False	Sets this parameter in DDP. For more information, refer to DDP_PyT.
`lr_config`	dict	None	The learning-rate scheduler configuration. For more details, refer to lr_config
`load_from`	str	None	The checkpoint path from which the end-to-end model weights, including the head, can be loaded
`custom_hooks`	dict	None	The custom training hooks configuration. For more details, refer to custom_hooks.
`resume_training_checkpoint_path`	str	None	The checkpoint path from which to resume training

optimizer

The following optimizers are supported:

SGD

Parameter	Datatype	Default	Description	Supported Values
`optimizer`	dict config str float float float	None None None 0 0	Contains the following configurable parameters: * `type`: "SGD" * `lr`: The learning rate * `momentum`: The momentum factor * `weight_decay`: The weight decay (L2)

AdamW

Parameter	Datatype	Default	Description	Supported Values
`optimizer`	dict config str float float float	None None 1e-3 0.0 1e-8	Contains the following configurable parameters: * `type`: "AdamW" * `lr`: The learning rate * `weight_decay`: The weight decay (L2) * `eps`: A term added to the denominator to improve numerical stability

lr_config

The lr_config parameter defines the parameters for the learning-rate scheduler. The following learning-rate schedulers are supported:

CosineAnnealingCooldown

Parameter	Datatype	Default	Description	Supported Values
`min_lr`	float	None	The minimum learning rate after annealing	>=0.0
`min_lr_ratio`	float	None	The minimum learning ratio after annealing	Less than 1.0
`cool_down_ratio`	float	0.1	The cooldown ratio	In the interval (0, 1).
`cool_down_time`	int	10	The cooldown time	>=0
`warmup`	string	exp	The type of warmup used	constant, linear, exp
`warmup_iters`	int	0	The number of iterations or epochs that warmup lasts	>=0.0
`warmup_ratio`	float	0.1	The learning rate used at the beginning of warmup equals `warmup_ratio * initial_lr`.	In the interval (0, 1).
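
The interaction of warmup, cosine annealing, and cooldown can be sketched as below. This is a simplified illustration, not the MMClassification implementation: the function `cosine_cooldown_lr` is hypothetical, it works per epoch only, and the exact boundary handling in the real scheduler may differ. The values in the usage lines come from the example train specification above.

```python
import math


def cosine_cooldown_lr(epoch, max_epochs, base_lr, min_lr,
                       warmup_epochs=0, warmup_ratio=0.1,
                       cool_down_time=10, cool_down_ratio=0.1):
    """Return an approximate learning rate for the given epoch."""
    # Linear warmup: start at warmup_ratio * base_lr and ramp up to base_lr.
    if epoch < warmup_epochs:
        fraction = epoch / warmup_epochs
        return base_lr * (warmup_ratio + (1.0 - warmup_ratio) * fraction)
    # Cooldown: hold a small constant rate for the last cool_down_time epochs.
    if epoch >= max_epochs - cool_down_time:
        return min_lr * cool_down_ratio
    # Cosine annealing from base_lr down to min_lr in between.
    anneal_span = max_epochs - cool_down_time - warmup_epochs
    progress = (epoch - warmup_epochs) / anneal_span
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Values from the example spec: lr 0.005, min_lr 5.0e-06, 20 warmup epochs.
schedule = [
    cosine_cooldown_lr(epoch, max_epochs=300, base_lr=0.005, min_lr=5.0e-06,
                       warmup_epochs=20, cool_down_time=10)
    for epoch in range(300)
]
```

Plotting `schedule` against the epoch index is a quick way to sanity-check a configuration before launching a long training run.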

CosineAnnealing

Parameter	Datatype	Typical value	Description	Supported Values
`warmup`	string	exp	Type of warmup used.	constant, linear, exp
`warmup_iters`	int	0	The number of iterations or epochs that warmup lasts	>=0.0
`warmup_ratio`	float	0.1	The learning rate used at the beginning of warmup equals `warmup_ratio * lr`.	In the interval (0, 1).
`min_lr_ratio`	float	None	The minimum learning ratio after annealing	Less than 1.0

Step

Parameter	Datatype	Typical value	Description	Supported Values
`gamma`	float	–	The multiplicative factor applied to the learning rate at each decay step	Usually less than 1.0
`step`	int/ List	–	The step interval (int) or the list of steps (epochs or iterations) at which the learning rate is decayed	>0

Poly

Parameter	Datatype	Typical value	Description	Supported Values
`min_lr`	float	–	The minimum learning rate at the end of the decay	>=0.0
`power`	float	–	The exponent of the polynomial decay	>0
`soft_start`	float	–	The progress at which the learning rate achieves the base learning rate	In the interval (0, 1).

custom_hooks

The following example shows how a custom hook from MMClassification (here, EMAHook) is provided in the TAO Hydra config:

MMClassification config:

custom_hooks = [
  dict(type='EMAHook', interval=100, priority='HIGH')]

Equivalent TAO Hydra config:

custom_hooks:
  - type: "EMAHook"
    momentum: 0.00008
    priority: "ABOVE_NORMAL"

For more detail on custom_hooks, refer to the MMClassification documentation for custom hooks.
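
The `momentum` value of EMAHook controls how quickly the averaged weights follow the trained weights. The sketch below assumes the usual exponential-moving-average rule, `ema = (1 - momentum) * ema + momentum * weight`; the helper `ema_update` is hypothetical and operates on a single float rather than on model tensors.

```python
def ema_update(ema_value, current_value, momentum=0.00008):
    """Blend the current weight into the running average."""
    return (1.0 - momentum) * ema_value + momentum * current_value


# With a large momentum the average moves quickly toward the current value.
ema = 0.0
for _ in range(3):
    ema = ema_update(ema, 1.0, momentum=0.5)
```

A small momentum such as the 0.00008 used in the example spec makes the average change very slowly, which smooths out step-to-step noise in the weights.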

model

Here is an example model-specification file for Image Classification PyT with a FAN backbone:

model:
  backbone:
    type: "fan_tiny_8_p4_hybrid"
    custom_args:
      drop_path_rate: 0.1
  head:
    type: "FANLinearClsHead"
    num_classes: 1000
    custom_args:
      head_init_scale: 1
    loss:
      type: LabelSmoothLoss
      label_smooth_val: 0.1
      mode: 'original'
  train_cfg:
    augments:
      - type: BatchMixup
        alpha: 0.8
        num_classes: 1000
        prob: 0.5
      - type: BatchCutMix
        alpha: 1.0
        num_classes: 1000
        prob: 0.5

The model parameter primarily configures the backbone and head.

Parameter	Datatype	Default	Description	Supported Values
`init_cfg`	Dict str str	None	The init_cfg contains the following config parameters: * `checkpoint`: The path to the pre-trained model to be loaded * `prefix`: The string to be removed from state_dict keys
`backbone`	Dict string	None	Contains the following configurable parameters: * `type`: The name of the backbone to be used	FAN variants: fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid, fan_Xlarge_16_p4_hybrid, fan_base_18_p16_224, fan_tiny_12_p16_224, fan_small_12_p16_224, fan_large_24_p16_224. GCViT variants: gc_vit_xxtiny, gc_vit_xtiny, gc_vit_tiny, gc_vit_small, gc_vit_base, gc_vit_large
`head`	Dict	None	The config parameters for the classification head	–
`train_cfg`	Dict	None	Contains advanced augmentation parameters.	–

head

Parameter	Datatype	Default	Description	Supported Values
`type`	string	None	The type of classification head	LinearClsHead, FANLinearClsHead
`num_classes`	int	None	The number of training classes	>=0
`loss`	Dict	{"type": "CrossEntropyLoss"}	Refer to loss for the different types of loss and their parameters
`topk`	List	[1,]	The values of k for which Top-k accuracy is computed	>=0
`custom_args`	Dict	None	Any custom parameters to be passed to `head` (e.g., `head_init_scale` is used for `FANLinearClsHead`)	–

train_cfg

BatchMixup

Parameter	Datatype	Default	Description	Supported Values
`alpha`	float	None	The parameter of the Beta distribution used to generate the mixing ratio	0-1
`prob`	float	None	The probability at which to apply the augmentation	0-1
`num_classes`	int	None	The number of classes	>=0

BatchCutMix

Parameter	Datatype	Default	Description	Supported Values
`alpha`	float	None	The parameter of the Beta distribution used to generate the mixing ratio	0-1
`prob`	float	None	The probability at which to apply the augmentation	0-1
`num_classes`	int	None	The number of classes	>=0
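
The role of `alpha` in BatchMixup can be illustrated with a small sketch. It assumes the standard mixup formulation, in which a mixing ratio drawn from Beta(alpha, alpha) blends two images and their one-hot labels; the helpers `sample_lam` and `mixup` are hypothetical and use plain Python lists instead of tensors.

```python
import random


def sample_lam(alpha, rng=random):
    """Draw the mixing ratio from a Beta(alpha, alpha) distribution."""
    return rng.betavariate(alpha, alpha)


def mixup(image_a, image_b, label_a, label_b, lam):
    """Blend two flattened images and their one-hot labels with ratio lam."""
    mixed_image = [lam * a + (1.0 - lam) * b for a, b in zip(image_a, image_b)]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_image, mixed_label


# Fixed ratio for a reproducible demonstration; in training lam is sampled.
mixed_image, mixed_label = mixup(
    [0.0, 4.0], [8.0, 0.0],   # two tiny "images"
    [1.0, 0.0], [0.0, 1.0],   # their one-hot labels
    lam=0.25,
)
lam = sample_lam(alpha=0.8)
```

`num_classes` determines the length of the one-hot labels, and `prob` is the chance that a given batch is augmented at all.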

loss

Some important classification losses are shown below. All losses supported in MMCls can be used by following the Hydra config for TAO Toolkit. For a list of MMCls losses, refer to the losses_mmcls documentation.

LabelSmoothLoss

Parameter	Datatype	Default	Description	Supported Values
`label_smooth_val`	float	None	The degree of label smoothing	0-1
`use_sigmoid`	bool	None	Specifies whether the prediction uses sigmoid instead of softmax	False/ True
`num_classes`	int	None	The number of classes	>=0
`mode`	string	None	The label-smoothing mode	original, classy_vision, multi_label
`reduction`	str	None	The method used to reduce the loss	mean, sum
`loss_weight`	float	1.0	The weight of the loss	>=0
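
The effect of `label_smooth_val` can be sketched as follows. This assumes the `original` mode, in which the one-hot target is scaled by `1 - label_smooth_val` and `label_smooth_val / num_classes` is added to every class; the helper `smooth_one_hot` is hypothetical.

```python
def smooth_one_hot(class_id, num_classes, label_smooth_val=0.1):
    """Return the smoothed target distribution for one sample."""
    uniform = label_smooth_val / num_classes
    return [
        (1.0 - label_smooth_val) * (1.0 if index == class_id else 0.0) + uniform
        for index in range(num_classes)
    ]


# Four classes, true class 1, smoothing 0.1 as in the example model spec.
target = smooth_one_hot(class_id=1, num_classes=4, label_smooth_val=0.1)
```

The smoothed target still sums to 1, but the true class no longer carries the full probability mass, which discourages over-confident predictions.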

CrossEntropyLoss

Parameter	Datatype	Default	Description	Supported Values
`use_sigmoid`	bool	False	Specifies whether the prediction uses sigmoid instead of softmax	True/ False
`use_soft`	bool	False	Specifies whether to use the soft version of CrossEntropyLoss	True/ False
`loss_weight`	float	1.0	The weight of the loss	>=0

Training the Model

Use the tao model classification_pyt train command to train a PyTorch classification model:

tao model classification_pyt train [-h] -e <spec file>
                              -r <result directory>
                              [-g <num GPUs>]

Required Arguments

-r, --results_dir: The path to a folder where the experiment outputs should be written
-e, --experiment_spec_file: The path to the experiment spec file

Optional Arguments

-g, --gpus: The number of GPUs to use for training. The default value is 1.
-h, --help: Print the help message.

Sample Usage

Here’s an example of using the tao model classification_pyt train command:

tao model classification_pyt train -e /workspace/cats_dogs/spec/train_cats_dogs.yaml -r /workspace/output
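
When training is launched from a script, the command can be assembled programmatically. The sketch below only builds the argument list from the flags documented above (`-e`, `-r`, `-g`) and does not run anything; the helper `build_train_command` is hypothetical and assumes the `tao` launcher is on the PATH when the command is eventually executed.

```python
import shlex


def build_train_command(spec_file, results_dir, num_gpus=None):
    """Assemble the argument list for a classification_pyt training run."""
    command = [
        "tao", "model", "classification_pyt", "train",
        "-e", spec_file,
        "-r", results_dir,
    ]
    # -g is optional; when omitted the launcher uses its default of 1 GPU.
    if num_gpus is not None:
        command += ["-g", str(num_gpus)]
    return command


command = build_train_command(
    "/workspace/cats_dogs/spec/train_cats_dogs.yaml",
    "/workspace/output",
    num_gpus=2,
)
# Human-readable form, useful for logging before launching the run.
command_line = shlex.join(command)
```

The resulting list can be passed to `subprocess.run(command, check=True)` to start the run.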

Evaluating the Model

The evaluate config defines the hyperparameters of the evaluation process. The following is an example config:

evaluate:
  checkpoint: /path/to/model.pth
  topk: 1

After training a model with the experiment config file, the next step is to evaluate it on a test set to measure its accuracy. The TAO Toolkit includes the tao model classification_pyt evaluate command for this purpose.

The classification app computes evaluation loss and Top-k accuracy.
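
For reference, Top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes. The sketch below is a plain-Python illustration of that definition; the helper `topk_accuracy` is hypothetical and is not the code TAO runs.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k top-scoring classes."""
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda index: row[index], reverse=True)
        if label in ranked[:k]:
            correct += 1
    return correct / len(labels)


# Three samples, three classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
```

In this toy example only the first sample is correct at k=1, while the second also counts at k=2.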

After training, the model is stored in the output directory of your choice in results_dir.

evaluate:
  checkpoint: /path/to/model.pth

tao model classification_pyt evaluate [-h] -e <experiment_spec_file>
                                      evaluate.checkpoint=<model to be evaluated>
                                      results_dir=<path to results dir>
                                      [-g <num gpus>]

Required Arguments

-e, --experiment_spec_file: The path to the experiment spec file

Optional Arguments

-h, --help: Show this help message and exit.
-g, --gpus: The number of GPUs for conducting evaluation

If you followed the example in training a classification model, run the evaluation:

tao model classification_pyt evaluate -e /path/to/classification_eval.yaml

TAO evaluates the classification model and reports the Top-K accuracy metric.

Running Inference on a Model

For classification, tao model classification_pyt inference saves a .csv file containing the image paths and the corresponding labels for multiple images. TensorRT Python inference can also be enabled.

inference:
  checkpoint: /path/to/model.pth

tao model classification_pyt inference [-h] -e <experiment_spec_file>
                                      inference.checkpoint=<model to be inferenced>
                                      results_dir=<path to results dir>
                                      [-g <num gpus>]

Required Arguments

-e, --experiment_spec_file: The path to the experiment spec file

Optional Arguments

-h, --help: Show this help message and exit.
-g, --gpus: The number of GPUs to use for inference

Exporting the Model

Exporting the model decouples the training process from inference and allows conversion to TensorRT engines outside the TAO environment. TensorRT engines are specific to each hardware configuration and should be generated for each unique inference environment. The exported model may be used universally across training and deployment hardware. The exported model is saved in the .onnx format.

The export parameter defines the hyperparameters of the export process.

export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  verify: False
  input_channel: 3
  input_width: 224
  input_height: 224

The tao model classification_pyt export command has the following syntax:

tao model classification_pyt export [-h] -e <experiment spec file>
                  [-r <results_dir>]
                  export.checkpoint=<model to export>
                  export.onnx_file=<onnx path>

Required Arguments

-e, --experiment_spec: The path to an experiment spec file

Optional Arguments

-r, --results_dir: The directory where the export results are stored
export.checkpoint: The .tlt or .pth model to export
export.onnx_file: The path where the .etlt or .onnx model will be saved

Sample Usage

The following is a sample export command.

tao model classification_pyt export -e /path/to/spec.yaml export.checkpoint=/path/to/model.pth export.onnx_file=/path/to/model.onnx

TensorRT Engine Generation, Validation, and INT8 Calibration

For TensorRT engine generation, validation, and INT8 calibration, refer to the TAO Deploy documentation.

Deploying to DeepStream

Refer to the Integrating a Classification (TF1/TF2/PyTorch) Model page for more information about deploying a classification model with DeepStream.