Image Classification PyT#
Image Classification PyT is a PyTorch-based image-classification model included in TAO. It supports the following tasks:
- train
- evaluate
- inference
- export
- distill

All of the above actions follow the command pattern below:
SPECS=$(tao-client classification_pyt get-spec --action <sub_task> --job_type experiment --id $EXPERIMENT_ID)
JOB_ID=$(tao-client classification_pyt experiment-run-action --action <sub_task> --id $EXPERIMENT_ID --specs "$SPECS")
Required Arguments
- --id: The unique identifier of the experiment from which to train the model
See also
For information on how to create an experiment using the FTMS client, refer to the Creating an experiment section in the Remote Client documentation.
tao model classification_pyt <sub_task> <args_per_subtask>

where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
Preparing the Input Data Structure#
See the Data Annotation Format page for more information about the data format for image classification.
The classification experiment specification consists of seven main components:

- model
- dataset
- train
- evaluate
- inference
- export
- distill
model#
Here is an example model specification for Image Classification PyT with an NVDINOv2 backbone.

We first need to set the base_experiment:
FILTER_PARAMS='{"network_arch": "classification_pyt"}'
BASE_EXPERIMENTS=$(tao-client classification_pyt list-base-experiments --filter_params "$FILTER_PARAMS")
Retrieve the PTM_ID of the desired backbone from $BASE_EXPERIMENTS before setting the base_experiment.
PTM_INFORMATION="{\"base_experiment\": [$PTM_ID]}"
tao-client classification_pyt patch-artifact-metadata --id $EXPERIMENT_ID --job_type experiment --update_info $PTM_INFORMATION
Then retrieve the specifications.
TRAIN_SPECS=$(tao-client classification_pyt get-spec --action train --job_type experiment --id $EXPERIMENT_ID)
The specifications are returned in $TRAIN_SPECS. You can override values as needed. The equivalent model section in YAML looks like the following:
model:
  backbone:
    type: "vit_large_patch14_dinov2_swiglu"
    pretrained_backbone_path: <path_to_pretrained_weight>
    freeze_backbone: True
  head:
    type: "TAOLinearClsHead"
    binary: False
    topk: [1, 5]
    loss:
      type: CrossEntropyLoss
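With the FTMS client, the same fields appear in the JSON spec returned by get-spec. Below is a minimal sketch of overriding them before submitting the job; it assumes jq is installed and that the JSON layout mirrors the YAML above.

# Sketch: override backbone fields in the FTMS JSON spec (assumes jq)
TRAIN_SPECS=$(echo "$TRAIN_SPECS" | jq \
  '.model.backbone.type = "vit_large_patch14_dinov2_swiglu"
   | .model.backbone.freeze_backbone = true')

# Inspect the result before running the action
echo "$TRAIN_SPECS" | jq '.model.backbone'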
The model
parameter primarily configures the backbone and head.
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | dict config | – | The configuration of the backbone. | – |
| head | dict config | – | The configuration of the head. | – |
backbone#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| type | str | fan_small_12_p4_hybrid | The backbone architecture. | See the list of supported backbones below. |
| feat_downsample | bool | False | Feature downsample for the FAN base backbone. | True, False |
| pretrained_backbone_path | str | – | Path to the pretrained model. | – |
| freeze_backbone | bool | False | Flag to freeze the backbone. | True, False |

The supported backbone architectures are:

**FAN Variants**: fan_tiny_8_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid, fan_large_16_p4_hybrid, fan_Xlarge_16_p4_hybrid, fan_base_18_p16_224, fan_tiny_12_p16_224, fan_small_12_p16_224, fan_large_24_p16_224, fan_small_12_p16_224_se_attn

**GCViT Variants**: gc_vit_xxtiny, gc_vit_xtiny, gc_vit_tiny, gc_vit_small, gc_vit_base, gc_vit_large

**FasterViT Variants**: faster_vit_0_224, faster_vit_1_224, faster_vit_2_224, faster_vit_3_224, faster_vit_4_224, faster_vit_5_224, faster_vit_6_224, faster_vit_4_21k_224, faster_vit_4_21k_384, faster_vit_4_21k_512, faster_vit_4_21k_768

**NVCLIP Variants**: ViT-H-14-SigLIP-CLIPA-224, ViT-L-14-SigLIP-CLIPA-336, ViT-L-14-SigLIP-CLIPA-224

**NVDINOv2 Variants**: vit_large_patch14_dinov2_swiglu, vit_giant_patch14_reg4_dinov2_swiglu

**C-RADIO Variants**: c_radio_p1_vit_huge_patch16_mlpnorm, c_radio_p2_vit_huge_patch16_mlpnorm, c_radio_p3_vit_huge_patch16_mlpnorm, c_radio_v2_vit_base_patch16, c_radio_v2_vit_large_patch16, c_radio_v2_vit_huge_patch16
Foundation Models#
The following tables list a subset of the supported architectures and their pretraining datasets. Note that in_channels under head must be updated to match the selected backbone:
NVCLIP Image Backbones:

| Arch | Pretrained Dataset | in_channels |
|---|---|---|
| ViT-H-14-SigLIP-CLIPA-224 | NVIDIA-commercial dataset | 1024 |
| ViT-L-14-SigLIP-CLIPA-336 | NVIDIA-commercial dataset | 768 |
| ViT-L-14-SigLIP-CLIPA-224 | NVIDIA-commercial dataset | 768 |
NVDINOv2 Image Backbones:

| Arch | Pretrained Dataset | in_channels |
|---|---|---|
| vit_large_patch14_dinov2_swiglu | NVIDIA-commercial dataset | 1024 |
| vit_giant_patch14_reg4_dinov2_swiglu | NVIDIA-commercial dataset | 1536 |
RADIO Image Backbones:

| Arch | Pretrained Dataset | in_channels |
|---|---|---|
| c_radio_p1_vit_huge_patch16_mlpnorm | NVIDIA-commercial dataset | 3840 |
| c_radio_p2_vit_huge_patch16_mlpnorm | NVIDIA-commercial dataset | 5120 |
| c_radio_p3_vit_huge_patch16_mlpnorm | NVIDIA-commercial dataset | 3840 |
| c_radio_v2_vit_base_patch16 | NVIDIA-commercial dataset | 2304 |
| c_radio_v2_vit_large_patch16 | NVIDIA-commercial dataset | 3072 |
| c_radio_v2_vit_huge_patch16 | NVIDIA-commercial dataset | 3840 |
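For example, pairing the NVDINOv2 ViT-L backbone with its matching head width would look like the following sketch (the jq call and JSON layout are assumptions, as above):

# Sketch: select NVDINOv2 ViT-L and set the matching head input width (1024)
TRAIN_SPECS=$(echo "$TRAIN_SPECS" | jq \
  '.model.backbone.type = "vit_large_patch14_dinov2_swiglu"
   | .model.head.in_channels = 1024')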
head#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| type | string | TAOLinearClsHead | Type of classification head. | TAOLinearClsHead, LogisticRegressionHead |
| binary | bool | False | Flag to specify binary classification. | True, False |
| in_channels | int | 448 | Number of backbone output channels fed to the head. | – |
| topk | List | [1,] | The top-k values for accuracy computation. | >=0 |
| loss | Dict config | – | Loss config. | – |
| custom_args | Dict | None | Any custom parameters to be passed to the head (e.g., head_init_scale is used for TAOLinearClsHead). | – |
loss#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| type | str | CrossEntropyLoss | Loss type. | CrossEntropyLoss |
| label_smooth_val | float | 0.0 | Label smoothing value. | – |
Dataset Input for Classification PyT#
Here is an example dataset specification for classification PyT:

Note

For FTMS Client, these parameters are set in JSON format.
dataset:
  dataset: "CLDataset"
  root_dir: /dataset/imagenet2012
  batch_size: 128
  workers: 1
  num_classes: 1000
  img_size: 224
  augmentation:
    mixup_cutmix: True
    random_flip:
      vflip_probability: 0
      hflip_probability: 0.5
      enable: True
    random_aug:
      enable: True
    random_erase:
      enable: True
    random_rotate:
      rotate_probability: 0.5
      angle_list: [90, 180, 270]
      enable: False
    random_color:
      brightness: 0.4
      contrast: 0.4
      saturation: 0.4
      enable: False
    with_scale_random_crop:
      enable: False
    with_random_crop: True
    with_random_blur: False
  train_dataset:
    images_dir: /dataset/imagenet2012/train
  val_dataset:
    images_dir: /dataset/imagenet2012/val
  test_dataset:
    images_dir: /dataset/imagenet2012/test
The table below describes the configurable parameters in dataset.
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| root_dir | str | – | Path to the folder that contains classes.txt. | – |
| dataset | str | – | The dataset class. | CLDataset |
| num_classes | int | – | The number of classes in the training data. | – |
| img_size | int | – | The input image size. | – |
| batch_size | int | – | Batch size. | – |
| workers | int | – | The number of dataloader workers. | – |
| shuffle | bool | – | Flag to shuffle the dataloader. | True, False |
| augmentation | dict config | – | Augmentation config. | – |
| train_dataset | dict config | – | Configuration for the training dataset path. | – |
| train_nolabel | dict config | – | Configuration for the unlabeled training dataset. | – |
| val_dataset | dict config | – | Configuration for the validation dataset path. | – |
| test_dataset | dict config | – | Configuration for the testing dataset path. | – |
augmentation#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| random_flip | dict config | – | RandomFlip augmentation config. | – |
| random_rotate | dict config | – | RandomRotation augmentation config. | – |
| random_color | dict config | – | RandomColor augmentation config. | – |
| random_erase | dict config | – | RandomErase augmentation config. | – |
| random_aug | dict config | – | RandomAug augmentation config. | – |
| with_scale_random_crop | dict config | – | RandomCropWithScale augmentation config. | – |
| with_random_blur | bool | – | Flag to enable random blur. | True, False |
| with_random_crop | bool | – | Flag to enable random crop. | True, False |
| mean | List[float] | – | Mean for normalization. | – |
| std | List[float] | – | Standard deviation for normalization. | – |
| mixup_cutmix | bool | False | Flag to enable mixup and cutmix. Not recommended for binary classification. | True, False |
| mixup_alpha | float | 0.4 | Mixup alpha. | – |
RandomFlip#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| vflip_probability | float | 0.5 | Vertical flip probability. | – |
| hflip_probability | float | 0.5 | Horizontal flip probability. | – |
| enable | bool | True | Flag to enable augmentation. | True, False |
RandomRotation#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| rotate_probability | float | 0.5 | Random rotate probability. | – |
| angle_list | List[float] | [90, 180, 270] | Random rotate angles. | – |
| enable | bool | True | Flag to enable augmentation. | True, False |
RandomColor#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| brightness | float | 0.3 | Random color brightness. | – |
| contrast | float | 0.3 | Random color contrast. | – |
| saturation | float | 0.3 | Random color saturation. | – |
| hue | float | 0.3 | Random color hue. | – |
| enable | bool | True | Flag to enable random color. | True, False |
| color_probability | float | 0.5 | Random color probability. | – |
RandomCropWithScale#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| scale_range | List[float] | [1, 1.2] | Random scale range. | – |
| enable | bool | True | Flag to enable augmentation. | True, False |
RandomErase#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| erase_probability | float | 0.2 | Random erase probability. | – |
| enable | bool | True | Flag to enable augmentation. | True, False |
RandomAug#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| enable | bool | True | Flag to enable augmentation. | True, False |
train_dataset#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| images_dir | str | – | Path to the images directory for the dataset. | – |
val_dataset#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| images_dir | str | – | Path to the images directory for the dataset. | – |
test_dataset#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| images_dir | str | – | Path to the images directory for the dataset. | – |
train_nolabel#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| images_dir | Optional[str] | – | Dataset directory path. | – |
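As a minimal sketch (again assuming jq and that the FTMS JSON spec mirrors the YAML layout above), you might point the dataset at your own folders and adjust the basics before training:

# Sketch: use custom dataset folders and basic settings (assumes jq)
TRAIN_SPECS=$(echo "$TRAIN_SPECS" | jq \
  '.dataset.train_dataset.images_dir = "/data/my_dataset/train"
   | .dataset.val_dataset.images_dir = "/data/my_dataset/val"
   | .dataset.num_classes = 10
   | .dataset.batch_size = 64')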
train#
The train config contains the parameters related to the training process. They are described below:

Note

For FTMS Client, these parameters are set in JSON format.
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| optim | dict config | – | Optimizer config. | – |
| pretrained_model_path | str | None | Pretrained model path. | – |
| tensorboard | dict config | – | Configuration for the TensorBoard logger. | – |
| enable_ema | bool | False | Flag to enable EMA. | True, False |
| ema_decay | float | 0.998 | EMA decay. | – |
| clip_grad_norm | float | 2.0 | Gradient clipping norm. | – |
| num_gpus | int | 1 | The number of GPUs to run the train job. | – |
| gpu_ids | List[int] | [0] | List of GPU IDs to run the training on. | – |
| num_nodes | int | 1 | Number of nodes to run the training on. | – |
| seed | int | 1234 | The seed for the initializer in PyTorch. | – |
| num_epochs | int | 10 | Number of epochs to run the training. | – |
| checkpoint_interval | int | 1 | Checkpoint interval. | – |
| validation_interval | int | 1 | Validation interval. | – |
| resume_training_checkpoint_path | str | None | Path to the checkpoint to resume training. | – |
| results_dir | str | None | Path to where all the assets are stored. | – |
optim#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| monitor_name | str | val_loss | Monitor name. | – |
| optim | str | adamw | Optimizer. | adamw, adam, sgd |
| lr | float | 0.00006 | Optimizer learning rate. | – |
| policy | str | linear | Learning-rate policy. | linear, step, cosine, multistep |
| policy_params | Dict[str, Any] | {"step_size": 30, "gamma": 0.1, "milestones": [10, 20]} | Parameters for the learning-rate policy. | – |
| momentum | float | 0.9 | The momentum for the AdamW optimizer. | – |
| weight_decay | float | 0.01 | The weight decay coefficient. | – |
| betas | List[float] | [0.9, 0.999] | Coefficients used for computing running averages in AdamW. | – |
| skip_names | List[str] | [] | Names of layers that do not need weight decay. | – |
| warmup_epochs | int | 0 | Warmup epochs. | – |
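For example, switching to SGD with a cosine schedule could look like the following sketch (field names are taken from the table above; the jq call and JSON layout are assumptions):

# Sketch: SGD with a cosine learning-rate schedule and a short warmup (assumes jq)
TRAIN_SPECS=$(echo "$TRAIN_SPECS" | jq \
  '.train.optim.optim = "sgd"
   | .train.optim.lr = 0.01
   | .train.optim.policy = "cosine"
   | .train.optim.warmup_epochs = 5')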
tensorboard#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| enable | bool | False | Flag to enable TensorBoard. | True, False |
| infrequent_logging_frequency | int | 2 | Frequency of infrequent logging. | – |
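A one-line sketch to turn on TensorBoard logging in the FTMS spec (same jq assumption as above):

# Sketch: enable TensorBoard logging for the train job (assumes jq)
TRAIN_SPECS=$(echo "$TRAIN_SPECS" | jq '.train.tensorboard.enable = true')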
evaluate#
Here is an example evaluate specification for classification PyT:

Note

For FTMS Client, these parameters are set in JSON format.

evaluate:
  checkpoint: /path/to/model.pth
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| vis_after_n_batches | int | 1 | Visualize evaluation results after n batches. | – |
| batch_size | int | 8 | Batch size. | – |
| checkpoint | str | – | Path to the checkpoint used for evaluation. | – |
| num_gpus | int | 1 | The number of GPUs to run the evaluate job. | – |
| gpu_ids | List[int] | [0] | List of GPU IDs to run the evaluation on. | – |
| num_nodes | int | 1 | Number of nodes to run the evaluation on. | – |
| trt_engine | Optional[str] | None | Path to the TensorRT engine to be used for evaluation. | – |
| results_dir | Optional[str] | None | Path to where all the assets are stored. | – |
inference#
The inference config contains the parameters related to inference. They are described below:

Note

For FTMS Client, these parameters are set in JSON format.

inference:
  checkpoint: ${results_dir}/train/model_latest.pth
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| vis_after_n_batches | int | 1 | Visualize inference results after n batches. | – |
| batch_size | int | 8 | Batch size. | – |
| checkpoint | str | – | Path to the checkpoint used for inference. | – |
| num_gpus | int | 1 | The number of GPUs to run the inference job. | – |
| gpu_ids | List[int] | [0] | List of GPU IDs to run the inference on. | – |
| num_nodes | int | 1 | Number of nodes to run the inference on. | – |
| trt_engine | Optional[str] | None | Path to the TensorRT engine to be used for inference. | – |
| results_dir | Optional[str] | None | Path to where all the assets are stored. | – |
export#
The export config contains the parameters related to export. They are described as follows:
Note
For FTMS Client, these parameters are set in json format.
export:
  results_dir: "${results_dir}/export"
  gpu_id: 0
  checkpoint: ${results_dir}/train/model_latest.pth
  onnx_file: "${export.results_dir}/model_latest.onnx"
  input_width: 224
  input_height: 224
  batch_size: -1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| results_dir | Optional[str] | None | Path to where all the assets are stored. | – |
| gpu_id | int | 0 | The index of the GPU to build the TensorRT engine. | – |
| checkpoint | str | – | Path to the checkpoint file to run export. | – |
| onnx_file | str | – | Path to the ONNX model file. | – |
| on_cpu | bool | False | Flag to export a CPU-compatible model. | True, False |
| input_channel | int | 3 | Number of channels in the input tensor. | 1, 3 |
| input_width | int | 960 | Width of the input image tensor. | – |
| input_height | int | 544 | Height of the input image tensor. | – |
| opset_version | int | 17 | Operator set version of the ONNX model. | – |
| batch_size | int | -1 | The batch size of the input tensor for the engine. | – |
distill#
The distill config contains the parameters related to distillation. They are described as follows:

Note

For FTMS Client, these parameters are set in JSON format.

distill:
  teacher:
    backbone:
      type: "vit_large_patch14_dinov2_swiglu"
      pretrained_backbone_path: <pretrained_model_path>
      freeze_backbone: True
  pretrained_teacher_model_path: <pretrained_teacher_path>
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| teacher | Dict config | – | Configuration hyperparameters for the teacher model. | – |
| loss_type | str | KL | Loss function for logits distillation. | KL, CE, L1, L2 |
| loss_lambda | float | 0.5 | The weight applied to the distillation loss relative to the task loss. | – |
| pretrained_teacher_model_path | str | – | Path to the pretrained teacher model. | – |
| results_dir | str | – | Path to where all the assets generated from a task are stored. | – |
teacher#
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | Dict config | – | Configuration parameters for the backbone. | – |
| head | Dict config | – | Configuration parameters for the head. | – |
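A distillation job can be launched with the same FTMS command pattern shown at the top of this page; the following is a sketch:

# Sketch: fetch the distill spec and run the distill action
DISTILL_SPECS=$(tao-client classification_pyt get-spec --action distill --job_type experiment --id $EXPERIMENT_ID)
DISTILL_JOB_ID=$(tao-client classification_pyt experiment-run-action --action distill --id $EXPERIMENT_ID --specs "$DISTILL_SPECS")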
Training the model#
Use the tao model classification_pyt train command to train a classification PyTorch model:
TRAIN_JOB_ID=$(tao-client classification_pyt experiment-run-action --action train --id $EXPERIMENT_ID --specs "$TRAIN_SPECS")
tao model classification_pyt train [-h] -e <experiment_spec_file>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.gpu_ids=<gpu indices>]
[train.num_gpus=<number of gpus>]
Required Arguments
The only required argument is the path to the experiment spec:
- -e, --experiment_spec: The experiment specification file to set up the training experiment

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

- -h, --help: Show this help message and exit.
- model.<model_option>: The model options.
- dataset.<dataset_option>: The dataset options.
- train.<train_option>: The train options.
- train.optim.<optim_option>: The optimizer options.
Note

For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but inconsistent, for example num_gpus = 1 and gpu_ids = [0, 1], then they are modified to follow the setting with more GPUs, for example num_gpus = 1 -> num_gpus = 2.
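For example, a consistent two-GPU run passes both overrides together:

# Example: request two GPUs consistently via command-line overrides
tao model classification_pyt train -e /path/to/experiment_spec.yaml \
  train.num_gpus=2 \
  train.gpu_ids=[0,1]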
In some cases, you may encounter an issue with multi-GPU training resulting in a segmentation fault. You can circumvent this by setting the OMP_NUM_THREADS environment variable to 1. Depending on your mode of execution, you can use the following methods to set this variable.

CLI Launcher

You can set this environment variable by adding the following fields to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 in this section:
{
    "Envs": [
        {
            "variable": "OMP_NUM_THREADS",
            "value": "1"
        }
    ]
}
Docker

You can set environment variables in the docker container by passing the -e flag on the docker command line:
docker run -it --rm --gpus all \
-e OMP_NUM_THREADS=1 \
-v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e
Evaluating the Model#
After the model has been trained using the experiment config file and by following the steps to
train a model, the next step is to evaluate this model on a test set to measure the
accuracy of the model. TAO includes the tao model classification_pyt evaluate
command to do this.
The classification app computes evaluation loss and Top-k accuracy.
After training, the model is stored in your FTMS experiment's cloud workspace.
When using the TAO Launcher, it is stored in the output directory of your choice, results_dir.
EVAL_SPECS=$(tao-client classification_pyt get-spec --action evaluate --job_type experiment --id $EXPERIMENT_ID)
EVAL_JOB_ID=$(tao-client classification_pyt experiment-run-action --action evaluate --id $EXPERIMENT_ID --specs "$EVAL_SPECS" --previous_job_id=$TRAIN_JOB_ID)
The evaluate
config defines the hyperparameters of the evaluation process. The following is an example config:
evaluate:
  checkpoint: /path/to/model.pth
tao model classification_pyt evaluate [-h] -e <experiment_spec>
evaluate.checkpoint=<model to be evaluated>
results_dir=<path to results dir>
[evaluate.<evaluate_option>=<evaluate_option_value>]
[evaluate.gpu_ids=<gpu indices>]
[evaluate.num_gpus=<number of gpus>]
Required Arguments

The following arguments are required.

- -e, --experiment_spec: The experiment spec file to set up the evaluation experiment
- evaluate.checkpoint: The .pth model to be evaluated
- results_dir: The path where the results are stored
Optional Arguments
The following arguments are optional to run the command.

- evaluate.<evaluate_option>: The evaluate options.
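A concrete invocation might look like the following (paths are placeholders):

tao model classification_pyt evaluate -e /path/to/experiment_spec.yaml \
  evaluate.checkpoint=/path/to/results/train/model_latest.pth \
  results_dir=/path/to/results/evaluate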
Running Inference on a Model#
For classification, tao model classification_pyt inference
saves a .csv
file containing the image paths
and the corresponding labels for multiple images. TensorRT Python inference can also be enabled.
INFER_JOB_ID=$(tao-client classification_pyt experiment-run-action --action inference --id $EXPERIMENT_ID --specs "$INFER_SPECS" --previous_job_id=$TRAIN_JOB_ID)
inference:
  checkpoint: /path/to/model.pth
tao model classification_pyt inference [-h] -e <experiment_spec_file>
inference.checkpoint=<model to be inferenced>
results_dir=<path to results dir>
[inference.<inference_option>=<inference_option_value>]
[inference.gpu_ids=<gpu indices>]
[inference.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required to run the command.

- -e, --experiment_spec: The experiment spec file to set up the inference experiment
- inference.checkpoint: The .pth model to run inference on
- results_dir: The path where the results are stored
Optional Arguments
The following arguments are optional to run the command.

- inference.<inference_option>: The inference options.
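A concrete invocation might look like the following (paths are placeholders); the resulting .csv of image paths and predicted labels is written under results_dir:

tao model classification_pyt inference -e /path/to/experiment_spec.yaml \
  inference.checkpoint=/path/to/results/train/model_latest.pth \
  results_dir=/path/to/results/inference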
Exporting the model#
Exporting the model decouples the training process from inference and allows conversion to
TensorRT engines outside the TAO environment. TensorRT engines are specific to each hardware
configuration and should be generated for each unique inference environment.
The exported .onnx model may be used universally across training and deployment hardware.
EXPORT_JOB_ID=$(tao-client classification_pyt experiment-run-action --action export --id $EXPERIMENT_ID --specs "$EXPORT_SPECS" --previous_job_id=$TRAIN_JOB_ID)
The export
parameter defines the hyperparameters of the export process.
export:
  checkpoint: /path/to/model.pth
  onnx_file: /path/to/model.onnx
  opset_version: 12
  verify: False
  input_channel: 3
  input_width: 224
  input_height: 224
Here's an example of the tao model classification_pyt export command:
tao model classification_pyt export [-h] -e <experiment spec file>
export.checkpoint=<model to export>
export.onnx_file=<onnx path>
[export.<export_option>=<export_option_value>]
Required Arguments
The following arguments are required to run the command.

- -e, --experiment_spec: The path to an experiment spec file
- export.checkpoint: The .pth model to export
- export.onnx_file: The path where the .etlt or .onnx model is saved
Optional Arguments
The following arguments are optional to run the command.

- export.<export_option>: The export options.
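A concrete invocation might look like the following (paths are placeholders):

tao model classification_pyt export -e /path/to/experiment_spec.yaml \
  export.checkpoint=/path/to/results/train/model_latest.pth \
  export.onnx_file=/path/to/results/export/model_latest.onnx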
TensorRT Engine Generation, Validation, and INT8 Calibration#
For TensorRT engine generation, validation, and INT8 calibration, refer to the TAO Deploy documentation.
Deploying to DeepStream#
Refer to the Integrating a Classification (TF1/TF2/PyTorch) Model page for more information about deploying a classification model with DeepStream.