ReIdentificationNet Transformer

ReIdentificationNet Transformer receives cropped images of a person from different perspectives as network input and outputs the embedding features for that person. The embeddings are used to perform similarity matching to re-identify the same person. The model is based on Swin Transformer, which is a general-purpose backbone for computer vision.
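
As an illustration of similarity matching on embeddings, the following minimal sketch compares a query embedding against a small gallery using cosine similarity. The function names and the three-dimensional toy vectors are assumptions made for this example; they are not part of TAO Toolkit, and the toolkit may use a different distance measure internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def best_match(query, gallery):
    """Return (index, score) of the gallery embedding most similar to query."""
    scores = [cosine_similarity(query, g) for g in gallery]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]


# Toy 3-dimensional embeddings (hypothetical values; real embeddings are longer).
query = [1.0, 0.0, 1.0]
gallery = [[0.0, 1.0, 0.0], [2.0, 0.1, 1.9], [-1.0, 0.0, -1.0]]
index, score = best_match(query, gallery)
```

Here the second gallery entry points in almost the same direction as the query, so it is selected as the best match.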

The ReIdentificationNet Transformer apps in TAO Toolkit expect data in Market-1501 format for training and evaluation.

Refer to the Data Annotation Format page for more information about the Market-1501 data format.
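
For orientation, the sketch below parses file names that follow the public Market-1501 naming convention (person ID, camera and sequence, frame number, bounding-box index) and counts the unique person IDs. The regular expression and helper names are assumptions for this example; refer to the Data Annotation Format page for what TAO Toolkit actually requires.

```python
import re

# Assumed Market-1501 style name: <person ID>_c<camera>s<sequence>_<frame>_<box>.jpg
# The person ID may be -1 for junk images in the public test set.
_NAME = re.compile(r"^(-?\d+)_c(\d+)s(\d+)_(\d+)_(\d+)\.jpg$")


def parse_name(filename):
    """Return (person_id, camera_id) parsed from a Market-1501 style file name."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return int(match.group(1)), int(match.group(2))


def count_identities(filenames):
    """Count the unique person IDs in a list of file names."""
    return len({parse_name(name)[0] for name in filenames})


# Hypothetical file names used only for illustration.
names = [
    "0002_c1s1_000451_03.jpg",
    "0002_c2s1_000301_01.jpg",
    "0007_c3s3_077419_03.jpg",
]
num_ids = count_identities(names)
```

A count like this, taken over the training directory, is the kind of value that the num_classes parameter describes.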

The spec file for ReIdentificationNet Transformer includes model, dataset, re_ranking, and train parameters. The following is an example spec for training a Swin Tiny model on Market-1501 with 751 identities in the training set.

results_dir: "/path/to/experiment_results"
encryption_key: nvidia_tao
model:
  backbone: swin_tiny_patch4_window7_224
  last_stride: 1
  pretrain_choice: self
  pretrained_model_path: "/path/to/pretrained_model.pth"
  input_channels: 3
  input_width: 128
  input_height: 384
  neck: bnneck
  stride_size: [16, 16]
  feat_dim: 1024
  no_margin: True
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: False
  pretrain_hw_ratio: 2
dataset:
  train_dataset_dir: "/path/to/train_dataset_dir"
  test_dataset_dir: "/path/to/test_dataset_dir"
  query_dataset_dir: "/path/to/query_dataset_dir"
  num_classes: 751
  batch_size: 64
  val_batch_size: 128
  num_workers: 8
  pixel_mean: [0.5, 0.5, 0.5]
  pixel_std: [0.5, 0.5, 0.5]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4
re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3
train:
  results_dir: "${results_dir}/train"
  optim:
    name: SGD
    lr_steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 2
    weight_decay: 0.0001
    weight_decay_bias: 0.0001
    warmup_factor: 0.01
    warmup_epochs: 20
    warmup_method: cosine
    base_lr: 0.0008
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
    large_fc_lr: False
  num_epochs: 120
  checkpoint_interval: 10

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| model | dict config | | The configuration for the model architecture |
| train | dict config | | The configuration for the training process |
| dataset | dict config | | The configuration for the dataset |
| re_ranking | dict config | | The configuration for the re-ranking module |

model

The model parameter provides options to change the ReIdentificationNet Transformer architecture.

model:
  backbone: swin_tiny_patch4_window7_224
  last_stride: 1
  pretrain_choice: self
  pretrained_model_path: "/path/to/pretrained_model.pth"
  input_channels: 3
  input_width: 128
  input_height: 384
  neck: bnneck
  stride_size: [16, 16]
  feat_dim: 1024
  no_margin: True
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: False
  pretrain_hw_ratio: 2

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | string | swin_tiny_patch4_window7_224 | The type of model, which can be a Swin-based architecture or resnet_50 (refer to ReIdentificationNet) | resnet_50/swin_base_patch4_window7_224/swin_small_patch4_window7_224/swin_tiny_patch4_window7_224 |
| last_stride | unsigned int | 1 | The number of strides during convolution | >0 |
| pretrain_choice | string | self | Specifies the pre-trained network | self/imagenet/"" |
| pretrained_model_path | string | | The path to the pre-trained model | |
| input_channels | unsigned int | 3 | The number of input channels | >0 |
| input_width | int | 128 | The width of the input images | >0 |
| input_height | int | 384 | The height of the input images | >0 |
| neck | string | bnneck | Specifies whether to train with BNNeck | bnneck/"" |
| feat_dim | unsigned int | 1024 | The output size of the feature embeddings | >0 |
| no_margin | bool | True | A flag specifying whether to train with soft triplet loss | True/False |
| neck_feat | string | after | Specifies which feature of BNNeck to use for testing | before/after |
| metric_loss_type | string | triplet | The type of metric loss | triplet/center/triplet_center |
| with_center_loss | bool | False | A flag specifying whether to enable center loss | True/False |
| with_flip_feature | bool | False | A flag specifying whether to enable image flipping | True/False |
| label_smooth | bool | False | A flag specifying whether to enable label smoothing | True/False |
| pretrain_hw_ratio | float | 2 | The height-width ratio of the pre-trained model | >0 |
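
To make the difference between the margin-based and soft triplet loss concrete, here is a minimal sketch of both forms for a single triplet. It assumes that the soft variant is the usual softplus of the distance difference, as in common re-identification baselines; the function names are hypothetical and the actual TAO Toolkit implementation may differ.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def triplet_loss(d_ap, d_an, margin=None):
    """Triplet loss for one (anchor, positive, negative) triplet.

    d_ap: distance between the anchor and a sample of the same person.
    d_an: distance between the anchor and a sample of a different person.
    margin=None selects the soft-margin form (assumed to match no_margin: True);
    otherwise the hinge form with the given margin is used.
    """
    if margin is None:
        return softplus(d_ap - d_an)
    return max(0.0, d_ap - d_an + margin)


# An easy triplet: the negative is already farther away than the margin requires.
easy_hinge = triplet_loss(0.5, 1.0, margin=0.3)
# The soft form still yields a small positive loss for the same triplet.
easy_soft = triplet_loss(0.5, 1.0)
```

The hinge form stops penalizing a triplet once the margin is satisfied, whereas the soft form keeps a small, decaying gradient.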

dataset

The dataset parameter defines the dataset source, training batch size, and augmentation.

dataset:
  train_dataset_dir: "/path/to/train_dataset_dir"
  test_dataset_dir: "/path/to/test_dataset_dir"
  query_dataset_dir: "/path/to/query_dataset_dir"
  num_classes: 751
  batch_size: 64
  val_batch_size: 128
  num_workers: 8
  pixel_mean: [0.5, 0.5, 0.5]
  pixel_std: [0.5, 0.5, 0.5]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset_dir | string | | The path to the train images | |
| test_dataset_dir | string | | The path to the test images | |
| query_dataset_dir | string | | The path to the query images | |
| num_classes | unsigned int | 751 | The number of unique person IDs | >0 |
| batch_size | unsigned int | 64 | The batch size for training | >0 |
| val_batch_size | unsigned int | 128 | The batch size for validation | >0 |
| num_workers | unsigned int | 8 | The number of parallel workers processing data | >0 |
| pixel_mean | float list | [0.5, 0.5, 0.5] | The pixel mean for image normalization | float list |
| pixel_std | float list | [0.5, 0.5, 0.5] | The pixel standard deviation for image normalization | float list |
| padding | unsigned int | 10 | The pixel padding size around images for image augmentation | >=1 |
| prob | float | 0.5 | The random horizontal flipping probability for image augmentation | >0 |
| re_prob | float | 0.5 | The random erasing probability for image augmentation | >0 |
| sampler | string | softmax_triplet | The type of sampler for data loading | softmax/triplet/softmax_triplet |
| num_instances | unsigned int | 4 | The number of image instances of the same person in a batch | >0 |
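
The sketch below illustrates two of these settings. It assumes that pixel values are first scaled to the range [0, 1] and then standardized per channel with pixel_mean and pixel_std, and that an identity-based sampler fills a batch with batch_size // num_instances different persons. Both assumptions follow common PyTorch re-identification pipelines and are not confirmed by this page; the helper names are hypothetical.

```python
def normalize_pixel(rgb, pixel_mean=(0.5, 0.5, 0.5), pixel_std=(0.5, 0.5, 0.5)):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, pixel_mean, pixel_std)
    )


def identities_per_batch(batch_size=64, num_instances=4):
    """Number of distinct persons in a batch under identity-based sampling (assumed)."""
    return batch_size // num_instances


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
persons = identities_per_batch()
```

With the default mean and standard deviation of 0.5, the normalized values fall in the range [-1, 1], and the default batch would contain 16 persons with 4 images each.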

re_ranking

The re_ranking parameter defines the settings for the re-ranking module.

re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| re_ranking | bool | True | A flag that enables the re-ranking module | True/False |
| k1 | unsigned int | 20 | The k used for k-reciprocal nearest neighbors | >0 |
| k2 | unsigned int | 6 | The k used for local query expansion | >0 |
| lambda_value | float | 0.3 | The weight of the original distance in combination with the Jaccard distance | >0.0 |
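
The following simplified sketch shows how k-reciprocal neighbors and lambda_value interact. It assumes the standard k-reciprocal re-ranking formulation, in which the final distance is (1 - lambda_value) times the Jaccard distance plus lambda_value times the original distance. The local query expansion controlled by k2 and the weighted neighbor encoding of the full algorithm are omitted, and all function names are hypothetical.

```python
def k_nearest(dist, i, k):
    """Indices of the k items closest to item i (item i itself included)."""
    return sorted(range(len(dist)), key=lambda j: dist[i][j])[:k]


def k_reciprocal(dist, i, k):
    """Items j that are in i's k-nearest list and also have i in their own."""
    return {j for j in k_nearest(dist, i, k) if i in k_nearest(dist, j, k)}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two neighbor sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def final_distance(original, jaccard, lambda_value=0.3):
    """Blend the original distance with the Jaccard distance."""
    return (1.0 - lambda_value) * jaccard + lambda_value * original


# Toy data: four points on a line, with absolute difference as the distance.
points = [0.0, 1.0, 2.0, 10.0]
dist = [[abs(p - q) for q in points] for p in points]
r0 = k_reciprocal(dist, 0, k=2)
r1 = k_reciprocal(dist, 1, k=2)
d01 = final_distance(dist[0][1], jaccard_distance(r0, r1))
```

Items 0 and 1 share the same reciprocal neighbor set, so their Jaccard distance is zero and only the weighted original distance remains.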

train

The train parameter defines the hyperparameters of the training process.

train:
  optim:
    name: SGD
    lr_steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 2
    weight_decay: 0.0001
    weight_decay_bias: 0.0001
    warmup_factor: 0.01
    warmup_epochs: 20
    warmup_method: cosine
    base_lr: 0.0008
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
    large_fc_lr: False
  num_epochs: 120
  checkpoint_interval: 10

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| optim | dict config | | The configuration for the SGD optimizer, including the learning rate, learning rate scheduler, and weight decay | |
| num_epochs | unsigned int | 120 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 10 | The interval, in epochs, at which the checkpoints are saved | >0 |
| clip_grad_norm | float | 0.0 | The maximum L2 norm to which gradients are clipped. A value of 0.0 specifies no clipping. | >=0 |

optim

The optim parameter defines the configuration for the SGD optimizer used in training, including the learning rate, learning rate scheduler, and weight decay.

optim:
  name: SGD
  lr_steps: [40, 70]
  gamma: 0.1
  bias_lr_factor: 2
  weight_decay: 0.0001
  weight_decay_bias: 0.0001
  warmup_factor: 0.01
  warmup_epochs: 20
  warmup_method: cosine
  base_lr: 0.0008
  momentum: 0.9
  center_loss_weight: 0.0005
  center_lr: 0.5
  triplet_loss_margin: 0.3
  large_fc_lr: False

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| name | string | SGD | The name of the optimizer | Adam/SGD/Adamax/… |
| lr_steps | int list | [40, 70] | The steps to decrease the learning rate for the MultiStep scheduler | int list |
| gamma | float | 0.1 | The decay rate for WarmupMultiStepLR | >0.0 |
| bias_lr_factor | float | 2 | The bias learning rate factor for WarmupMultiStepLR | >=1 |
| weight_decay | float | 0.0001 | The weight decay coefficient for the optimizer | >0.0 |
| weight_decay_bias | float | 0.0001 | The weight decay coefficient applied to bias parameters | >0.0 |
| warmup_factor | float | 0.01 | The warmup factor for the WarmupMultiStepLR scheduler | >0.0 |
| warmup_epochs | unsigned int | 20 | The number of warmup epochs for the WarmupMultiStepLR scheduler | >0 |
| warmup_method | string | cosine | The warmup method for the learning rate scheduler | cosine/linear |
| base_lr | float | 0.0008 | The initial learning rate for the training | >0.0 |
| momentum | float | 0.9 | The momentum for the SGD optimizer | >0.0 |
| center_loss_weight | float | 0.0005 | The balanced weight for center loss | >0.0 |
| center_lr | float | 0.5 | The learning rate for SGD to learn the centers of center loss | >0.0 |
| triplet_loss_margin | float | 0.3 | The margin value for triplet loss | >0.0 |
| large_fc_lr | bool | False | A flag specifying whether to enable large fully connected learning rate | True/False |
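
To show how the scheduler parameters combine, the sketch below computes the learning rate per epoch for a multi-step schedule with linear warmup, in the style of the WarmupMultiStepLR scheduler found in common re-identification baselines. The formula used by the cosine warmup method is not given on this page, so only the linear variant is shown; the function name and the per-epoch stepping are assumptions.

```python
from bisect import bisect_right


def warmup_multistep_lr(
    epoch,
    base_lr=0.0008,
    lr_steps=(40, 70),
    gamma=0.1,
    warmup_factor=0.01,
    warmup_epochs=20,
):
    """Learning rate at a given epoch with linear warmup (assumed variant)."""
    scale = 1.0
    if epoch < warmup_epochs:
        # Ramp linearly from warmup_factor up to 1 over the warmup epochs.
        alpha = epoch / warmup_epochs
        scale = warmup_factor * (1.0 - alpha) + alpha
    # Multiply by gamma once for every lr_steps milestone already reached.
    return base_lr * scale * gamma ** bisect_right(lr_steps, epoch)


schedule = {epoch: warmup_multistep_lr(epoch) for epoch in (0, 10, 20, 40, 70)}
```

With the example values, the rate starts at 1% of base_lr, reaches base_lr after 20 epochs, and is divided by 10 at epochs 40 and 70.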

Use the following command to run ReIdentificationNet Transformer training:

tao model re_identification train -e <experiment_spec_file> -r <results_dir> -k <key> [train.gpu_ids=<gpu id list>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file

  • -r, --results_dir: The path to a folder where the experiment outputs should be written

  • -k, --key: The user-specific encoding key to save or load a .tlt model

Optional Arguments

  • train.gpu_ids: A list of GPU indices to use for training. If you set more than one GPU ID, multi-GPU training will be triggered automatically.

Here’s an example of using the ReIdentificationNet Transformer training command:

tao model re_identification train -e $DEFAULT_SPEC -r $RESULTS_DIR -k $KEY

The evaluation metrics for ReIdentificationNet Transformer are the mean average precision and ranked accuracy. The plots of sampled matches and the cumulative matching characteristic (CMC) curve can be obtained using the evaluate.output_sampled_matches_plot and evaluate.output_cmc_curve_plot parameters, respectively.
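
As a rough illustration of these metrics, the sketch below computes the mean average precision and rank-k accuracy (the points of a CMC curve) from ranked match lists. Each list holds one flag per gallery image, ordered by ascending distance to the query, where True marks the same person. The usual Market-1501 filtering of same-camera and junk gallery images is omitted, and the function names are hypothetical.

```python
def average_precision(matches):
    """Average precision for one query's ranked list of match flags."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def rank_k_hit(matches, k):
    """True if a correct match appears within the top k results."""
    return any(matches[:k])


def evaluate(all_matches, ks=(1, 5)):
    """Return (mAP, {k: rank-k accuracy}) over all queries."""
    n = len(all_matches)
    mean_ap = sum(average_precision(m) for m in all_matches) / n
    cmc = {k: sum(rank_k_hit(m, k) for m in all_matches) / n for k in ks}
    return mean_ap, cmc


# Two toy queries against a three-image gallery.
ranked = [[True, False, True], [False, True, False]]
mean_ap, cmc = evaluate(ranked)
```

In this toy case the first query is matched at rank 1 and the second at rank 2, so rank-1 accuracy is 0.5 while rank-5 accuracy is 1.0.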

Use the following command to run ReIdentificationNet Transformer evaluation:

tao model re_identification evaluate \
    -e <experiment_spec_file> \
    -r <results_dir> \
    -k <key> \
    evaluate.checkpoint=<model to be evaluated> \
    evaluate.output_sampled_matches_plot=<path to the output sampled matches plot> \
    evaluate.output_cmc_curve_plot=<path to the output CMC curve plot> \
    evaluate.test_dataset=<path to test data> \
    evaluate.query_dataset=<path to query data> \
    [evaluate.gpu_id=<gpu index>]

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment

  • -r, --results_dir: The path to a folder where the experiment outputs should be written

  • -k, --key: The encoding key for the .tlt model

  • evaluate.checkpoint: The .tlt model

  • evaluate.output_sampled_matches_plot: The path to the plotted file of sampled matches

  • evaluate.output_cmc_curve_plot: The path to the plotted file of the CMC curve

  • evaluate.test_dataset: The path to the test data

  • evaluate.query_dataset: The path to the query data

Optional Argument

  • evaluate.gpu_id: The GPU index used to run evaluation (when the machine has multiple GPUs installed). Note that evaluation can only run on a single GPU.

Here’s an example of using the ReIdentificationNet Transformer evaluation command:

tao model re_identification evaluate \
    -e $DEFAULT_SPEC \
    -r $RESULTS_DIR \
    -k $KEY \
    evaluate.checkpoint=$TRAINED_TLT_MODEL \
    evaluate.output_sampled_matches_plot=$OUTPUT_SAMPLED_MATCHED_PLOT \
    evaluate.output_cmc_curve_plot=$OUTPUT_CMC_CURVE_PLOT \
    evaluate.test_dataset=$TEST_DATA \
    evaluate.query_dataset=$QUERY_DATA

Use the following command to run inference on ReIdentificationNet Transformer with the .tlt model:

tao model re_identification inference \
    -e <experiment_spec> \
    -r <results_dir> \
    -k <key> \
    inference.checkpoint=<inference model> \
    inference.output_file=<path to output file> \
    inference.test_dataset=<path to gallery data> \
    inference.query_dataset=<path to query data> \
    [inference.gpu_id=<gpu index>]

The output will be a JSON file that contains the feature embeddings of all the test and query data.

Required Arguments

  • -e, --experiment_spec: The experiment spec file to set up inference

  • -r, --results_dir: The path to a folder where the experiment outputs should be written

  • -k, --key: The encoding key for the .tlt model

  • inference.checkpoint: The .tlt model to perform inference with

  • inference.output_file: The path to the output JSON file

  • inference.test_dataset: The path to the test data

  • inference.query_dataset: The path to the query data

Optional Argument

  • inference.gpu_id: The index of the GPU that will be used to run inference (when the machine has multiple GPUs installed). Note that inference can only run on a single GPU.

Here’s an example of using the ReIdentificationNet Transformer inference command:

tao model re_identification inference \
    -e $DEFAULT_SPEC \
    -r $RESULTS_DIR \
    -k $KEY \
    inference.checkpoint=$TRAINED_TLT_MODEL \
    inference.output_file=$OUTPUT_FILE \
    inference.test_dataset=$TEST_DATA \
    inference.query_dataset=$QUERY_DATA

The expected output is as follows:

[
  {
    "img_path": "/path/to/img1.jpg",
    "embedding": [-0.30, 0.12, 0.13,...]
  },
  {
    "img_path": "/path/to/img2.jpg",
    "embedding": [-0.10, -0.06, -1.85,...]
  },
  ...
  {
    "img_path": "/path/to/imgN.jpg",
    "embedding": [1.41, 0.63, -0.15,...]
  }
]
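
A file with this structure can be read with the standard json module. The sketch below writes a small stand-in file with made-up three-value embeddings, loads it, and indexes the embeddings by image path. The load_embeddings helper and the file name are hypothetical and are not part of TAO Toolkit.

```python
import json
import os
import tempfile


def load_embeddings(path):
    """Read the inference output and map each img_path to its embedding."""
    with open(path, "r", encoding="utf-8") as handle:
        records = json.load(handle)
    return {record["img_path"]: record["embedding"] for record in records}


# Stand-in records with shortened, made-up embeddings for demonstration only.
records = [
    {"img_path": "/path/to/img1.jpg", "embedding": [-0.30, 0.12, 0.13]},
    {"img_path": "/path/to/img2.jpg", "embedding": [-0.10, -0.06, -1.85]},
]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "inference.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        json.dump(records, handle)
    embeddings = load_embeddings(demo_path)

# All embeddings should share one length; collect the distinct lengths seen.
dims = {len(vector) for vector in embeddings.values()}
```

Checking that dims contains a single value is a quick sanity test before the embeddings are passed to a matching step.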

Use the following command to export ReIdentificationNet Transformer to .onnx format for deployment:

tao model re_identification export -e <experiment_spec> -r <results_dir> -k <key> export.checkpoint=<tlt checkpoint to be exported> [export.onnx_file=<path to exported file>] [export.gpu_id=<gpu index>]

Required Arguments

  • -e, --experiment_spec: The experiment spec file to configure export

  • -r, --results_dir: The path to a folder where the experiment outputs should be written

  • -k, --key: The encoding key for the .tlt model

  • export.checkpoint: The .tlt model to be exported

Optional Arguments

  • export.onnx_file: The path to save the exported model to. The default path is in the same directory as the *.tlt model.

  • export.gpu_id: The index of the GPU that will be used to run the export (when the machine has multiple GPUs installed). Note that export can only run on a single GPU.

Here’s an example of using the ReIdentificationNet Transformer export command:

tao model re_identification export -e $DEFAULT_SPEC -r $RESULTS_DIR -k $KEY export.checkpoint=$TRAINED_TLT_MODEL

You can deploy the trained deep-learning and computer-vision models on edge devices (such as a Jetson Xavier, Jetson Nano, or Tesla) or in the cloud with NVIDIA GPUs. The exported *.onnx model can also be used with TAO Toolkit Triton Apps.

© Copyright 2024, NVIDIA. Last updated on Mar 22, 2024.