ReIdentificationNet Transformer#

ReIdentificationNet Transformer receives cropped images of a person from different viewpoints as network input and outputs an embedding for that person. These embeddings are used for similarity matching to re-identify the same person across views. The model is based on Swin Transformer, a general-purpose backbone for computer vision.
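
Because each image is mapped to a fixed-length vector, re-identification reduces to nearest-neighbor search in embedding space. The following minimal sketch (illustrative only, not part of TAO) matches a query embedding against a gallery using cosine similarity; the 1024-dimensional size mirrors the feat_dim used in the spec below:

import numpy as np

def cosine_scores(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a gallery matrix."""
    query = query / np.linalg.norm(query)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return gallery @ query

# Hypothetical embeddings: one query crop and a gallery of five crops.
query = np.random.randn(1024).astype(np.float32)
gallery = np.random.randn(5, 1024).astype(np.float32)

scores = cosine_scores(query, gallery)
best_match = int(np.argmax(scores))  # index of the most similar gallery crop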

The expected time to train a ReIdentificationNet Transformer model is as follows:

| Backbone Type | GPU Type | No. of Training Images | Image Size | No. of Identities | Batch Size | Total Epochs | Total Training Time |
|---|---|---|---|---|---|---|---|
| Swin Tiny | 1 x NVIDIA A100 - 80GB PCIe | 13,000 | 256x128x3 | 751 | 128 | 120 | ~1.5 hours |
| Swin Tiny | 1 x NVIDIA Quadro GV100 - 32GB | 13,000 | 256x128x3 | 751 | 64 | 120 | ~3 hours |

Data Input for ReIdentificationNet Transformer#

The ReIdentificationNet Transformer apps in TAO expect data in Market-1501 format for training and evaluation.

Refer to the Data Annotation Format page for more information about the Market-1501 data format.
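
For quick orientation (the Data Annotation Format page remains authoritative), Market-1501-style filenames encode the person ID and camera ID; for example, 0002_c1s1_000451_03.jpg is person 0002 captured by camera 1. A minimal parsing sketch:

import re
from pathlib import Path

# Market-1501-style name: <person_id>_c<camera>s<sequence>_<frame>_<index>.jpg
PATTERN = re.compile(r"(-?\d+)_c(\d+)")

def parse_market1501_name(path: str) -> tuple[int, int]:
    """Return (person_id, camera_id); a person_id of -1 marks junk images."""
    match = PATTERN.match(Path(path).name)
    if match is None:
        raise ValueError(f"Not a Market-1501-style filename: {path}")
    return int(match.group(1)), int(match.group(2))

person_id, camera_id = parse_market1501_name("0002_c1s1_000451_03.jpg")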

Creating an Experiment Spec File#

The spec file for ReIdentificationNet Transformer includes model, dataset, re_ranking, and train parameters. The following is an example spec for training a Swin Tiny model on Market-1501 with 751 identities in the training set.

results_dir: "/path/to/experiment_results"
encryption_key: nvidia_tao
model:
  backbone: swin_tiny_patch4_window7_224
  last_stride: 1
  pretrain_choice: self
  pretrained_model_path: "/path/to/pretrained_model.pth"
  input_channels: 3
  input_width: 128
  input_height: 384
  neck: bnneck
  stride_size: [16, 16]
  feat_dim: 1024
  no_margin: True
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: False
  pretrain_hw_ratio: 2
dataset:
  train_dataset_dir: "/path/to/train_dataset_dir"
  test_dataset_dir: "/path/to/test_dataset_dir"
  query_dataset_dir: "/path/to/query_dataset_dir"
  num_classes: 751
  batch_size: 64
  val_batch_size: 128
  num_workers: 8
  pixel_mean: [0.5, 0.5, 0.5]
  pixel_std: [0.5, 0.5, 0.5]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4
re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3
train:
  results_dir: "${results_dir}/train"
  optim:
    name: SGD
    lr_steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 2
    weight_decay: 0.0001
    weight_decay_bias: 0.0001
    warmup_factor: 0.01
    warmup_epochs: 20
    warmup_method: cosine
    base_lr: 0.0008
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
    large_fc_lr: False
  num_epochs: 120
  checkpoint_interval: 10

| Parameter | Data Type | Default | Description |
|---|---|---|---|
| model | dict config | – | The configuration for the model architecture |
| train | dict config | – | The configuration for the training process |
| dataset | dict config | – | The configuration for the dataset |
| re_ranking | dict config | – | The configuration for the re-ranking module |

model#

The model parameter provides options to change the ReIdentificationNet Transformer architecture.

model:
  backbone: swin_tiny_patch4_window7_224
  last_stride: 1
  pretrain_choice: self
  pretrained_model_path: "/path/to/pretrained_model.pth"
  input_channels: 3
  input_width: 128
  input_height: 384
  neck: bnneck
  stride_size: [16, 16]
  feat_dim: 1024
  no_margin: True
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: False
  pretrain_hw_ratio: 2

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | string | swin_tiny_patch4_window7_224 | The backbone type, which can be a Swin-based architecture or resnet_50 (refer to ReIdentificationNet) | resnet_50/swin_base_patch4_window7_224/swin_small_patch4_window7_224/swin_tiny_patch4_window7_224 |
| last_stride | unsigned int | 1 | The number of strides during convolution | >0 |
| pretrain_choice | string | self | Specifies the pre-trained network | self/imagenet/"" |
| pretrained_model_path | string | – | The path to the pre-trained model | – |
| input_channels | unsigned int | 3 | The number of input channels | >0 |
| input_width | int | 128 | The width of the input images | >0 |
| input_height | int | 384 | The height of the input images | >0 |
| neck | string | bnneck | Specifies whether to train with BNNeck | bnneck/"" |
| feat_dim | unsigned int | 1024 | The output size of the feature embeddings | >0 |
| no_margin | bool | True | A flag specifying whether to train with soft triplet loss | True/False |
| neck_feat | string | after | Specifies which feature of BNNeck to use for testing | before/after |
| metric_loss_type | string | triplet | The type of metric loss | triplet/center/triplet_center |
| with_center_loss | bool | False | A flag specifying whether to enable center loss | True/False |
| with_flip_feature | bool | False | A flag specifying whether to enable image flipping | True/False |
| label_smooth | bool | False | A flag specifying whether to enable label smoothing | True/False |
| pretrain_hw_ratio | float | 2 | The height-width ratio of the pre-trained model | >0 |
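
For intuition on the neck and neck_feat options: BNNeck (from the "Bag of Tricks" re-ID baseline) places a batch-normalization layer between the backbone feature and the ID classifier, and neck_feat selects whether the pre-BN (before) or post-BN (after) feature is used as the test-time embedding. A simplified PyTorch sketch of the idea, not the TAO implementation:

import torch
import torch.nn as nn

class BNNeckHead(nn.Module):
    """Simplified BNNeck: backbone feature -> batch norm -> ID classifier."""

    def __init__(self, feat_dim: int = 1024, num_classes: int = 751,
                 neck_feat: str = "after"):
        super().__init__()
        self.bottleneck = nn.BatchNorm1d(feat_dim)
        self.bottleneck.bias.requires_grad_(False)  # common re-ID trick
        self.classifier = nn.Linear(feat_dim, num_classes, bias=False)
        self.neck_feat = neck_feat

    def forward(self, feat: torch.Tensor):
        bn_feat = self.bottleneck(feat)
        if self.training:
            # ID loss uses the post-BN feature; triplet loss uses the raw one.
            return self.classifier(bn_feat), feat
        # neck_feat decides which embedding is returned at test time.
        return bn_feat if self.neck_feat == "after" else feat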

dataset#

The dataset parameter defines the dataset source, training batch size, and augmentation.

dataset:
  train_dataset_dir: "/path/to/train_dataset_dir"
  test_dataset_dir: "/path/to/test_dataset_dir"
  query_dataset_dir: "/path/to/query_dataset_dir"
  num_classes: 751
  batch_size: 64
  val_batch_size: 128
  num_workers: 8
  pixel_mean: [0.5, 0.5, 0.5]
  pixel_std: [0.5, 0.5, 0.5]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset_dir | string | – | The path to the train images | – |
| test_dataset_dir | string | – | The path to the test images | – |
| query_dataset_dir | string | – | The path to the query images | – |
| num_classes | unsigned int | 751 | The number of unique person IDs | >0 |
| batch_size | unsigned int | 64 | The batch size for training | >0 |
| val_batch_size | unsigned int | 128 | The batch size for validation | >0 |
| num_workers | unsigned int | 8 | The number of parallel workers processing data | >0 |
| pixel_mean | float list | [0.5, 0.5, 0.5] | The pixel mean for image normalization | float list |
| pixel_std | float list | [0.5, 0.5, 0.5] | The pixel standard deviation for image normalization | float list |
| padding | unsigned int | 10 | The pixel padding size around images for image augmentation | >=1 |
| prob | float | 0.5 | The random horizontal flipping probability for image augmentation | >0 |
| re_prob | float | 0.5 | The random erasing probability for image augmentation | >0 |
| sampler | string | softmax_triplet | The type of sampler for data loading | softmax/triplet/softmax_triplet |
| num_instances | unsigned int | 4 | The number of image instances of the same person in a batch | >0 |
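
With sampler: softmax_triplet, each training batch is drawn as P identities with K = num_instances images each, so P = batch_size / num_instances (64 / 4 = 16 in the spec above), which guarantees positive pairs for the triplet loss. A rough sketch of this PK-style batch construction, assuming the TAO sampler behaves like the standard re-ID identity sampler:

import random
from collections import defaultdict

def pk_batch(image_labels, batch_size=64, num_instances=4):
    """Sample batch_size images as P identities x K instances each."""
    by_pid = defaultdict(list)
    for path, pid in image_labels:
        by_pid[pid].append(path)
    num_pids = batch_size // num_instances          # P = 16 here
    pids = random.sample(list(by_pid), num_pids)
    batch = []
    for pid in pids:
        imgs = by_pid[pid]
        # Sample with replacement if the identity has fewer than K images.
        picks = random.choices(imgs, k=num_instances) if len(imgs) < num_instances \
            else random.sample(imgs, num_instances)
        batch.extend((img, pid) for img in picks)
    return batch

labels = [(f"img_{i}.jpg", i % 20) for i in range(200)]  # toy dataset
batch = pk_batch(labels)  # 16 identities x 4 instances = 64 samples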

re_ranking#

The re_ranking parameter defines the settings for the re-ranking module.

re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| re_ranking | bool | True | A flag that enables the re-ranking module | True/False |
| k1 | unsigned int | 20 | The k used for k-reciprocal nearest neighbors | >0 |
| k2 | unsigned int | 6 | The k used for local query expansion | >0 |
| lambda_value | float | 0.3 | The weight of the original distance when combined with the Jaccard distance | >0.0 |
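
The lambda_value combines the two distances as in the k-reciprocal re-ranking method (Zhong et al., 2017): the final ranking distance is a weighted sum of the original distance and the Jaccard distance derived from k-reciprocal neighbor sets. Schematically, with illustrative values:

import numpy as np

lambda_value = 0.3
d_original = np.array([[0.8, 1.2], [1.1, 0.5]])  # e.g. Euclidean distances
d_jaccard = np.array([[0.4, 0.9], [0.7, 0.2]])   # from k-reciprocal neighbor sets

# Weighted combination used for the final ranking (Zhong et al., 2017).
d_final = lambda_value * d_original + (1.0 - lambda_value) * d_jaccard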

train#

The train parameter defines the hyperparameters of the training process.

train:
  optim:
    name: SGD
    lr_steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 2
    weight_decay: 0.0001
    weight_decay_bias: 0.0001
    warmup_factor: 0.01
    warmup_epochs: 20
    warmup_method: cosine
    base_lr: 0.0008
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
    large_fc_lr: False
  num_epochs: 120
  checkpoint_interval: 10

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| optim | dict config | – | The configuration for the SGD optimizer, including the learning rate, learning-rate scheduler, and weight decay | – |
| num_epochs | unsigned int | 120 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 10 | The interval (in epochs) at which checkpoints are saved | >0 |
| clip_grad_norm | float | 0.0 | The amount to clip the gradient by the L2 norm; a value of 0.0 specifies no clipping | >=0 |

optim#

The optim parameter defines the configuration for the SGD optimizer during training, including the learning rate, learning-rate scheduler, and weight decay.

optim:
  name: SGD
  lr_steps: [40, 70]
  gamma: 0.1
  bias_lr_factor: 2
  weight_decay: 0.0001
  weight_decay_bias: 0.0001
  warmup_factor: 0.01
  warmup_epochs: 20
  warmup_method: cosine
  base_lr: 0.0008
  momentum: 0.9
  center_loss_weight: 0.0005
  center_lr: 0.5
  triplet_loss_margin: 0.3
  large_fc_lr: False

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| name | string | SGD | The name of the optimizer | Adam/SGD/Adamax/… |
| lr_steps | int list | [40, 70] | The epochs at which to decrease the learning rate for the MultiStep scheduler | int list |
| gamma | float | 0.1 | The decay rate for WarmupMultiStepLR | >0.0 |
| bias_lr_factor | float | 2 | The bias learning-rate factor for WarmupMultiStepLR | >=1 |
| weight_decay | float | 0.0001 | The weight-decay coefficient for the optimizer | >0.0 |
| weight_decay_bias | float | 0.0001 | The weight-decay coefficient for bias parameters | >0.0 |
| warmup_factor | float | 0.01 | The warmup factor for the WarmupMultiStepLR scheduler | >0.0 |
| warmup_epochs | unsigned int | 20 | The number of warmup epochs for the WarmupMultiStepLR scheduler | >0 |
| warmup_method | string | cosine | The warmup method for the scheduler | cosine/linear |
| base_lr | float | 0.0008 | The initial learning rate for training | >0.0 |
| momentum | float | 0.9 | The momentum for the SGD optimizer | >0.0 |
| center_loss_weight | float | 0.0005 | The balancing weight for center loss | >0.0 |
| center_lr | float | 0.5 | The learning rate for SGD to learn the centers of center loss | >0.0 |
| triplet_loss_margin | float | 0.3 | The margin value for triplet loss | >0.0 |
| large_fc_lr | bool | False | A flag specifying whether to use a larger learning rate for the fully connected layer | True/False |
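
To make the schedule concrete, the following sketch computes the learning rate per epoch under one common WarmupMultiStepLR convention: warmup ramps the rate up to base_lr, after which it is multiplied by gamma at each milestone in lr_steps. The exact warmup curve in TAO may differ; this is an illustration, not the TAO source.

import bisect
import math

def lr_at_epoch(epoch, base_lr=0.0008, gamma=0.1, lr_steps=(40, 70),
                warmup_epochs=20, warmup_factor=0.01, warmup_method="cosine"):
    """Approximate WarmupMultiStepLR: ramp up, then step decay."""
    if epoch < warmup_epochs:
        alpha = epoch / warmup_epochs
        if warmup_method == "linear":
            scale = warmup_factor * (1 - alpha) + alpha
        else:  # cosine ramp from warmup_factor up to 1.0 (assumed form)
            scale = warmup_factor + (1 - warmup_factor) * 0.5 * (1 - math.cos(math.pi * alpha))
        return base_lr * scale
    # After warmup: decay by gamma at every milestone already passed.
    return base_lr * gamma ** bisect.bisect_right(lr_steps, epoch)

rates = [lr_at_epoch(e) for e in range(120)]
# After the first milestone, e.g. rates[50] == base_lr * gamma.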

Training the Model#

Use the following command to run ReIdentificationNet Transformer training:

tao model re_identification train -e <experiment_spec_file>
                            results_dir=<results_dir>
                            [train.gpu_ids=<gpu id list>]

Required Arguments#

  • -e, --experiment_spec_file: The path to the experiment spec file

  • results_dir: The path to a folder where the experiment outputs should be written

Optional Arguments#

  • train.gpu_ids: A list of GPU indices to use for training. If you set more than one GPU ID, multi-GPU training will be triggered automatically.

Here’s an example of using the ReIdentificationNet Transformer training command:

tao model re_identification train -e $DEFAULT_SPEC results_dir=$RESULTS_DIR
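
To trigger multi-GPU training, pass more than one GPU index; the list syntax below follows the TAO convention for command-line overrides:

tao model re_identification train -e $DEFAULT_SPEC results_dir=$RESULTS_DIR train.gpu_ids=[0,1]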

Evaluating the Model#

The evaluation metrics for ReIdentificationNet Transformer are the mean average precision and ranked accuracy. The plots of sampled matches and the cumulative matching characteristic (CMC) curve can be obtained using the evaluate.output_sampled_matches_plot and evaluate.output_cmc_curve_plot parameters, respectively.
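
As a rough illustration of rank-k accuracy (a simplified view; the actual evaluation also accounts for camera IDs and junk images), the metric checks whether a correct identity appears among each query's k nearest gallery embeddings:

import numpy as np

def rank_k_accuracy(dist, query_pids, gallery_pids, k=1):
    """Fraction of queries with a correct match among their top-k gallery hits."""
    topk = np.argsort(dist, axis=1)[:, :k]            # indices of k nearest
    hits = [(gallery_pids[idx] == qpid).any()
            for idx, qpid in zip(topk, query_pids)]
    return float(np.mean(hits))

dist = np.random.rand(3, 10)                          # hypothetical distances
query_pids = np.array([1, 2, 3])
gallery_pids = np.random.randint(1, 5, size=10)
print(rank_k_accuracy(dist, query_pids, gallery_pids, k=1))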

Use the following command to run ReIdentificationNet Transformer evaluation:

tao model re_identification evaluate -e <experiment_spec_file>
                               results_dir=<results_dir>
                               evaluate.checkpoint=<model to be evaluated>
                               evaluate.output_sampled_matches_plot=<path to the output sampled matches plot>
                               evaluate.output_cmc_curve_plot=<path to the output CMC curve plot>
                               evaluate.test_dataset=<path to test data>
                               evaluate.query_dataset=<path to query data>
                               [evaluate.gpu_id=<gpu index>]

Required Arguments#

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment

  • results_dir: The path to a folder where the experiment outputs should be written

  • evaluate.checkpoint: The .tlt model

  • evaluate.output_sampled_matches_plot: The path to the plotted file of sampled matches

  • evaluate.output_cmc_curve_plot: The path to the plotted file of the CMC curve

  • evaluate.test_dataset: The path to the test data

  • evaluate.query_dataset: The path to the query data

Optional Argument#

  • evaluate.gpu_id: The GPU index used to run evaluation (when the machine has multiple GPUs installed). Note that evaluation can only run on a single GPU.

Here’s an example of using the ReIdentificationNet Transformer evaluation command:

tao model re_identification evaluate -e $DEFAULT_SPEC results_dir=$RESULTS_DIR evaluate.checkpoint=$TRAINED_TLT_MODEL evaluate.output_sampled_matches_plot=$OUTPUT_SAMPLED_MATCHED_PLOT evaluate.output_cmc_curve_plot=$OUTPUT_CMC_CURVE_PLOT evaluate.test_dataset=$TEST_DATA evaluate.query_dataset=$QUERY_DATA

Running Inference on the Model#

Use the following command to run inference on ReIdentificationNet Transformer with the .tlt model.

tao model re_identification inference -e <experiment_spec>
                                results_dir=<results_dir>
                                inference.checkpoint=<inference model>
                                inference.output_file=<path to output file>
                                inference.test_dataset=<path to gallery data>
                                inference.query_dataset=<path to query data>
                                [inference.gpu_id=<gpu index>]

The output will be a JSON file that contains the feature embeddings of all the test and query data.

Required Arguments#

  • -e, --experiment_spec: The experiment spec file to set up inference

  • results_dir: The path to a folder where the experiment outputs should be written

  • inference.checkpoint: The .tlt model to perform inference with

  • inference.output_file: The path to the output JSON file

  • inference.test_dataset: The path to the test data

  • inference.query_dataset: The path to the query data

Optional Argument#

  • inference.gpu_id: The index of the GPU that will be used to run inference (when the machine has multiple GPUs installed). Note that inference can only run on a single GPU.

Here’s an example of using the ReIdentificationNet Transformer inference command:

tao model re_identification inference -e $DEFAULT_SPEC results_dir=$RESULTS_DIR inference.checkpoint=$TRAINED_TLT_MODEL inference.output_file=$OUTPUT_FILE inference.test_dataset=$TEST_DATA inference.query_dataset=$QUERY_DATA

The expected output is as follows:

[
  {
    "img_path": "/path/to/img1.jpg",
    "embedding": [-0.30, 0.12, 0.13,...]
  },
  {
    "img_path": "/path/to/img2.jpg",
    "embedding": [-0.10, -0.06, -1.85,...]
  },
  ...
  {
    "img_path": "/path/to/imgN.jpg",
    "embedding": [1.41, 0.63, -0.15,...]
  }
]
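
A sketch of consuming this output (the file name below is illustrative): load the embeddings and rank the remaining images against a query by L2 distance.

import json
import numpy as np

with open("results.json") as f:          # path to the inference output file
    records = json.load(f)

paths = [r["img_path"] for r in records]
embs = np.array([r["embedding"] for r in records], dtype=np.float32)

# Treat the first record as the query and rank the rest by L2 distance.
dists = np.linalg.norm(embs[1:] - embs[0], axis=1)
ranked = [paths[1:][i] for i in np.argsort(dists)]
print("Closest match to", paths[0], "is", ranked[0])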

Exporting the Model#

Use the following command to export ReIdentificationNet Transformer to .onnx format for deployment:

tao model re_identification export -e <experiment_spec>
                             results_dir=<results_dir>
                             export.checkpoint=<tlt checkpoint to be exported>
                             [export.onnx_file=<path to exported file>]
                             [export.gpu_id=<gpu index>]

Required Arguments#

  • -e, --experiment_spec: The experiment spec file to configure export

  • results_dir: The path to a folder where the experiment outputs should be written

  • export.checkpoint: The .tlt model to be exported

Optional Arguments#

  • export.onnx_file: The path to save the exported model to. The default path is in the same directory as the *.tlt model.

  • export.gpu_id: The index of the GPU that will be used to run the export (when the machine has multiple GPUs installed). Note that export can only run on a single GPU.

Here’s an example of using the ReIdentificationNet Transformer export command:

tao model re_identification export -e $DEFAULT_SPEC results_dir=$RESULTS_DIR export.checkpoint=$TRAINED_TLT_MODEL
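
Once exported, the ONNX model can be sanity-checked outside TAO, for example with onnxruntime. The input layout and normalization below are assumptions taken from the spec above (3x384x128 images, pixel mean and std of 0.5); check sess.get_inputs() for the model's actual input signature.

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

# Dummy normalized image batch: NCHW, 384x128 as in the spec above.
img = np.random.rand(1, 3, 384, 128).astype(np.float32)
img = (img - 0.5) / 0.5                      # pixel_mean / pixel_std from spec

embedding = sess.run(None, {input_name: img})[0]
print(embedding.shape)                       # e.g. (1, 1024) for feat_dim=1024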

Deploying the Model#

You can deploy the trained deep-learning and computer-vision models on edge devices, such as a Jetson Xavier, Jetson Nano, or Tesla GPU, or in the cloud with NVIDIA GPUs. The exported *.onnx model can also be used with TAO Triton Apps.