ReIdentificationNet Transformer
ReIdentificationNet Transformer receives cropped images of a person from different perspectives as network input and outputs the embedding features for that person. The embeddings are used to perform similarity matching to re-identify the same person. The model is based on Swin Transformer, which is a general-purpose backbone for computer vision.
The ReIdentificationNet Transformer apps in TAO Toolkit expect data in Market-1501 format for training and evaluation.
Refer to the Data Annotation Format page for more information about the Market-1501 data format.
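As a quick orientation, Market-1501 encodes the person ID and camera ID in each image file name: for example, `0002_c1s1_000451_03.jpg` is person 2 captured by camera 1. The following minimal Python sketch (the helper `parse_market1501_name` is hypothetical, not part of TAO) shows how these labels can be recovered from a file name:

```python
# A minimal sketch (not TAO code) of recovering Market-1501 labels from
# file names such as "0002_c1s1_000451_03.jpg" -> person 2, camera 1.
import os
import re

def parse_market1501_name(filename):
    """Extract (person_id, camera_id) from a Market-1501 image file name."""
    match = re.match(r"(-?\d+)_c(\d+)", os.path.basename(filename))
    if match is None:
        raise ValueError(f"Not a Market-1501 file name: {filename}")
    # A person_id of -1 marks junk images in the Market-1501 convention.
    return int(match.group(1)), int(match.group(2))

print(parse_market1501_name("0002_c1s1_000451_03.jpg"))  # (2, 1)
```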
The spec file for ReIdentificationNet Transformer includes `model`, `dataset`, `re_ranking`, and `train` parameters. The following is an example spec for training a Swin Tiny model on Market-1501 with 751 identities in the training set:
```yaml
results_dir: "/path/to/experiment_results"
encryption_key: nvidia_tao
model:
  backbone: swin_tiny_patch4_window7_224
  last_stride: 1
  pretrain_choice: self
  pretrained_model_path: "/path/to/pretrained_model.pth"
  input_channels: 3
  input_width: 128
  input_height: 384
  neck: bnneck
  stride_size: [16, 16]
  feat_dim: 1024
  no_margin: True
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: False
  pretrain_hw_ratio: 2
dataset:
  train_dataset_dir: "/path/to/train_dataset_dir"
  test_dataset_dir: "/path/to/test_dataset_dir"
  query_dataset_dir: "/path/to/query_dataset_dir"
  num_classes: 751
  batch_size: 64
  val_batch_size: 128
  num_workers: 8
  pixel_mean: [0.5, 0.5, 0.5]
  pixel_std: [0.5, 0.5, 0.5]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4
re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3
train:
  results_dir: "${results_dir}/train"
  optim:
    name: SGD
    lr_steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 2
    weight_decay: 0.0001
    weight_decay_bias: 0.0001
    warmup_factor: 0.01
    warmup_epochs: 20
    warmup_method: cosine
    base_lr: 0.0008
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
    large_fc_lr: False
  num_epochs: 120
  checkpoint_interval: 10
```
Parameter | Data Type | Default | Description
---|---|---|---
`model` | dict config | – | The configuration for the model architecture
`train` | dict config | – | The configuration for the training process
`dataset` | dict config | – | The configuration for the dataset
`re_ranking` | dict config | – | The configuration for the re-ranking module
model
The `model` parameter provides options to change the ReIdentificationNet Transformer architecture.
```yaml
model:
  backbone: swin_tiny_patch4_window7_224
  last_stride: 1
  pretrain_choice: self
  pretrained_model_path: "/path/to/pretrained_model.pth"
  input_channels: 3
  input_width: 128
  input_height: 384
  neck: bnneck
  stride_size: [16, 16]
  feat_dim: 1024
  no_margin: True
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: False
  pretrain_hw_ratio: 2
```
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
`backbone` | string | swin_tiny_patch4_window7_224 | The type of model, which can be a Swin-based architecture or resnet_50 (refer to ReIdentificationNet) | resnet_50/swin_base_patch4_window7_224/swin_small_patch4_window7_224/swin_tiny_patch4_window7_224
`last_stride` | unsigned int | 1 | The number of strides during convolution | >0
`pretrain_choice` | string | self | Specifies the pre-trained network | self/imagenet/""
`pretrained_model_path` | string | – | The path to the pre-trained model | –
`input_channels` | unsigned int | 3 | The number of input channels | >0
`input_width` | int | 128 | The width of the input images | >0
`input_height` | int | 384 | The height of the input images | >0
`neck` | string | bnneck | Specifies whether to train with BNNeck | bnneck/""
`feat_dim` | unsigned int | 1024 | The output size of the feature embeddings | >0
`no_margin` | bool | True | A flag specifying whether to train with soft triplet loss | True/False
`neck_feat` | string | after | Specifies which feature of BNNeck to use for testing | before/after
`metric_loss_type` | string | triplet | The type of metric loss | triplet/center/triplet_center
`with_center_loss` | bool | False | A flag specifying whether to enable center loss | True/False
`with_flip_feature` | bool | False | A flag specifying whether to enable image flipping | True/False
`label_smooth` | bool | False | A flag specifying whether to enable label smoothing | True/False
`pretrain_hw_ratio` | float | 2 | The height-width ratio of the pre-trained model | >0
dataset
The `dataset` parameter defines the dataset source, training batch size, and augmentation.
```yaml
dataset:
  train_dataset_dir: "/path/to/train_dataset_dir"
  test_dataset_dir: "/path/to/test_dataset_dir"
  query_dataset_dir: "/path/to/query_dataset_dir"
  num_classes: 751
  batch_size: 64
  val_batch_size: 128
  num_workers: 8
  pixel_mean: [0.5, 0.5, 0.5]
  pixel_std: [0.5, 0.5, 0.5]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4
```
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
`train_dataset_dir` | string | – | The path to the train images | –
`test_dataset_dir` | string | – | The path to the test images | –
`query_dataset_dir` | string | – | The path to the query images | –
`num_classes` | unsigned int | 751 | The number of unique person IDs | >0
`batch_size` | unsigned int | 64 | The batch size for training | >0
`val_batch_size` | unsigned int | 128 | The batch size for validation | >0
`num_workers` | unsigned int | 8 | The number of parallel workers processing data | >0
`pixel_mean` | float list | [0.5, 0.5, 0.5] | The pixel mean for image normalization | float list
`pixel_std` | float list | [0.5, 0.5, 0.5] | The pixel standard deviation for image normalization | float list
`padding` | unsigned int | 10 | The pixel padding size around images for image augmentation | >=1
`prob` | float | 0.5 | The random horizontal flipping probability for image augmentation | >0
`re_prob` | float | 0.5 | The random erasing probability for image augmentation | >0
`sampler` | string | softmax_triplet | The type of sampler for data loading | softmax/triplet/softmax_triplet
`num_instances` | unsigned int | 4 | The number of image instances of the same person in a batch | >0
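For intuition on how `sampler` and `num_instances` interact: with `batch_size: 64` and `num_instances: 4`, each training batch holds 64 / 4 = 16 distinct identities, so the triplet branch always sees multiple positives and many negatives per identity. The following is a minimal sketch of such identity-balanced (PK) sampling; it is an illustration, not TAO's actual data loader:

```python
# A minimal sketch of PK sampling: P identities x K instances per batch,
# e.g. P = 64 // 4 = 16 with the defaults above. Not TAO's actual sampler.
import random
from collections import defaultdict

def pk_batch(image_labels, batch_size=64, num_instances=4):
    """image_labels: list of (img_path, person_id) pairs; returns one batch."""
    by_id = defaultdict(list)
    for path, pid in image_labels:
        by_id[pid].append(path)
    num_ids = batch_size // num_instances          # P identities per batch
    pids = random.sample(sorted(by_id), num_ids)
    batch = []
    for pid in pids:                               # K crops per identity
        pool = by_id[pid]
        if len(pool) >= num_instances:
            picks = random.sample(pool, num_instances)
        else:                                      # oversample scarce identities
            picks = random.choices(pool, k=num_instances)
        batch.extend((path, pid) for path in picks)
    return batch
```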
re_ranking
The `re_ranking` parameter defines the settings for the re-ranking module.
```yaml
re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3
```
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
`re_ranking` | bool | True | A flag that enables the re-ranking module | True/False
`k1` | unsigned int | 20 | The k used for k-reciprocal nearest neighbors | >0
`k2` | unsigned int | 6 | The k used for local query expansion | >0
`lambda_value` | float | 0.3 | The weight of the original distance in combination with the Jaccard distance | >0.0
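These parameters mirror the k-reciprocal re-ranking scheme: `k1` builds the k-reciprocal neighbor sets, `k2` controls local query expansion, and `lambda_value` blends the original distance with the Jaccard distance computed from those neighbor sets. A minimal sketch of the final combination step, assuming both distance matrices have already been computed:

```python
# A minimal sketch of how lambda_value combines the two distances:
# final = lambda * original + (1 - lambda) * Jaccard. An illustration only.
import numpy as np

def combine_distances(original_dist: np.ndarray,
                      jaccard_dist: np.ndarray,
                      lambda_value: float = 0.3) -> np.ndarray:
    """Blend the original distance matrix with the Jaccard distance matrix."""
    return lambda_value * original_dist + (1.0 - lambda_value) * jaccard_dist
```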
train
The `train` parameter defines the hyperparameters of the training process.
```yaml
train:
  optim:
    name: SGD
    lr_steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 2
    weight_decay: 0.0001
    weight_decay_bias: 0.0001
    warmup_factor: 0.01
    warmup_epochs: 20
    warmup_method: cosine
    base_lr: 0.0008
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
    large_fc_lr: False
  num_epochs: 120
  checkpoint_interval: 10
```
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
`optim` | dict config | – | The configuration for the SGD optimizer, including the learning rate, learning-rate scheduler, and weight decay | –
`num_epochs` | unsigned int | 120 | The total number of epochs to run the experiment | >0
`checkpoint_interval` | unsigned int | 10 | The interval at which the checkpoints are saved | >0
`clip_grad_norm` | float | 0.0 | The amount to clip the gradient by the L2 norm. A value of 0.0 specifies no clipping. | >=0
optim
The `optim` parameter defines the configuration for the SGD optimizer during training, including the learning rate, learning-rate scheduler, and weight decay.
```yaml
optim:
  name: SGD
  lr_steps: [40, 70]
  gamma: 0.1
  bias_lr_factor: 2
  weight_decay: 0.0001
  weight_decay_bias: 0.0001
  warmup_factor: 0.01
  warmup_epochs: 20
  warmup_method: cosine
  base_lr: 0.0008
  momentum: 0.9
  center_loss_weight: 0.0005
  center_lr: 0.5
  triplet_loss_margin: 0.3
  large_fc_lr: False
```
Parameter | Datatype | Default | Description | Supported Values
---|---|---|---|---
`name` | string | SGD | The name of the optimizer | Adam/SGD/Adamax/…
`lr_steps` | int list | [40, 70] | The steps to decrease the learning rate for the MultiStep scheduler | int list
`gamma` | float | 0.1 | The decay rate for WarmupMultiStepLR | >0.0
`bias_lr_factor` | float | 2 | The bias learning rate factor for WarmupMultiStepLR | >=1
`weight_decay` | float | 0.0001 | The weight decay coefficient for the optimizer | >0.0
`weight_decay_bias` | float | 0.0001 | The weight decay coefficient for bias parameters | >0.0
`warmup_factor` | float | 0.01 | The warmup factor for the WarmupMultiStepLR scheduler | >0.0
`warmup_epochs` | unsigned int | 20 | The number of warmup epochs for the WarmupMultiStepLR scheduler | >0
`warmup_method` | string | cosine | The warmup method for the optimizer | cosine/linear
`base_lr` | float | 0.0008 | The initial learning rate for the training | >0.0
`momentum` | float | 0.9 | The momentum for the SGD optimizer | >0.0
`center_loss_weight` | float | 0.0005 | The balanced weight for center loss | >0.0
`center_lr` | float | 0.5 | The learning rate for SGD to learn the centers of center loss | >0.0
`triplet_loss_margin` | float | 0.3 | The margin value for triplet loss | >0.0
`large_fc_lr` | bool | False | A flag specifying whether to enable a large learning rate for the fully connected layer | True/False
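To make the schedule concrete, here is a minimal sketch of the learning rate per epoch. This is one plausible interpretation of the options above, not TAO's exact scheduler: the rate ramps up from `warmup_factor * base_lr` over `warmup_epochs`, then decays by `gamma` at each milestone in `lr_steps`.

```python
# A minimal sketch (an assumption, not TAO's implementation) of warmup
# followed by multi-step decay, using the defaults from the table above.
import math

def lr_at_epoch(epoch, base_lr=0.0008, warmup_factor=0.01, warmup_epochs=20,
                warmup_method="cosine", lr_steps=(40, 70), gamma=0.1):
    if epoch < warmup_epochs:
        progress = epoch / warmup_epochs           # 0.0 -> 1.0 over warmup
        if warmup_method == "cosine":              # cosine-shaped ramp (assumed)
            scale = warmup_factor + (1 - warmup_factor) * (
                1 - math.cos(math.pi * progress)) / 2
        else:                                      # linear ramp
            scale = warmup_factor + (1 - warmup_factor) * progress
        return base_lr * scale
    # After warmup: multiply by gamma at every milestone already passed.
    num_decays = sum(1 for step in lr_steps if epoch >= step)
    return base_lr * gamma ** num_decays

for e in (0, 10, 20, 40, 70, 119):
    print(e, round(lr_at_epoch(e), 6))
```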
Use the following command to run ReIdentificationNet Transformer training:
```sh
tao model re_identification train -e <experiment_spec_file>
                                  -r <results_dir>
                                  -k <key>
                                  [train.gpu_ids=<gpu id list>]
```
Required Arguments
* `-e, --experiment_spec_file`: The path to the experiment spec file
* `-r, --results_dir`: The path to a folder where the experiment outputs should be written
* `-k, --key`: The user-specific encoding key to save or load a `.tlt` model
Optional Arguments
* `train.gpu_ids`: A list of GPU indices to use for training. If you set more than one GPU ID, multi-GPU training is triggered automatically.
Here’s an example of using the ReIdentificationNet Transformer training command:
```sh
tao model re_identification train -e $DEFAULT_SPEC -r $RESULTS_DIR -k $KEY
```
The evaluation metrics for ReIdentificationNet Transformer are the mean average precision (mAP) and ranked accuracy. The plots of sampled matches and the cumulative matching characteristic (CMC) curve can be obtained using the `evaluate.output_sampled_matches_plot` and `evaluate.output_cmc_curve_plot` parameters, respectively.
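For reference, a minimal sketch of computing these metrics from a query-gallery distance matrix. It is an illustration, not TAO's evaluator, and it ignores the same-camera filtering that Market-1501 evaluation applies:

```python
# A minimal sketch of rank-k accuracy (CMC) and mAP from a distance matrix.
import numpy as np

def cmc_map(dist, query_ids, gallery_ids, max_rank=10):
    """dist: (num_query, num_gallery) matrix; IDs: int arrays -> (CMC, mAP)."""
    order = np.argsort(dist, axis=1)                    # nearest gallery first
    matches = gallery_ids[order] == query_ids[:, None]  # True where IDs agree
    cmc = np.zeros(max_rank)
    average_precisions = []
    for row in matches:
        hits = np.flatnonzero(row)
        if hits.size == 0:                              # query has no true match
            continue
        if hits[0] < max_rank:
            cmc[hits[0]:] += 1                          # rank-k hit for k >= first
        precision_at_hits = np.arange(1, hits.size + 1) / (hits + 1)
        average_precisions.append(precision_at_hits.mean())
    num_valid = len(average_precisions)
    return cmc / num_valid, float(np.mean(average_precisions))
```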
Use the following command to run ReIdentificationNet Transformer evaluation:
```sh
tao model re_identification evaluate -e <experiment_spec_file>
                                     -r <results_dir>
                                     -k <key>
                                     evaluate.checkpoint=<model to be evaluated>
                                     evaluate.output_sampled_matches_plot=<path to the output sampled matches plot>
                                     evaluate.output_cmc_curve_plot=<path to the output CMC curve plot>
                                     evaluate.test_dataset=<path to test data>
                                     evaluate.query_dataset=<path to query data>
                                     [evaluate.gpu_id=<gpu index>]
```
Required Arguments
* `-e, --experiment_spec_file`: The experiment spec file to set up the evaluation experiment
* `-r, --results_dir`: The path to a folder where the experiment outputs should be written
* `-k, --key`: The encoding key for the `.tlt` model
* `evaluate.checkpoint`: The `.tlt` model to evaluate
* `evaluate.output_sampled_matches_plot`: The path to the plotted file of sampled matches
* `evaluate.output_cmc_curve_plot`: The path to the plotted file of the CMC curve
* `evaluate.test_dataset`: The path to the test data
* `evaluate.query_dataset`: The path to the query data
Optional Argument
* `evaluate.gpu_id`: The GPU index used to run evaluation (when the machine has multiple GPUs installed). Note that evaluation can only run on a single GPU.
Here’s an example of using the ReIdentificationNet Transformer evaluation command:
```sh
tao model re_identification evaluate -e $DEFAULT_SPEC -r $RESULTS_DIR -k $KEY evaluate.checkpoint=$TRAINED_TLT_MODEL evaluate.output_sampled_matches_plot=$OUTPUT_SAMPLED_MATCHED_PLOT evaluate.output_cmc_curve_plot=$OUTPUT_CMC_CURVE_PLOT evaluate.test_dataset=$TEST_DATA evaluate.query_dataset=$QUERY_DATA
```
Use the following command to run inference on ReIdentificationNet Transformer with the `.tlt` model:
```sh
tao model re_identification inference -e <experiment_spec>
                                      -r <results_dir>
                                      -k <key>
                                      inference.checkpoint=<inference model>
                                      inference.output_file=<path to output file>
                                      inference.test_dataset=<path to gallery data>
                                      inference.query_dataset=<path to query data>
                                      [inference.gpu_id=<gpu index>]
```
The output will be a JSON file that contains the feature embeddings of all the test and query data.
Required Arguments
* `-e, --experiment_spec`: The experiment spec file to set up inference
* `-r, --results_dir`: The path to a folder where the experiment outputs should be written
* `-k, --key`: The encoding key for the `.tlt` model
* `inference.checkpoint`: The `.tlt` model to perform inference with
* `inference.output_file`: The path to the output JSON file
* `inference.test_dataset`: The path to the test data
* `inference.query_dataset`: The path to the query data
Optional Argument
* `inference.gpu_id`: The index of the GPU that will be used to run inference (when the machine has multiple GPUs installed). Note that inference can only run on a single GPU.
Here’s an example of using the ReIdentificationNet Transformer inference command:
```sh
tao model re_identification inference -e $DEFAULT_SPEC -r $RESULTS_DIR -k $KEY inference.checkpoint=$TRAINED_TLT_MODEL inference.output_file=$OUTPUT_FILE inference.test_dataset=$TEST_DATA inference.query_dataset=$QUERY_DATA
```
The expected output is as follows:
```json
[
    {
        "img_path": "/path/to/img1.jpg",
        "embedding": [-0.30, 0.12, 0.13, ...]
    },
    {
        "img_path": "/path/to/img2.jpg",
        "embedding": [-0.10, -0.06, -1.85, ...]
    },
    ...
    {
        "img_path": "/path/to/imgN.jpg",
        "embedding": [1.41, 0.63, -0.15, ...]
    }
]
```
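A minimal sketch of consuming this output follows. The file name `embeddings.json` is hypothetical, and the sketch assumes query images can be told apart by a `/query/` path component; each query crop is matched to its nearest gallery crop by cosine similarity:

```python
# A minimal sketch of matching query crops against gallery crops using the
# embeddings written by the inference command above.
import json
import numpy as np

with open("embeddings.json") as f:                     # hypothetical output path
    records = json.load(f)

paths = [r["img_path"] for r in records]
feats = np.asarray([r["embedding"] for r in records], dtype=np.float32)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # L2-normalize rows

is_query = np.array(["/query/" in p for p in paths])   # assumed path convention
query_feats, gallery_feats = feats[is_query], feats[~is_query]
query_paths = [p for p, q in zip(paths, is_query) if q]
gallery_paths = [p for p, q in zip(paths, is_query) if not q]

similarity = query_feats @ gallery_feats.T             # cosine similarity
for qp, best in zip(query_paths, similarity.argmax(axis=1)):
    print(qp, "->", gallery_paths[best])
```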
Use the following command to export ReIdentificationNet Transformer to `.onnx` format for deployment:
```sh
tao model re_identification export -e <experiment_spec>
                                   -r <results_dir>
                                   -k <key>
                                   export.checkpoint=<tlt checkpoint to be exported>
                                   [export.onnx_file=<path to exported file>]
                                   [export.gpu_id=<gpu index>]
```
Required Arguments
* `-e, --experiment_spec`: The experiment spec file to configure export
* `-r, --results_dir`: The path to a folder where the experiment outputs should be written
* `-k, --key`: The encoding key for the `.tlt` model
* `export.checkpoint`: The `.tlt` model to be exported
Optional Arguments
* `export.onnx_file`: The path to save the exported model to. The default path is in the same directory as the `*.tlt` model.
* `export.gpu_id`: The index of the GPU that will be used to run the export (when the machine has multiple GPUs installed). Note that export can only run on a single GPU.
Here’s an example of using the ReIdentificationNet Transformer export command:
```sh
tao model re_identification export -e $DEFAULT_SPEC -r $RESULTS_DIR -k $KEY export.checkpoint=$TRAINED_TLT_MODEL
```
You can deploy the trained deep-learning and computer-vision models on edge devices, such as a Jetson Xavier, Jetson Nano, or Tesla, or in the cloud with NVIDIA GPUs. The exported `*.onnx` model can also be used with TAO Toolkit Triton Apps.
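As a sketch of deployment outside Triton, the exported model can also be run directly with ONNX Runtime. The file name below is hypothetical, and the input layout (NCHW, `1 x 3 x 384 x 128`, normalized with the `pixel_mean`/`pixel_std` of 0.5 from the example spec) is an assumption based on that spec:

```python
# A minimal sketch of running the exported .onnx model with ONNX Runtime.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("reid_transformer.onnx")  # hypothetical file name
input_name = session.get_inputs()[0].name                # read, not guessed

# A dummy crop in the assumed layout: NCHW, height 384, width 128, values in
# [0, 1], then normalized with pixel_mean = pixel_std = 0.5 per channel.
crop = np.random.rand(1, 3, 384, 128).astype(np.float32)
crop = (crop - 0.5) / 0.5

embedding = session.run(None, {input_name: crop})[0]     # shape (1, feat_dim)
print(embedding.shape)
```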