Date: 2026-03-03
ReIdentificationNet#
ReIdentificationNet takes cropped images of a person from different perspectives as network input and outputs the embedding features for that person. The embeddings are used to perform similarity matching to re-identify the same person. The model supported in the current version is based on ResNet, which is the most commonly used baseline for re-identification due to its high accuracy.
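The similarity matching described above can be sketched in a few lines. This is a minimal illustration only: it assumes cosine similarity as the matching score (this page does not state which metric is used), and the three-dimensional vectors and the names `cosine_similarity` and `rank_gallery` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return gallery keys sorted from most to least similar to the query."""
    return sorted(
        gallery,
        key=lambda name: cosine_similarity(query, gallery[name]),
        reverse=True,
    )


# Toy embeddings (hypothetical values; real embeddings have feat_dim entries).
query = [0.9, 0.1, 0.0]
gallery = {"person_a": [1.0, 0.0, 0.0], "person_b": [0.0, 1.0, 0.0]}
ranking = rank_gallery(query, gallery)
```

The first entry of `ranking` is the gallery image whose embedding points in the direction closest to the query embedding.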
The expected time to train ReIdentificationNet is as follows:
| Backbone Type | GPU Type | No. of training images | Image Size | No. of identities | Batch size | Total Epochs | Total Training Time |
|---|---|---|---|---|---|---|---|
| Resnet50 | 1 x Nvidia A100 - 80GB PCIE | 13,000 | 256x128x3 | 751 | 128 | 120 | ~1.25 hours |
| Resnet50 | 1 x Nvidia Quadro GV100 - 32GB | 13,000 | 256x128x3 | 751 | 64 | 120 | ~2.5 hours |
Note
Throughout this documentation are references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections.
For instructions on creating a dataset using the remote client, refer to the Creating a dataset section in the Remote Client documentation.
For instructions on creating an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.
The spec format is YAML for TAO Launcher, and JSON for FTMS Client.
File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher, not for FTMS Client.
Data Input for ReIdentificationNet#
The ReIdentificationNet apps in TAO expect data in Market-1501 format for training and evaluation.
See the Data Annotation Format page for more information about the Market-1501 data format.
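As a rough illustration of what the Market-1501 layout encodes, the sketch below parses one image file name. It assumes the standard Market-1501 naming convention (`<personID>_c<camera>s<sequence>_<frame>_<bbox>.jpg`); the helper name `parse_market1501_name` is hypothetical, and the Data Annotation Format page remains the authoritative description.

```python
import re

# <personID>_c<camera>s<sequence>_<frame>_<bbox>.jpg
# A person ID of -1 marks junk images in the original dataset.
_NAME_PATTERN = re.compile(r"^(-?\d+)_c(\d+)s(\d+)_(\d+)_(\d+)\.jpg$")


def parse_market1501_name(filename):
    """Split a Market-1501 style file name into its numeric fields."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a Market-1501 style file name: {filename!r}")
    person_id, camera_id, sequence_id, frame, bbox_index = match.groups()
    return {
        "person_id": int(person_id),
        "camera_id": int(camera_id),
        "sequence_id": int(sequence_id),
        "frame": int(frame),
        "bbox_index": int(bbox_index),
    }


info = parse_market1501_name("1121_c3s2_156744_00.jpg")
```

The person ID is what the training labels are derived from, while the camera ID is typically used during evaluation to separate views of the same person.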
Creating an Experiment Spec File#
The spec file for ReIdentificationNet includes model, dataset, re_ranking, and train parameters. Here is an example spec for training a ResNet model on Market-1501 that contains 751 identities in the training set.
Use the following command to get an experiment spec file for ReIdentificationNet:
BASE_EXPERIMENT_ID=$(tao re_identification list-base-experiments | jq -r '.[0].id')
SPECS=$(tao re_identification get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')
results_dir: "/path/to/experiment_results"
encryption_key: nvidia_tao
model:
  backbone: resnet_50
  last_stride: 1
  pretrain_choice: imagenet
  pretrained_model_path: "/path/to/pretrained_model.pth"
  input_channels: 3
  input_width: 128
  input_height: 256
  neck: bnneck
  feat_dim: 256
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: True
dataset:
  train_dataset_dir: "/path/to/train_dataset_dir"
  test_dataset_dir: "/path/to/test_dataset_dir"
  query_dataset_dir: "/path/to/query_dataset_dir"
  num_classes: 751
  batch_size: 64
  val_batch_size: 128
  num_workers: 1
  pixel_mean: [0.485, 0.456, 0.406]
  pixel_std: [0.226, 0.226, 0.226]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4
re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3
train:
  results_dir: "${results_dir}/train"
  optim:
    name: Adam
    lr_monitor: val_loss
    steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 1
    weight_decay: 0.0005
    weight_decay_bias: 0.0005
    warmup_factor: 0.01
    warmup_iters: 10
    warmup_method: linear
    base_lr: 0.00035
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  seed: 1234
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | – | The configuration of the model architecture | |
| dataset | dict config | – | The configuration of the dataset | |
| train | dict config | – | The configuration of the training task | |
| evaluate | dict config | – | The configuration of the evaluation task | |
| inference | dict config | – | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| export | dict config | – | The configuration of the ONNX export task | |
| re_ranking | dict config | – | The configuration for the re-ranking module | |
model#
The model parameter provides options to change the ReIdentificationNet architecture.
model:
  backbone: resnet_50
  last_stride: 1
  pretrain_choice: imagenet
  pretrained_model_path: "/path/to/pretrained_model.pth"
  input_channels: 3
  input_width: 128
  input_height: 256
  neck: bnneck
  feat_dim: 256
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: True
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| backbone | string | resnet_50 | The type of model, which can be resnet_50 or a Swin-based architecture (refer to ReIdentificationNet Transformer for more details) | “resnet_50”, “swin_base_patch4_window7_224”, “swin_small_patch4_window7_224”, “swin_tiny_patch4_window7_224” |
| last_stride | unsigned int | 1 | The number of strides during convolution | >0 |
| pretrain_choice | string | imagenet | The pre-trained network | imagenet/self/”” |
| pretrained_model_path | string | | The path to the pre-trained model | |
| input_channels | unsigned int | 3 | The number of input channels | >0 |
| input_width | int | 128 | The width of the input images | >0 |
| input_height | int | 256 | The height of the input images | >0 |
| neck | string | bnneck | Specifies whether to train with BNNeck | bnneck/”” |
| feat_dim | unsigned int | 256 | The output size of the feature embeddings | >0 |
| neck_feat | string | after | Specifies which feature of BNNeck to use for testing | before/after |
| metric_loss_type | string | triplet | The type of metric loss | triplet/center/triplet_center |
| with_center_loss | bool | False | Specifies whether to enable center loss | True/False |
| with_flip_feature | bool | False | Specifies whether to enable image flipping | True/False |
| label_smooth | bool | True | Specifies whether to enable label smoothing | True/False |
dataset#
The dataset parameter defines the dataset source, training batch size, and augmentation.
dataset:
  train_dataset_dir: "/path/to/train_dataset_dir"
  test_dataset_dir: "/path/to/test_dataset_dir"
  query_dataset_dir: "/path/to/query_dataset_dir"
  num_classes: 751
  batch_size: 64
  val_batch_size: 128
  num_workers: 1
  pixel_mean: [0.485, 0.456, 0.406]
  pixel_std: [0.226, 0.226, 0.226]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset_dir | string | | The path to the train images | |
| test_dataset_dir | string | | The path to the test images | |
| query_dataset_dir | string | | The path to the query images | |
| num_classes | unsigned int | 751 | The number of unique person IDs | >0 |
| batch_size | unsigned int | 64 | The batch size for training | >0 |
| val_batch_size | unsigned int | 128 | The batch size for validation | >0 |
| num_workers | unsigned int | 1 | The number of parallel workers processing data | >0 |
| pixel_mean | float list | [0.485, 0.456, 0.406] | The pixel mean for image normalization | float list |
| pixel_std | float list | [0.226, 0.226, 0.226] | The pixel standard deviation for image normalization | float list |
| padding | unsigned int | 10 | The pixel padding size around images for image augmentation | >=1 |
| prob | float | 0.5 | The random horizontal flipping probability for image augmentation | >0 |
| re_prob | float | 0.5 | The random erasing probability for image augmentation | >0 |
| sampler | string | softmax_triplet | The type of sampler for data loading | softmax/triplet/softmax_triplet |
| num_instances | unsigned int | 4 | The number of image instances of the same person in a batch | >0 |
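To make the relationship between batch_size and num_instances concrete, here is a minimal sketch of identity-balanced batch sampling. It assumes the common ReID practice of drawing `batch_size / num_instances` identities per batch and `num_instances` images for each; this page does not confirm that TAO's softmax_triplet sampler works exactly this way, and both function names are hypothetical.

```python
import random
from collections import defaultdict


def identities_per_batch(batch_size, num_instances):
    """Number of distinct person IDs that fit in one batch."""
    if batch_size % num_instances != 0:
        raise ValueError("batch_size must be a multiple of num_instances")
    return batch_size // num_instances


def sample_identity_batch(labels, batch_size, num_instances, rng):
    """Pick image indices so that each chosen identity appears num_instances times."""
    indices_by_id = defaultdict(list)
    for index, person_id in enumerate(labels):
        indices_by_id[person_id].append(index)

    chosen_ids = rng.sample(sorted(indices_by_id), identities_per_batch(batch_size, num_instances))
    batch = []
    for person_id in chosen_ids:
        pool = indices_by_id[person_id]
        if len(pool) >= num_instances:
            batch.extend(rng.sample(pool, num_instances))
        else:
            # Too few images for this identity: sample with replacement.
            batch.extend(rng.choices(pool, k=num_instances))
    return batch


# Toy dataset: 20 identities with 5 images each (hypothetical).
labels = [person_id for person_id in range(20) for _ in range(5)]
batch = sample_identity_batch(labels, batch_size=64, num_instances=4, rng=random.Random(1234))
```

With the defaults shown above (batch_size 64, num_instances 4), each batch would hold 16 identities, which gives the triplet loss several positives and many negatives per anchor.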
re_ranking#
The re_ranking parameter defines the settings for the re-ranking module.
re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| re_ranking | bool | True | A flag that enables the re-ranking module | True/False |
| k1 | unsigned int | 20 | The k used for k-reciprocal nearest neighbors | >0 |
| k2 | unsigned int | 6 | The k used for local query expansion | >0 |
| lambda_value | float | 0.3 | The weight of original distance in the combination with Jaccard distance | >0.0 |
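The role of lambda_value can be sketched as a weighted sum of the two distances. This is a simplified illustration, not the module's implementation: it uses a plain set-based Jaccard distance over neighbor lists, whereas k-reciprocal re-ranking as usually published uses a weighted encoding refined with k1 and k2. The function names are hypothetical.

```python
def jaccard_distance(neighbors_a, neighbors_b):
    """Plain Jaccard distance between two neighbor sets (simplified)."""
    set_a, set_b = set(neighbors_a), set(neighbors_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def reranked_distance(original_dist, jaccard_dist, lambda_value=0.3):
    """Blend the original and Jaccard distances; lambda_value weights the original."""
    return lambda_value * original_dist + (1.0 - lambda_value) * jaccard_dist


# Two images sharing two of four distinct neighbors (hypothetical neighbor IDs).
d_jaccard = jaccard_distance([1, 2, 3], [2, 3, 4])
d_final = reranked_distance(original_dist=0.8, jaccard_dist=d_jaccard, lambda_value=0.3)
```

With lambda_value at 0.3, the Jaccard term carries 70% of the final distance, so agreement between neighbor sets matters more than the raw embedding distance.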
train#
The train parameter defines the hyperparameters of the training process.
train:
  optim:
    name: Adam
    lr_monitor: val_loss
    steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 1
    weight_decay: 0.0005
    weight_decay_bias: 0.0005
    warmup_factor: 0.01
    warmup_iters: 10
    warmup_method: linear
    base_lr: 0.00035
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
  num_epochs: 10
  checkpoint_interval: 5
  validation_interval: 5
  seed: 1234
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, NumPy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| optim | dict config | | The configuration for the optimizer, including the learning rate, learning scheduler, weight decay, etc. | |
| | float | 0.0 | The amount to clip the gradient by the L2 norm. A value of 0.0 specifies no clipping. | >=0 |
optim#
The optim parameter defines the config for the optimizer in training, including the learning rate, learning scheduler, and weight decay.
optim:
  name: Adam
  lr_monitor: val_loss
  lr_steps: [40, 70]
  gamma: 0.1
  bias_lr_factor: 1
  weight_decay: 0.0005
  weight_decay_bias: 0.0005
  warmup_factor: 0.01
  warmup_iters: 10
  warmup_method: linear
  base_lr: 0.00035
  momentum: 0.9
  center_loss_weight: 0.0005
  center_lr: 0.5
  triplet_loss_margin: 0.3
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| name | string | Adam | The name of the optimizer | Adam/SGD/Adamax/… |
| lr_monitor | string | val_loss | The monitor value for the AutoReduce scheduler | val_loss/train_loss |
| lr_steps | int list | [40, 70] | The steps at which the learning rate is decreased | int list |
| gamma | float | 0.1 | The decay rate for the WarmupMultiStepLR | >0.0 |
| bias_lr_factor | float | 1 | The bias learning rate factor for the WarmupMultiStepLR | >=1 |
| weight_decay | float | 0.0005 | The weight decay coefficient for the optimizer | >0.0 |
| weight_decay_bias | float | 0.0005 | The weight decay bias for the optimizer | >0.0 |
| warmup_factor | float | 0.01 | The warmup factor for the WarmupMultiStepLR scheduler | >0.0 |
| warmup_iters | unsigned int | 10 | The number of warmup iterations for the WarmupMultiStepLR scheduler | >0 |
| warmup_method | string | linear | The warmup method for the optimizer | linear/cosine |
| base_lr | float | 0.00035 | The initial learning rate for the training | >0.0 |
| momentum | float | 0.9 | The momentum for the optimizer | >0.0 |
| center_loss_weight | float | 0.0005 | The balanced weight of center loss | >0.0 |
| center_lr | float | 0.5 | The learning rate of SGD to learn the centers of center loss | >0.0 |
| triplet_loss_margin | float | 0.3 | The margin value for triplet loss | >0.0 |
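To show how base_lr, warmup_factor, warmup_iters, lr_steps, and gamma could interact, the sketch below computes a warmup multi-step learning rate. It follows the formula widely used in open-source ReID baselines and assumes the schedule advances once per epoch; this page does not confirm either point, so treat the numbers as illustrative. Only the linear warmup method is implemented, and the function name is hypothetical.

```python
from bisect import bisect_right


def warmup_multistep_lr(epoch, base_lr=0.00035, lr_steps=(40, 70), gamma=0.1,
                        warmup_factor=0.01, warmup_iters=10, warmup_method="linear"):
    """Learning rate at a given epoch under linear warmup plus step decay (assumed formula)."""
    scale = 1.0
    if epoch < warmup_iters:
        if warmup_method != "linear":
            raise ValueError("this sketch only implements linear warmup")
        # Ramp the multiplier from warmup_factor up to 1.0 over warmup_iters epochs.
        alpha = epoch / warmup_iters
        scale = warmup_factor * (1.0 - alpha) + alpha
    # Multiply by gamma once for every milestone already reached.
    decays = bisect_right(list(lr_steps), epoch)
    return base_lr * scale * gamma ** decays


schedule = {epoch: warmup_multistep_lr(epoch) for epoch in (0, 5, 10, 40, 70)}
```

Under these assumptions the rate starts at 1% of base_lr, reaches base_lr at epoch 10, and drops by a factor of 10 at epochs 40 and 70.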
Training the Model#
Use the following command to run ReIdentificationNet training:
TRAIN_JOB_ID=$(tao re_identification create-job \
    --kind experiment \
    --name "re_identification_train" \
    --action train \
    --workspace-id $WORKSPACE_ID \
    --specs "$TRAIN_SPECS" \
    --train-datasets '["'$DATASET_ID'"]' \
    --eval-dataset "$DATASET_ID" \
    --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
    --encryption-key "nvidia_tlt" | jq -r '.id')
tao model re_identification train [-h] -e <experiment_spec>
    [results_dir=<global_results_dir>]
    [model.<model_option>=<model_option_value>]
    [dataset.<dataset_option>=<dataset_option_value>]
    [train.<train_option>=<train_option_value>]
    [train.gpu_ids=<gpu indices>]
    [train.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required.
-e, --experiment_spec_file: The path to the experiment spec file.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
re_ranking.<rerank_option>: The re-ranking options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options
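The effect of these dotted overrides can be pictured as writing a value into the nested spec. The sketch below is only a model of that behavior, not the launcher's actual parser; `apply_override` is a hypothetical helper, and the value parsing (Python literals, falling back to plain strings) is an assumption.

```python
import ast


def apply_override(config, override):
    """Apply one 'a.b.c=value' override to a nested dict and return the dict."""
    dotted_key, _, raw_value = override.partition("=")
    try:
        # Interpret numbers, lists, and booleans; keep anything else as a string.
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        value = raw_value

    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


spec = {"model": {"backbone": "resnet_50"},
        "train": {"num_epochs": 10, "optim": {"base_lr": 0.00035}}}
apply_override(spec, "train.num_epochs=120")
apply_override(spec, "train.optim.base_lr=0.0001")
apply_override(spec, "train.gpu_ids=[0,1]")
apply_override(spec, "model.backbone=resnet_50")
```

Each override touches only the key it names, so the remaining values from the spec file stay in effect.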
Note
For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed, but are inconsistent, for example num_gpus = 1, gpu_ids = [0, 1], then they are modified to follow the setting that implies more GPUs; in the same example num_gpus is modified from 1 to 2.
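The reconciliation rule can be written out as a small function. The first branch mirrors the example in the note; the second branch (extending gpu_ids to `0..num_gpus-1` when num_gpus is larger) is an assumption, since the note does not say which indices are chosen. The function name is hypothetical.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Return a consistent (num_gpus, gpu_ids) pair, favoring the larger GPU count."""
    gpu_ids = [0] if gpu_ids is None else list(gpu_ids)
    if len(gpu_ids) > num_gpus:
        # gpu_ids implies more GPUs: raise num_gpus to match.
        num_gpus = len(gpu_ids)
    elif num_gpus > len(gpu_ids):
        # Assumption: fill gpu_ids with the first num_gpus device indices.
        gpu_ids = list(range(num_gpus))
    return num_gpus, gpu_ids


result = reconcile_gpus(num_gpus=1, gpu_ids=[0, 1])
```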
In some cases, multi-GPU training may result in a segmentation fault. You can circumvent this by setting the environment variable OMP_NUM_THREADS to 1. Depending on your mode of execution, you may use the following methods to set this variable:
CLI Launcher:
You may set the environment variable by adding the following fields to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 of the section Running the launcher.
{ "Envs": [ { "variable": "OMP_NUM_THREADS", "value": "1" } ] }
Docker:
You may set environment variables in Docker by setting the -e flag in the Docker command line.
docker run -it --rm --gpus all \
    -e OMP_NUM_THREADS=1 \
    -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e
Checkpointing and Resuming Training
At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth.
Checkpoints are saved in train.results_dir, like this:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint is saved as reid_model_latest.pth.
Training automatically resumes from reid_model_latest.pth, if it exists in train.results_dir.
This is superseded by train.resume_training_checkpoint_path, if it is provided.
The major implication of this logic is that, if you wish to trigger fresh training from scratch, either:
Specify a new, empty results directory (Recommended)
Remove the latest checkpoint from the results directory
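The resume precedence described in this section can be summarized in a short function. This is a sketch of the documented behavior rather than the trainer's code; `select_resume_checkpoint` is a hypothetical name, and only the file name reid_model_latest.pth comes from this page.

```python
import os


def select_resume_checkpoint(results_dir, resume_training_checkpoint_path=None,
                             latest_name="reid_model_latest.pth"):
    """Return the checkpoint training would resume from, or None for a fresh run."""
    # An explicitly provided checkpoint path takes precedence.
    if resume_training_checkpoint_path:
        return resume_training_checkpoint_path
    # Otherwise fall back to the latest checkpoint in the results directory.
    latest = os.path.join(results_dir, latest_name)
    if os.path.isfile(latest):
        return latest
    # Nothing to resume from: training starts from scratch.
    return None
```

Calling it with a new, empty results directory returns `None`, which corresponds to the recommended way of forcing fresh training.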
Evaluating the Model#
The evaluation metrics of ReIdentificationNet are mean average precision (mAP) and ranked accuracy.
The plots of sampled matches and the cumulative matching characteristic (CMC) curve can be obtained using the evaluate.output_sampled_matches_plot and evaluate.output_cmc_curve_plot parameters, respectively.
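For readers unfamiliar with these metrics, the sketch below computes mean average precision and a CMC curve from ranked match flags. It is a simplified illustration: it omits the Market-1501 convention of discarding gallery images from the same camera as the query, and the function names and toy data are hypothetical.

```python
def average_precision(matches):
    """AP for one query; matches lists the gallery best-first, True for a correct identity."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def cmc_curve(all_matches, max_rank):
    """Fraction of queries with a correct match within the top k, for k = 1..max_rank."""
    curve = []
    for k in range(1, max_rank + 1):
        found = sum(1 for matches in all_matches if any(matches[:k]))
        curve.append(found / len(all_matches))
    return curve


# Two toy queries, each ranked against a three-image gallery.
ranked = [
    [True, False, True],
    [False, True, False],
]
mean_ap = sum(average_precision(m) for m in ranked) / len(ranked)
cmc = cmc_curve(ranked, max_rank=3)
```

The first value of `cmc` is the rank-1 accuracy, and later values show how quickly correct matches appear as more of the ranked gallery is considered.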
Use the following command to run ReIdentificationNet evaluation:
EVAL_JOB_ID=$(tao re_identification create-job \
    --kind experiment \
    --name "re_identification_evaluate" \
    --action evaluate \
    --workspace-id $WORKSPACE_ID \
    --parent-job-id $TRAIN_JOB_ID \
    --eval-dataset "$DATASET_ID" \
    --specs "$EVALUATE_SPECS" \
    --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
    --encryption-key "nvidia_tlt" | jq -r '.id')
tao model re_identification evaluate [-h] -e <experiment_spec_file>
    evaluate.checkpoint=<model to be evaluated>
    evaluate.output_sampled_matches_plot=<path to the output sampled matches plot>
    evaluate.output_cmc_curve_plot=<path to the output CMC curve plot>
    evaluate.test_dataset=<path to test data>
    evaluate.query_dataset=<path to query data>
    [evaluate.<evaluate_option>=<evaluate_option_value>]
    [evaluate.gpu_ids=<gpu indices>]
    [evaluate.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required.
-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment
evaluate.checkpoint: The .pth model to be evaluated
evaluate.output_sampled_matches_plot: The path to the plotted file of sampled matches
evaluate.output_cmc_curve_plot: The path to the plotted file of the CMC curve
evaluate.test_dataset: The path to the test data
evaluate.query_dataset: The path to the query data
Optional Arguments
evaluate.gpu_ids: The GPU indices to run evaluation. Defaults to [0].
evaluate.num_gpus: The number of GPUs to run evaluation. Defaults to 1.
evaluate.results_dir: The directory to save the evaluation results. Defaults to /results/evaluate.
Multi-GPU evaluation is not supported for Re-Identification.
Running Inference on the Model#
Use the following command to run inference on ReIdentificationNet with the .pth model.
INFERENCE_JOB_ID=$(tao re_identification create-job \
    --kind experiment \
    --name "re_identification_inference" \
    --action inference \
    --workspace-id $WORKSPACE_ID \
    --parent-job-id $TRAIN_JOB_ID \
    --inference-dataset "$DATASET_ID" \
    --specs "$INFERENCE_SPECS" \
    --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
    --encryption-key "nvidia_tlt" | jq -r '.id')
tao model re_identification inference [-h] -e <experiment_spec>
    inference.checkpoint=<inference model>
    inference.output_file=<path to output file>
    inference.test_dataset=<path to gallery data>
    inference.query_dataset=<path to query data>
    [inference.<infer_option>=<infer_option_value>]
    [inference.gpu_ids=<gpu indices>]
    [inference.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required.
-e, --experiment_spec: The experiment spec file to set up inference
inference.checkpoint: The .pth model to perform inference with
inference.output_file: The path to the output JSON file
inference.test_dataset: The path to the test data
inference.query_dataset: The path to the query data
Optional Arguments
inference.gpu_ids: The GPU indices to run inference. Defaults to [0].
inference.num_gpus: The number of GPUs to run inference. Defaults to 1.
inference.results_dir: The directory to save the inference results. Defaults to /results/inference.
The output is a JSON file that contains the feature embeddings of all the test and query data.
Multi-GPU inference is currently not supported for Re-Identification.
The expected output would be as follows:
[
{
"img_path": "/path/to/img1.jpg",
"embedding": [-0.30, 0.12, 0.13,...]
},
{
"img_path": "/path/to/img2.jpg",
"embedding": [-0.10, -0.06, -1.85,...]
},
...
{
"img_path": "/path/to/imgN.jpg",
"embedding": [1.41, 0.63, -0.15,...]
}
]
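A downstream consumer of this file might load it and look up the closest embedding for a given image. The sketch below assumes the JSON structure shown above and uses Euclidean distance; the three-dimensional embeddings, the paths, and the function names are hypothetical, and in practice the text would be read from the file given by inference.output_file.

```python
import json
import math


def load_embeddings(json_text):
    """Map each img_path to its embedding list."""
    return {record["img_path"]: record["embedding"] for record in json.loads(json_text)}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_path, embeddings):
    """Return the path of the image whose embedding is closest to the query's."""
    query = embeddings[query_path]
    candidates = (path for path in embeddings if path != query_path)
    return min(candidates, key=lambda path: euclidean(query, embeddings[path]))


# Stand-in for the contents of the output file (hypothetical values).
sample_output = json.dumps([
    {"img_path": "/path/to/query/img1.jpg", "embedding": [0.9, 0.1, 0.0]},
    {"img_path": "/path/to/test/img2.jpg", "embedding": [1.0, 0.0, 0.0]},
    {"img_path": "/path/to/test/img3.jpg", "embedding": [0.0, 1.0, 0.0]},
])
embeddings = load_embeddings(sample_output)
best_match = nearest("/path/to/query/img1.jpg", embeddings)
```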
Exporting the Model#
Use the following command to export ReIdentificationNet to .onnx format for deployment:
EXPORT_JOB_ID=$(tao re_identification create-job \
    --kind experiment \
    --name "re_identification_export" \
    --action export \
    --workspace-id $WORKSPACE_ID \
    --parent-job-id $TRAIN_JOB_ID \
    --specs "$EXPORT_SPECS" \
    --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
    --encryption-key "nvidia_tlt" | jq -r '.id')
tao model re_identification export -e <experiment_spec>
    export.checkpoint=<.pth checkpoint to be exported>
    export.onnx_file=<path to exported file>
    [export.gpu_id=<gpu index>]
Required Arguments
The following arguments are required.
-e, --experiment_spec: The experiment spec file to set up export.
export.checkpoint: The .pth model to be exported.
export.onnx_file: The path to save the exported model to. The default path is in the same directory as the *.pth model.
Optional Arguments
The following arguments are optional to run the command.
export.gpu_id: The index of the GPU that will be used to run the export. You can specify this value when the machine has multiple GPUs installed. Note that export can only run on a single GPU.
Deploying the Model#
You can deploy the trained deep-learning and computer-vision models on edge devices, such as a Jetson Xavier, Jetson Nano, or Tesla, or in the cloud with NVIDIA GPUs. The exported *.onnx model can also be used with TAO Triton Apps.
Running ReIdentificationNet Inference on the Triton Sample#
The TAO Triton Apps provide an inference sample for ReIdentificationNet. It consumes a TensorRT engine and supports running with a directory of query (probe) images and a directory of test (gallery) images containing the same identities.
To use this sample, you need to generate the TensorRT engine from an *.onnx model using trtexec.
Generating TensorRT Engine Using trtexec#
For instructions on generating a TensorRT engine using the trtexec command, refer to the trtexec guide for ReIdentificationNet.
Running the Triton Inference Sample#
You can generate the TensorRT engine when starting the Triton server using the following command:
bash scripts/start_server.sh
When the server is running, you can get results from a directory of query images and a directory of test images using the following command with a client:
python tao_client.py <path_to_query_directory> \
    --test_dir <path_to_test_directory> \
    -m re_identification_tao model \
    -x 1 \
    -b 16 \
    --mode Re_identification \
    -i https \
    -u localhost:8000 \
    --async \
    --output_path <path_to_output_directory>
Note
The server will perform inference on the input image directories. The results are saved as a JSON file. The following is a sample of the JSON output:
[
...,
{
"img_path": "/localhome/Data/market1501/query/1121_c3s2_156744_00.jpg",
"embedding": [-1.1530249118804932, -1.8521332740783691,..., 0.380886435508728]
},...
{
"img_path": "/localhome/Data/market1501/bounding_box_test/1377_c2s3_038007_05.jpg",
"embedding": [0.09496910870075226, 0.26107653975486755,..., 0.2835155725479126]
},...
]
End-to-End Inference Using Triton#
The TAO Triton Apps provide a sample for end-to-end inference from a directory of query images and a directory of test images. The sample downloads the Market-1501 dataset and randomly samples a subset of 100 identities. The client implicitly converts the image samples into arrays and sends them to the Triton server. The feature embedding for each image is returned and saved to the JSON output. An image of sampled matches and a figure of the CMC curve are also generated for visualization.
You can start the Triton server using the following command (only the ReIdentificationNet model will be downloaded and converted into a TensorRT engine):
bash scripts/re_id_e2e_inference/start_server.sh
Once the Triton server has started, open another terminal and use the following command to run re-identification on the query and test images using the Triton server instance that you have previously spun up:
bash scripts/re_id_e2e_inference/start_client.sh