MTMC & ReID Annotation Guide

Overview

Ground truth (GT) data is crucial for evaluating the accuracy of the reference application and providing a benchmark that highlights areas requiring improvement.

GT generation is particularly useful for two tasks covered in this guide: evaluating the MTMC pipeline and fine-tuning the ReID model.

In this guide, we outline the methods and best practices for generating annotated data tailored for the MTMC workflow and ReID fine-tuning using your real scene videos.

Given that GT generation for real cameras is a time-consuming process, we strongly recommend conducting initial tests in Omniverse/Isaac Sim synthetic environments. This approach allows you to refine methods and settings in a controlled, cost-effective manner before applying them to real-world camera setups. Ground truth for synthetic environments is relatively easy to obtain, which facilitates quicker iterations and optimizations. You can read more about this here.

If you plan to generate annotated data for real cameras, continue with the steps below for detailed guidance on the process.

Annotating Data for Multi-Target Multi-Camera (MTMC)

The MTMC ground truth file needs to be generated in the multi-object tracking (MOT) format, where each line describes one bounding box:

<camera_id> <object_id> <frame_id> <bbox_left> <bbox_top> <bbox_width> <bbox_height> <x> <y>

You may look into GitHub repositories of MOT annotation tools, such as MOT_Annot, to accelerate the labeling process. These tools can help you generate the MTMC ground truth file, which can then be used to evaluate your MTMC pipeline and optimize it via PipeTuner.
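To make the format concrete, here is a minimal sketch that merges per-camera MOT annotations into one MTMC ground-truth file. The input file names, the six-column CSV layout, and the -1 placeholders for the world-coordinate <x> <y> fields are assumptions for illustration, not requirements of the format above.

```python
# Minimal sketch: merge per-camera MOT annotations into an MTMC GT file.
# Assumes each input CSV has six columns per row:
#   frame_id, object_id, bbox_left, bbox_top, bbox_width, bbox_height
import csv

def append_camera_annotations(camera_id, mot_csv_path, out_file):
    """Re-emit one camera's MOT rows as MTMC ground-truth lines."""
    with open(mot_csv_path, newline="") as f:
        for frame_id, object_id, left, top, width, height in csv.reader(f):
            # <camera_id> <object_id> <frame_id> <left> <top> <width> <height> <x> <y>
            out_file.write(
                f"{camera_id} {object_id} {frame_id} "
                f"{left} {top} {width} {height} -1 -1\n"  # world coords unknown
            )

# Hypothetical camera IDs and file names -- replace with your own.
with open("mtmc_ground_truth.txt", "w") as out:
    for cam_id, path in [(1, "cam1_mot.csv"), (2, "cam2_mot.csv")]:
        append_camera_annotations(cam_id, path, out)
```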

Annotating Data for ReID Model Fine-Tuning

ReID model fine-tuning is performed via supervised learning, which requires ground truth in the Market-1501 format, where each image crop is labeled with the unique ID of the object it depicts. The annotation methods below are especially useful for fine-tuning ReID models when model accuracy is a concern, and include suggested steps to generate a ReID GT file for your real video scenes.
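As a point of reference, the sketch below shows the public Market-1501 naming convention, where the object (person) ID is encoded in each crop's file name so that all crops of the same identity share a prefix. The directory split names follow the public Market-1501 dataset; the exact layout your TAO experiment spec expects may differ.

```python
# Minimal sketch of Market-1501-style naming and layout for ReID crops.
from pathlib import Path

def market1501_name(person_id, cam_id, seq_id, frame_id, det_idx):
    """Build a Market-1501-style file name, e.g. person 1 seen by camera 1,
    sequence 1, frame 151, detection 0 -> '0001_c1s1_000151_00.jpg'."""
    return f"{person_id:04d}_c{cam_id}s{seq_id}_{frame_id:06d}_{det_idx:02d}.jpg"

# Market-1501 splits: training crops, gallery crops, and query crops.
root = Path("reid_dataset")
for split in ("bounding_box_train", "bounding_box_test", "query"):
    (root / split).mkdir(parents=True, exist_ok=True)

print(market1501_name(1, 1, 1, 151, 0))  # -> 0001_c1s1_000151_00.jpg
```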

Methods

  1. Open Source Tools: You may look into open-source GitHub repositories like MOT_Annot to generate ReID samples for TAO Re-Identification fine-tuning.

  2. Semi-Automatic Labeling via Behavior-Based Sampling: Implement behavior-based sampling to create ReID crops. This involves several additional steps, including the selection and labeling of behaviors, as detailed below.

    1. Log mdx-raw Messages: Set up a Kafka consumer to log mdx-raw messages from the DeepStream perception pipeline, as mentioned here; a consumer sketch is shown after this list.

    2. Run MTMC Batch Processing: Execute MTMC batch processing using the default app configuration with enableDebug set to true. Note that only the intermediate results, such as frames.json and behaviors.json, are required. Use the following command:

       python3 -m main_batch_processing --config resources/app_config.json --calibration resources/calibration.json

    3. Visualize Frames and Behaviors: Use the MTMC visualization tool to plot frames and/or behaviors. Set vizMode to either frames or behaviors and vizFilteredFrames to false to visualize all objects.

    4. Select Behaviors: Manually select 10 or more behaviors for each global ID from at least four diverse viewing perspectives.

    5. Create and Process the Labeling Text File: Construct a text file that assigns behaviors to global IDs, one entry per line, in the format:

       <behavior_id_a> #-# <frame_id_start> #-# <frame_id_end> <global_id_A>

       Use the Python script generate_reid_data_from_selected_behaviors.py in the synthetic data utils NGC resource to process this labeling text file, combining it with the logged mdx-raw messages and the corresponding video files, to sample image crops in the Market-1501 format; a hypothetical example file is shown after this list.

    6. Fine-Tune ReID Model: Follow the instructions here to fine-tune ReID models with the TAO Toolkit, using the processed samples from the earlier steps.
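For step 1 above, the following is a minimal sketch of a Kafka consumer that logs mdx-raw messages to a file. It assumes the kafka-python package, a broker at localhost:9092, and a topic named mdx-raw; substitute the values from your deployment.

```python
# Minimal sketch: log mdx-raw messages from Kafka to a text file.
# Assumptions: kafka-python package, broker at localhost:9092,
# topic name "mdx-raw" -- adjust all of these to your deployment.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "mdx-raw",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # replay messages already on the topic
)

with open("mdx_raw_log.txt", "w") as log_file:
    for message in consumer:
        # Each message value is a raw bytes payload; persist one per line.
        log_file.write(message.value.decode("utf-8") + "\n")
```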
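For step 5, a labeling text file might look like the following. All behavior IDs, frame ranges, and global IDs here are hypothetical values for illustration; note that the first two lines assign two different behaviors to the same global ID 7.

```
behavior_001 #-# 150 #-# 420 7
behavior_002 #-# 430 #-# 610 7
behavior_017 #-# 90 #-# 305 12
```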