Gst-nvtracker
==============

This plugin allows the DS pipeline to use a low-level tracker library to track the detected objects with persistent (possibly unique) IDs over time. It supports any low-level library that implements ``NvDsTracker`` API, including the the reference implementations provided by the `NvMultiObjectTracker` library: IOU, NvSORT, NvDeepSORT and NvDCF trackers. As part of this API, the plugin queries the low-level library for capabilities and requirements concerning the input format, memory type, and batch processing support. Based on these queries, the plugin then converts the input frame buffers into the format requested by the low-level tracker library. For example, the NvDeepSORT and NvDCF trackers use NV12 or RGBA, while IOU and NvSORT requires no video frame buffers at all.

The capabilities of a low-level tracker library also include support for `batch processing` across multiple input streams. Batch processing is typically more efficient than processing each stream independently, especially when the GPU-based acceleration is performed by the low-level library. If a low-level library supports batch processing, it would be the mode of operation selected by the plugin; however, this preference can be overridden with ``enable-batch-process`` configuration option if the low-level library supports both batch and per-stream modes.

The low-level capabilities also include support for passing `the past-frame data`, which includes the object tracking data generated in the past frames but not reported as output yet. This can be the case when the low-level tracker stores the object tracking data generated in the past frames only internally because of, say, low tracking confidence, but later decided to report due to, say, increased confidence. If the past-frame data is retrieved from the low-level tracker, it would be reported as a `user-meta`, called ``NvDsPastFrameObjBatch``. This can be enabled by the ``enable-past-frame`` configuration option.

The plugin accepts NV12- or RGBA-formatted frame data from the upstream component and scales (and/or converts) the input buffer to a buffer in the tracker plugin based on the format required by the low-level library, with the frame resolution specified by ``tracker-width`` and ``tracker-height`` in the configuration file’s ``[tracker]`` section. The path to the low-level tracker library is to be specified via ``ll-lib-file`` configuration option in the same section. The low-level library to be used may also require its own configuration file, which can be specified via ``ll-config-file`` option. If ``ll-config-file`` is not specified, the low-level tracker library may proceed with its default parameter values. The reference low-level tracker implementations provied by the ``NvMultiObjectTracker`` library support different tracking algorithms:

* **IOU Tracker**: The Intersection-Over-Union (IOU) tracker uses the IOU values among the detector’s bounding boxes between the two consecutive frames to perform the association between them or assign a new target ID if no match found. This tracker includes a logic to handle false positives and false negatives from the object detector; however, this can be considered as the bare-minimum object tracker, which may serve as a baseline only.
* **NvSORT**: The NvSORT tracker is the NVIDIA®-enhanced Simple Online and Realtime Tracking (SORT) algorithm. It uses a cascaded data association based on bounding box (bbox) proximity for associating bboxes over consecutive frames and applies a Kalman filter to update the target states. It is computationally efficient since it does not involve any pixel data processing.
* **NvDeepSORT**: The NvDeepSORT tracker is the NVIDIA®-enhanced Online and Realtime Tracking with a Deep Association Metric (DeepSORT) algorithm, which uses the deep cosine metric learning with a Re-ID neural network for data association of multiple objects over frames. This implementation allows users to use any Re-ID network as long as it is supported by NVIDIA's TensorRT™ framework.
* **NvDCF**: The NvDCF tracker is an online multi-object tracker that employes a discriminative correlation filter for visual object tracking, which allows indepedent object tracking even when detection results are not available. It uses the combination of the correlation filter responses and bounding box proximity for data association.

|

More details on each algorithm and its implementation details can be found in `NvMultiObjectTracker : A Reference Low-Level Tracker Library`_ section.

.. image:: /content/DS_plugin_gst-nvtracker_6.0GA.png
         :align: center
         :alt: Gst-nvtracker

Inputs and Outputs
--------------------
This section summarizes the inputs, outputs, and communication facilities of the Gst-nvtracker plugin.

* Input

  * Gst Buffer (as a frame batch from available source streams)
  * ``NvDsBatchMeta``

|

More details about `NvDsBatchMeta <https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_metadata.html?highlight=nvdsbatchmeta/>`_ can be found in the link. The color formats supported for the input video frame by the NvTracker plugin are NV12 and RGBA.

* Output

  •	Gst Buffer (provided as an input)
  •	``NvDsBatchMeta`` (with addition of tracked object coordinates, tracker confidence and object IDs in ``NvDsObjectMeta``)

.. note::
   If the tracker algorithm does not generate confidence value, then tracker confidence value will be set to the default value (i.e., ``1.0``) for tracked objects. For IOU, NvSORT and NvDeepSORT trackers, ``tracker_confidence`` is set to ``1.0`` as these algorithms do not generate confidence values for tracked objects. NvDCF tracker, on the other hand, generates confidence for the tracked objects due to its visual tracking capability, and its value is set in ``tracker_confidence`` field in ``NvDsObjectMeta`` structure.

   Note that there are separate parameters in ``NvDsObjectMeta`` for detector's confidence and tracker's confidence, which are ``confidence`` and ``tracker_confidence``, respectively. More details can be found in `New metadata fields <https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_metadata.html?highlight=tracker_confidence#new-metadata-fields>`_

The following table summarizes the features of the plugin.

.. csv-table:: Gst-nvtracker plugin features
     :file: ../text/tables/Gst-nvtracker tables/DS_Plugin_gst-nvtracker_features.csv
     :widths: 20, 30, 10
     :header-rows: 1

.. _NvTracker-Gst-Properties:

Gst Properties
---------------
The following table describes the Gst properties of the Gst-nvtracker plugin.

.. csv-table:: Gst-nvtracker plugin Gst Properties
     :file: ../text/tables/Gst-nvtracker tables/DS_Plugin_gst-nvtracker_gst-properties.csv
     :widths: 15, 25, 10, 15
     :header-rows: 1


NvDsTracker API for Low-Level Tracker Library
----------------------------------------------------
A low-level tracker library can be implemented using the API defined in ``sources/includes/nvdstracker.h``. Parts of the API refer to ``sources/includes/nvbufsurface.h``. The names of API functions and data structures are prefixed with ``NvMOT``, which stands for NVIDIA Multi-Object Tracker. Below is the general flow of the API from a low-level library's perspective:

1. The first required function is:

   .. code-block:: c

     NvMOTStatus NvMOT_Query (
          uint16_t customConfigFilePathSize,
          char* pCustomConfigFilePath,
          NvMOTQuery *pQuery
     );

   The plugin uses this function to query the low-level library’s capabilities and requirements before it starts any processing sessions (i.e., contexts) with the library. Queried properties include the input frame's color format (e.g., RGBA or NV12), memory type (e.g., |NVIDIA®| |CUDA®| device or CPU-mapped NVMM), and support for batch processing.

    .. |NVIDIA®| replace:: NVIDIA\ :sup:`®`

    .. |CUDA®| replace:: CUDA\ :sup:`®`

   The plugin performs this query once during initialization stage, and its results are applied to all contexts established with the low-level library. If a low-level library configuration file is specified, it is provided in the query for the library to consult.
   The query reply structure, ``NvMOTQuery``, contains the following fields:

    * ``NvMOTCompute computeConfig``: Report compute targets supported by the library. The plugin currently only echoes the reported value when initiating a context.
    * ``uint8_t numTransforms``: The number of color formats required by the low-level library. The valid range for this field is ``0`` to ``NVMOT_MAX_TRANSFORMS``. Set this to ``0`` if the library does not require any visual data.

       .. note::
           ``0`` does not mean that untransformed data will be passed to the library.

    * ``NvBufSurfaceColorFormat colorFormats[NVMOT_MAX_TRANSFORMS]``: The list of color formats required by the low-level library. Only the first ``numTransforms`` entries are valid.
    * ``NvBufSurfaceMemType memType``: Memory type for the transform buffers. The plugin allocates buffers of this type to store color- and scale-converted frames, and the buffers are passed to the low-level library for each frame.
      The support is currently limited to the following types:

      dGPU: ::

           NVBUF_MEM_CUDA_PINNED
           NVBUF_MEM_CUDA_UNIFIED

      Jetson: ::

           NVBUF_MEM_SURFACE_ARRAY

    * ``bool supportBatchProcessing``: True if the low-level library supports the batch processing across multiple streams; otherwise false.
    * ``bool supportPastFrame``: True if the low-level library supports outputting the past-frame data; otherwise false.

2. After the query, and before any frames arrive, the plugin must initialize a context with the low-level library by calling:

   .. code-block:: c

     NvMOTStatus NvMOT_Init (
          NvMOTConfig *pConfigIn,
          NvMOTContextHandle *pContextHandle,
          NvMOTConfigResponse *pConfigResponse
     );

  The context handle is opaque outside the low-level library. In the batch processing mode, the plugin requests a single context for all input streams. In per-stream processing mode, on the other hand, the plugin makes this call for each input stream so that each stream has its own context.
  This call includes a configuration request for the context. The low-level library has an opportunity to:

   * Review the configuration and create a context only if the request is accepted. If any part of the configuration request is rejected, no context is created, and the return status must be set to ``NvMOTStatus_Error``. The ``pConfigResponse`` field can optionally contain status for specific configuration items.
   * Pre-allocate resources based on the configuration.

  .. note::
     * In the ``NvMOTMiscConfig`` structure, the ``logMsg`` field is currently unsupported and uninitialized.
     * The ``customConfigFilePath`` pointer is only valid during the call.

3. Once a context is initialized, the plugin sends frame data along with detected object bounding boxes to the low-level library whenever it receives such data from upstream. It always presents the data as a batch of frames, although the batch can contain only a single frame in per-stream processing contexts. Note that depending on the frame arrival timings to the tracker plugin, the composition of frame batches could either be a `full batch` (that contains a frame from every stream) or a `partial batch` (that contains a frame from only a subset of the streams). In either case, each batch is guaranteed to contain `at most one frame` from each stream.

  The function call for this processing is:

   .. code-block:: c

     NvMOTStatus NvMOT_Process (
          NvMOTContextHandle contextHandle,
          NvMOTProcessParams *pParams,
          NvMOTTrackedObjBatch *pTrackedObjectsBatch
     );

  , where:

  * ``pParams`` is a pointer to the input batch of frames to process. The structure contains a list of one or more frames, with at most one frame from each stream. Thus, no two frame entries have the same ``streamID``. Each entry of frame data contains a list of one or more buffers in the color formats required by the low-level library, as well as a list of object attribute data for the frame. Most libraries require at most one-color format.
  * ``pTrackedObjectsBatch`` is a pointer to the output batch of object attribute data. It is pre-populated with a value for ``numFilled``, which is the same as the number of frames included in the input parameters.
  * If a frame has no output object attribute data, it is still counted in ``numFilled`` and is represented with an empty list entry (``NvMOTTrackedObjList``). An empty list entry has the correct ``streamID`` set and numFilled set to ``0``.

      .. note::
         The output object attribute data ``NvMOTTrackedObj`` contains a pointer to the detector object (provied in the input) that is associated with a tracked object, which is stored in ``associatedObjectIn``. You must set this to the associated input object only for the frame where the input object is passed in. For a pipeline with PGIE ``interval=1``, for example:

          * Frame 0: ``NvMOTObjToTrack`` ``X`` is passed in. The tracker assigns it ID 1, and the output object's ``associatedObjectIn`` points to ``X``.
          * Frame 1: Inference is skipped, so there is no input object from detector to be associated with. The tracker finds Object 1, and the output object's ``associatedObjectIn`` points to ``NULL``.
          * Frame 2: ``NvMOTObjToTrack`` ``Y`` is passed in. The tracker identifies it as Object 1. The output Object 1 has ``associatedObjectIn`` pointing to ``Y``.

4. Depending on the capability of the low-level tracker, there could be some tracked object data generated in the past frames but stored only internally without being reported due to, say, a low confidence in the past frames, while it is still being tracked in the background. If it becomes more confident in the later frames and ready to report them, then those past-frame data can be retrieved from the tracker plug-in using the following function call. The past-frame data can be retrieved from the low-level library and outputted to ``batch_user_meta_list`` in ``NvDsBatchMeta`` as a user-meta:

   .. code-block:: c

     NvMOTStatus NvMOT_ProcessPast (
          NvMOTContextHandle contextHandle,
          NvMOTProcessParams *pParams,
          NvDsPastFrameObjBatch *pPastFrameObjBatch
     );

  where:

  *	``pParams`` is a pointer to the input batch of frames to process. This structure is needed to check the list of stream ID in the batch.
  *	``pPastFrameObjBatch`` is a pointer to the output batch of object attribute data generated in the past frames. The data structure ``NvDsPastFrameObjBatch`` is defined in ``include/nvds_tracker_meta.h``. It may include a set of tracking data for each stream in the input. For each object, there could be multiple past-frame data if the tracking data is stored for multiple frames for the object.

5. In case that a video stream source is removed on the fly, the plugin calls the following function so that the low-level tracker library can remove it as well. Note that this API is optional and valid only when the batch processing mode is enabled, meaning that it will be executed only when the low-level tracker library has an actual implementation for the API. If called, the low-level tracker library can release any per-stream resource that it may be allocated:

   .. code-block:: c

     void NvMOT_RemoveStreams (
          NvMOTContextHandle contextHandle,
          NvMOTStreamId streamIdMask
     );

6. When all processing is complete, the plugin calls this function to clean up the context and deallocate its resources:

   .. code-block:: c

     void NvMOT_DeInit (NvMOTContextHandle contextHandle);


*NvMultiObjectTracker* : A Reference Low-Level Tracker Library
-----------------------------------------------------------------
Multi-object tracking (MOT) is a key building block for a large number of intelligent video analytics (IVA) applications where analyzing the temporal changes of objects’ states is required. Given a set of detected objects from the Primary GIE (PGIE) module on a single or multiple streams and with the APIs defined to work with the tracker plugin, the low-level tracker library is expected to carry out actual multi-object tracking operations to keep persistent IDs to the same objects over time.

DeepStream SDK (from v6.0) provides a single reference low-level tracker library, called `NvMultiObjectTracker`, that implements all three low-level tracking algorithms (i.e., IOU, NvSORT, NvDeepSORT, and NvDCF) in a unified architecture. It supports multi-stream, multi-object tracking in the batch processing mode for efficient processing on both CPU and GPU. The following sections will cover the unified tracker architecture and the details of each reference tracker implementation.

Unified Tracker Architecture for Composable Multi-Object Tracker
----------------------------------------------------------------------

Different multi-object trackers share common modules when it comes to basic functionalities (e.g., data association, target management, and state estimation), while differing in other core functionalities (e.g., visual tracking for NvDCF and deep association metric for NvDeepSORT). The `NvMultiObjectTracker` low-level tracker library employs a unified architecture to allow the `composition` of an object tracker through configuration by enabling only the modules required for a particular object tracker. The IOU tracker, for example, requires a minimum set of modules that consist of data association and target management modules. On top of that, NvSORT adds a state estimator for more accurate motion prediction, and NvDeepSORT further introduces a deep Re-ID network to integrate appearance information into data association. Instead of the deep neural network-based Re-ID features in NvDeepSORT, NvDCF employs a DCF-based visual tracking module that uses conventional feature descriptors for more efficient tracking. However, NvDCF can still allow the use of Re-ID module for target re-association for longer-term robustness.

The table below summarizes what modules are used to compose each object tracker, showing what modules are shared across different object trackers and how each object tracker differs in composition:

+--------------+------------+-------------+---------+---------------------+----------------------------------+
|              | State      | Target      | Visual  | Target              |     Data Association Metric      |
| Tracker Type |            |             |         | Re-Association      |                                  |
|              | Estimator  | Management  | Tracker +-----------+---------+------------+-----------+---------+
|              |            |             |         | Spatio-   |  Re-ID  |  Proximity |Visual     |  Re-ID  |
|              |            |             |         | temporal  |         |  & Size    |Similarity |         |
+==============+============+=============+=========+===========+=========+============+===========+=========+
| IOU          |            |     O       |         |           |         |     O      |           |         |
+--------------+------------+-------------+---------+-----------+---------+------------+-----------+---------+
| NvSORT       |    O       |     O       |         |           |         |     O      |           |         |
+--------------+------------+-------------+---------+-----------+---------+------------+-----------+---------+
| NvDeepSORT   |    O       |     O       |         |           |         |     O      |           |    O    |
+--------------+------------+-------------+---------+-----------+---------+------------+-----------+---------+
| NvDCF        |    O       |     O       |    O    |     O     |    O    |     O      |    O      |         |
+--------------+------------+-------------+---------+-----------+---------+------------+-----------+---------+


By enabling the required modules in a config file, each object tracker can be composed due to the unified architecture. In the following sections, we will first see the general work flow of the NvMultiObjectTracker library and its core modules, and then each type of object trackers in more details with explanations on the config params in each module.

Workflow and Core Modules in The *NvMultiObjectTracker* Library
-------------------------------------------------------------------------

The input to a low-level tracker library consists of (1) a batch of video frames from a single or multiple streams and (2) a list of detector objects for each video frame. If the detection interval (i.e., ``interval`` in Primary GIE section) is set larger than 0, the input data to the low-level tracker would have the detector object data only when the inferencing for object detection is performed for a video frame batch (i.e., the *inferenced* frame batch). For the frame batches where the inference is skipped (i.e., the *uninferenced* frame batch), the input data would include only the video frames.

  .. note::
       * A *detector object* refers to an object that is detected by the detector in PGIE module, which is provided to the multi-object tracker module as an input.
       * A *target* refers to an object that is being tracked by the object tracker.
       * An *inferenced* frame is a video frame where an inference is carried out for object detection. Since the inference interval can be configured in setting for PGIE and can be larger than zero, the ``frameNum`` of two consecutive inferenced frames may not be contiguous.

For carrying out multi-object tracking operations with the given input data, below are the essential functionalities to be performed. Multithreading is deployed to optimize their performance on CPU.

  * `Data association` between the detector objects from a new video frame and the existing targets for the same video stream
  * `Target management` based on the data association results, including the target state update and the creation and termination of targets

Depending on the tracker types, there could be some addition processing before data association. For example, NvDeepSORT extracts Re-ID features from all the detector objects and computes the similarity, while NvDCF performs the visual tracker based localization so the targets' predicted locations in a new frame can be used for data association. More details will be covered in each tracker's section.

Data Association
^^^^^^^^^^^^^^^^^^^^^^^^^
For data association, various types of similarity metrics are used to calculate the matching score between the detector objects and the existing targets, including:

* Location similarity (i.e., proximity)
* Bounding box size similarity
* Re-ID feature similarity  (specific to NvDeepSORT tracker)
* Visual appearance similarity (specific to NvDCF tracker)

|

For the proximity between detector objects and targets, IOU is a typical metric that is widely used, but it also depends on the size similarity between them. The similarity of the box size between two objects can be used explicitly, which is calculated as the ratio of the size of the smaller box over the larger one.

The total association score for a pair of detector object and target is the weighted sum of all the metrics:

 .. math::
       totalScore=w_1*IOU+w_2*sizeSimilarity+w_3*reidSimilarity+w_4*visualSimilarity

where :math:`w_i` is the weight for each metric set in config file. Users can also set a minimum threshold for each similarity and the total score.

During the matching, a detector object is associated with a target that belongs to the same class by default to minimize the false matching. However, this can be disabled by setting ``checkClassMatch: 0``, allowing objects can be associated regardless of their object class IDs. This can be useful when employing a detector like YOLO, which can detect many classes of objects, where there could be false classification on the same object over time.

Regarding the matching algorithm, users can set ``associationMatcherType`` as ``0`` to employ an efficient greedy algorithm for optimal bipartite matching with similarity metrics defined above, or ``1`` for a newly introduced method named `cascaded data association` for higher accuracy.
The cascaded data association consists of multi-stage matching, assigning different priorities and similarity metrics based on detection and target confidence. Detector objects are split into two sets, confirmed (confidence between [``tentativeDetectorConfidence``, 1.0]) and tentative (confidence between [``minDetectorConfidence``, ``tentativeDetectorConfidence``]). Then three stage matching are performed sequentially:

* Confirmed detections and validated (both active and inactive) targets
* Tentative detections and active targets left
* Confirmed detections left and tentative targets

|

The first stage uses the joint-similarity metrics defined above, while the later two stages only considers the IOU similarity, because proximity can be a more reliable metric than visual similarity or Re-ID when the detection confidence is low due to, say, partial occlusions or noise. Each stage takes different sets of bboxes as candidates and uses the efficient greedy algorithm for matching. The matched pairs are produced from each stage and combined together.

The output of the data association module consists of three sets of objects/targets:

* The unmatched detector objects
* The matched pairs of the detector objects and the existing targets
* The unmatched targets

|

The unmatched detector objects are among the objects detected by a PGIE detector, yet not associated with any of the existing targets. An unmatched detector object is considered as a newly observed object that needs to be tracked, unless they are determined to be duplicates to any of the existing target. If the maximum IOU score of a new detector object to any of the existing targets is lower than ``minIouDiff4NewTarget``, a new target tracker would be created to track the object since it is not a duplicate to an existing target.

Target Management and Error Handling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Although a new object is detected by the detector (i.e., a detector object), there is a possibility that this may be a false positive. To suppress such noise in detection, the `NvMultiObjectTracker` tracker library employs a technique called **Late Activation**, where a newly detected object is examined for a period of time and activated for long-term tracking `only if` it survives such a period. To be more specific, whenever a new object is detected, a new tracker is created to track the object, but the target is initially put into the `Tentative` mode, which is a probationary period, whose length is defined by ``probationAge`` under ``TargetManagement`` section of the config file. During this probationary period, the tracker output will not be reported to the downstream, since the target is not validated yet; however, those unreported tracker output data (i.e., `the past-frame data`) are stored within the low-level tracker for later report.

  .. note::
       To allow the low-level tracker library to store and report the past-frame data, user would need to set ``enable-past-frame=1`` and ``enable-batch-process=1`` under ``[tracker]`` section in the deepstream-app config file. Note that the past-frame data is only supported in the batch processing mode.

The same target may be detected for the next frame; however, there could be `false negative` by the detector (i.e., missed detection), resulting in a unsuccessful data association to the target. The NvMultiObjectTracker library employs another technique called **Shadow Tracking**, where a target is still being tracked in the background for a period of time even when the target is *not* associated with a detector object. Whenever a target is not associated with a detector object for a given time frame, an internal variable of the target called `shadowTrackingAge` is incremented. Once the target is associated with a detector object, then `shadowTrackingAge` will be reset to zero.

If the target is in the Tentative mode and the `shadowTrackingAge` reaches ``earlyTerminationAge`` specified in the config file, the target will be terminated prematurely (which is referred to as **Early Termination**). If the target is not terminated during the Tentative mode and successfully assocciated with a detector object, the target is *activated* and put into the `Active` mode, starting to report the tracker outputs to the downstream. If the past-frame data is enabled, the tracked data during the Tentative mode will be reported as well, since they were not reported yet. Once a target is activated (i.e., in Active mode), if the target is not associated for a given time frame (or the tracker confidence gets lower than a threshold), it will be put into the `Inactive` mode, and its `shadowTrackingAge` will be incremented, yet still be tracked in the background. However, the target will be terminated if the `shadowTrackingAge` exceeds ``maxShadowTrackingAge``.

The state transitions of a target tracker are summarized in the following diagram:

.. image:: /content/DS_NvMultiObjectTracker_state_transition.png
         :align: center
         :alt: Gst-nvtracker

The NvMultiObjectTracker library can generate a unique ID to some extent. If enabled by setting ``useUniqueID: 1``, each video stream will be assigned a 32-bit long random number during the initialization stage. All the targets created from the same video stream will have the same upper 32-bit of the ``uint64_t``-type target ID set by the per-stream random number. In the meantime, the lower 32-bit of the target ID starts from 0. The randomly generated upper 32-bit number allows the target IDs from a particular video stream to increment from a random position in the possible ID space. If disabled (i.e., ``useUniqueID: 0``, which is the default value), both the upper and lower 32-bit will start from 0, resulting in the target ID to be incremented from 0 for every run.

Note that the incrementation of the lower 32-bit of the target ID is done across the whole video streams in the same NvMultiObjectTracker library instantiation. Thus, even if the unique ID generation is disabled, the tracker IDs will be unique for the same pipeline run. If the unique ID generation is disabled, and if there are three objects for Stream 1 and two objects for Stream 2, for example, the target IDs will be assigned from 0 to 4 (instead of 0 to 2 for Stream 1 and 0 to 1 for Stream 2) as long as the two streams are being processed by the same library instantiation.

``preserveStreamUpdateOrder`` controls whether to use single or multiple threads to update targets. If it is enabled, new IDs are generated sequentially following input stream ID order in each batch using a single thread, i.e. the objects for Stream 1 and 2 will have IDs from 0 to 2 and 3 to 4 respectively. By default, this option is disabled so target management is done with multi-threads to enable better performance but the ID order is not preserved.

The NvMultiObjectTracker library `pre-allocates` all the GPU memories during initialization based on:

* The number of streams to be processed
* The maximum number of objects to be tracked per stream (denoted as ``maxTargetsPerStream``)

|

Thus, the CPU/GPU memory usage by the NvMultiObjectTracker library is almost linearly proportional to the total number of objects being tracked, which is `(number of video streams) × (maxTargetsPerStream)`, except the scratch memory space used by dependent libraries (such as cuFFT™, TensorRT™, etc.). Thanks to the pre-allocation of all the necessary memory, the NvMultiObjectTracker library is not expected to have memory growth during long-term run even when the number of objects increases over time.

Once the number of objects being tracked reaches the configured maximum value (i.e., ``maxTargetsPerStream``), any new objects will be discarded until some of the existing targets are terminated. Note that the number of objects being tracked includes the targets that are being tracked in the shadow tracking mode. Therefore, NVIDIA recommends that users set ``maxTargetsPerStream`` large enough to accommodate the maximum number of objects of interest that may appear in a frame, as well as the objects that may have been tracked from the past frames in the shadow tracking mode.

The ``minDetectorConfidence`` property under ``BaseConfig`` section in a low-level tracker config file sets the confidence level below which the detector objects are filtered out.

State Estimation
^^^^^^^^^^^^^^^^^^^^^^^^^

The NvMultiObjectTracker library employs two types of state estimators, both of which are based on Kalman Filter (KF): `Simple KF` and `Regular KF`. The Simple KF has ``6`` states defined, which are ``{x, y, w, h, dx, dy}``, where ``x`` and ``y`` indicate the coordinates of the top-left corner of a target bbox, while ``w`` and ``h`` the width and the height of the bbox, respectively. ``dx`` and ``dy`` denote the velocity of ``x`` and ``y`` states. The Regular KF, on the other hand, have ``8`` states defined, which are ``{x, y, w, h, dx, dy, dw, dh}``, where ``dw`` and ``dh`` are the velocity of ``w`` and ``h`` states and the rest is the same as the Simple KF. Both types of Kalman Filters employ a constant velocity model for generic use. The measurement vector is defined as ``{x, y, w, h}``. Furthermore, there is an option to use bbox aspect ratio ``a`` and its velocity ``da`` instead of ``w`` and ``dw`` when ``useAspectRatio`` is enabled, which is specially used by NvDeepSORT. In case the state estimator is used for a generic use case (like in the NvDCF tracker), the process noise variance for ``{x, y}``, ``{w, h}``, and ``{dx, dy, dw, dh}`` can be configured by ``processNoiseVar4Loc``, ``processNoiseVar4Size``, and ``processNoiseVar4Vel``, respectively.

When a visual tracker module is enabled (like in the NvDCF tracker), there could be two different measurements from the state estimator’s point of view: (1) the bbox from the detector at PGIE and (2) the bbox from the tracker's localization. This is because the NvDCF tracker module is capable of localizing targets using its own learned filter. The measurement noise variance for these two different types of measurements can be configured by ``measurementNoiseVar4Detector`` and ``measurementNoiseVar4Tracker``. These parameters are expected to be tuned or optimized based on the detector's and the tracker's characteristics for better measurement fusion.

The usage of the state estimator in the NvDeepSORT tracker slightly differs from that for the aforementioned generic use case in that it is basically a *Regular KF*, yet with a couple of differences as per the original paper and the implementation (Check the references in `NvDeepSORT Tracker`_ section):

*  Use of the aspect ratio ``a`` and the height ``h`` (instead of ``w`` and ``h``) to estimate the bbox size
*  The process and measurement noises that are proportional to the bounding box height (instead of constant values)

To allow these differences, the state estimator module in the NvMultiObjectTracker library has a set of additional config parameters:

* ``useAspectRatio`` to enable the use of ``a`` (instead of ``w``)
* ``noiseWeightVar4Loc`` and ``noiseWeightVar4Vel`` as the proportion coefficients for the measurement and velocity noise, respectively

Note that if these two parameters are set, the fixed process noise and measurement noise parameters for the generic use cases will be ignored.

Object Re-Identification
^^^^^^^^^^^^^^^^^^^^^^^^^

Re-identification (Re-ID) uses TensorRT™-accelerated deep neural networks to extract unique feature vectors from detected objects that are robust to spatial-temporal variance and occlusion. It has two usecases in *NvMultiObjectTracker*: (1) In NvDeepSORT, the Re-ID similarity is used for data association of objects over consecutive frames.; (2) In target re-association (which will be described in more detail in the following section), the Re-ID features of targets are extracted and kept, so that they can be used for re-association with the same target if they are seemingly lost. ``reidType`` selects the mode for each aforementioned usecase.

In the Re-ID module, the detector objects are cropped and resized into the configured input size of the Re-ID network. The parameter ``keepAspc`` controls whether the object's aspect ratio is preserved after cropping. Then NVIDIA TensorRT™ creates an engine from the network, which processes the input in batches and outputs a fixed-dimensional vector with L2 norm being equal to 1 for each detector object as the Re-ID feature. For each target, a gallery of its Re-ID features in most recent frames are kept internally. The size of the feature gallery can be set by ``reidHistorySize``.

 .. note::
       Re-ID is by default enabled in ``config_tracker_NvDeepSORT.yml`` and ``config_tracker_NvDCF_accuracy.yml``. Users need to follow instructions in `Setup Official Re-ID Model`_ to setup a sample Re-ID model, or check `Customize Re-ID Model`_ for more information on adding a custom Re-ID model for object tracking with different architectures and datasets.

The Re-ID similarity between a detector object and a target is the cosine similarity between the detector object's Re-ID feature and its nearest neighbor in the target's featue gallery, whose value is in range ``[0.0, 1.0]``. Specifically, each Re-ID feature in the target's gallery takes the dot product with the detector object's Re-ID feature. The maximum of all the dot products is the similarity score, i.e.

 .. math::
       score_{ij}=\max_{k}(feature\_det_{i}\cdot feature\_track_{jk})

where:

* :math:`\cdot` denotes the dot product.
* :math:`feature\_det_{i}` denotes the i-th detector object's feature.
* :math:`feature\_track_{jk}` denotes the k-th Re-ID feature in the j-th target's feature gallery. :math:`k` =[1, ``reidHistorySize``].

Target Re-Association
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In DeepStream SDK 6.2, the *target re-association* algorithm is enhanced with joint Re-ID and motion features for higher accuracy. This is to address a common problem that occurs in the situation where objects undergo partial occlusions to full occlusions in a gradual manner. During this course of action, the detector at PGIE module may capture only some part of the objects (due to partial visibility), resulting in ill-sized, ill-centered boxes on the target. Later, the target cannot be associated with the object appearing again due to the size and location prediction errors, potentially causing tracking failures and ID switches. Such a re-association problem can typically be handled as a post-processing; however, for real-time analytics applications, this is often expected to be handled seamlessly as a part of the real-time multi-object tracking.

The target re-association takes advantage of the *Late Activation* and *Shadow Tracking* in target management module. It tries to associate the newly-appeared targets with previously lost targets based on motion and Re-ID similarity in a seamless, real-time manner by the following steps:

**Tracklet Prediction**: Whenever an existing target is not associated with a detector object for a prolonged period (same as ``probationAge``), it is considered that the target is lost. While the visual tracker module keeps track of the target in the shadow tracking mode, a length of the predicted tracklet (configured by ``trajectoryProjectionLength``) is generated using some of the recently matched tracklet points (whose length is set by ``prepLength4TrajectoryProjection``) and stored into an internal database until it is matched again with a detector object or re-associated with another target.

**Re-ID Feature Extraction**: Before a target is lost, the Re-ID network extracts its Re-ID feature with the frame interval of ``reidExtractionInterval`` and stores them in the feature gallery. These features will be used to identify target re-appearance in the tracklet matching stage.

**Target ID Acquisition**: When a new target is instantiated, its validity is examined for a few frames (i.e., ``probationAge``) and a target ID is assigned only if validated (i.e., Late Activation), after which the target state report starts. During the target ID acquisition, the new target is examined if it matches with one of the predicted `tracklets` from the existing targets in the internal database where the aforementioned predicted `tracklets` are stored. If matched, it would mean that the new target is actually the re-appearance of a disappeared target in the past. Then, the new target is re-associated with the existing target and its `tracklet` is fused into that as well. Otherwise, a new target ID is assigned.

**Tracklet Matching**: During the `tracklet` matching process in the previous step, the valid candidate `tracklets` are queried from the database based on the feasible time window configured by ``maxTrackletMatchingTimeSearchRange``. For the new target and each candidate, both the motion and Re-ID similarity are taken into account for tracklet matching. The motion similarity is the average IOU along the `tracklet` with various criteria including the minimum average IOU score (i.e., ``minTrackletMatchingScore``), maximum angular difference in motion (i.e., ``maxAngle4TrackletMatching``), minimum speed similarity (i.e., ``minSpeedSimilarity4TrackletMatching``), and minimum bbox size similarity (i.e., ``minBboxSizeSimilarity4TrackletMatching``) computed by a Dynamic Time Warping (DTW)-like algorithm. The Re-ID similarity is the cosine distance between the new target's Re-ID feature and its nearest neighbor in the candidate's feature gallery. The total similarity score is the weighted sum of both metrics:

 .. math::
       totalScore=w_1*IOU+w_2*reidSimilarity

where :math:`w_i` is the weight for each metric set in config file. Users can also set a minimum threshold for each similarity and the total score.

**Tracklet Fusion**: Once two `tracklets` are associated, they are fused together to generate one smooth `tracklet` based on the matching status with detector and the confidence at each point.

``config_tracker_NvDCF_accuracy.yml`` provides an example to enable this feature. Since Re-ID is computationally expensive, users can increase ``reidExtractionInterval`` to improve performance or set the parameters below to use motion-only target re-association without Re-ID.

   .. code-block:: yaml

     TrajectoryManagement:
       useUniqueID: 0      # Use 64-bit long Unique ID when assignining tracker ID. Default is [true]
       enableReAssoc: 1    # Enable Re-Assoc
       minMatchingScore4Overall: 0    # min matching score for overall
       minTrackletMatchingScore: 0.5644    # min tracklet similarity score for re-assoc
       matchingScoreWeight4TrackletSimilarity: 1.0    # weight for tracklet similarity score
       minTrajectoryLength4Projection: 36    # min trajectory length required to make projected trajectory
       prepLength4TrajectoryProjection: 50    # the length of the trajectory during which the state estimator is updated to make projections
       trajectoryProjectionLength: 94    # the length of the projected trajectory
       maxAngle4TrackletMatching: 106    # max angle difference for tracklet matching [degree]
       minSpeedSimilarity4TrackletMatching: 0.0967    # min speed similarity for tracklet matching
       minBboxSizeSimilarity4TrackletMatching: 0.5577    # min bbox size similarity for tracklet matching
       maxTrackletMatchingTimeSearchRange: 20    # the search space in time for max tracklet similarity
       trajectoryProjectionProcessNoiseScale: 0.0100    # trajectory projector's process noise scale w.r.t. state estimator
       trajectoryProjectionMeasurementNoiseScale: 100    # trajectory projector's measurement noise scale w.r.t. state estimator
       trackletSpacialSearchRegionScale: 0.2598    # the search region scale for peer tracklet

     ReID:
       reidType: 0    # The type of reid among { DUMMY=0, NvDEEPSORT=1, Reid based reassoc=2, both NvDEEPSORT and reid based reassoc=3}

 .. note::
       Target re-association can be effective only when the state estimator is enabled, otherwise the `tracklet` prediction will not be made properly. The parameters provided above is tuned for PeopleNet v2.6, and it may not work as expected for other types of detectors.

Bounding-box Unclipping
^^^^^^^^^^^^^^^^^^^^^^^^^
Another small experimental feature is the bounding box unclipping. If a target is fully visible within the field-of-view (FOV) of the camera but starts going out of the FOV, the target would be partially visible and the bounding box (i.e., `bbox`) may capture only a part of the target (i.e., clipped by the FOV) until it fully exits the scene. If it is expected that the size of the `bbox` doesn't change much around the border of the video frame, the full `bbox` can be estimated beyond the FOV limit using the `bbox` size estimated when the target was fully visible. This feature can be enabled by setting ``enableBboxUnClipping: 1`` under ``TargetManagement`` module in the low-level config file.


Configuration Parameters
^^^^^^^^^^^^^^^^^^^^^^^^^

The following table summarizes the configuration parameters for the common modules in the NvMultiObjectTracker low-level tracker library.


.. csv-table:: Configuration properties in Common Modules in NvMultiObjectTracker low-level tracker library
     :file: ../text/tables/Gst-nvtracker tables/DS_Plugin_gst-nvtracker_NvMultiObjectTracker_CommonModules_config_properties.csv
     :widths: 1, 1, 100, 1, 1
     :header-rows: 1

More details on how to tune these parameters with some samples can be found in :doc:`\DS_plugin_NvMultiObjectTracker_parameter_tuning_guide`.

IOU Tracker
--------------------------

The NvMultiObjectTracker library provides an object tracker that has only the essential and minimum set of functionalities for multi-object tracking, which is called the *IOU* tracker. IOU tracker performs only the following functionalities:

* **Greedy data association** between the detector objects from a new video frame and the existing targets in the previous video frame
* **Target management** based on the data association results including the target state update and the creation and termination of targets

The error handling mechanisms like Late Activation and Shadow Tracking are integral part of the target management module of the NvMultiObjectTracker library; thus, such features are inherently enabled in the IOU tracker.

IOU tracker can be used as a performance baseline as it consumes the minimum amount of computational resources. A sample configuration file ``config_tracker_IOU.yml`` is provided in DeepStream SDK package.

NvSORT Tracker
--------------------------

NvSORT tracker increases the tracking accuracy while maintaining the high performance on top of IOU tracker with the following improvements:

 * **State estimation** with Kalman filter to better estimate and predict the states of the targets in the current frame.
 * **Cascaded data association** to associate targets and detector objects in multiple stages based on their proximity and confidence, which is more accurate than the simple matching in original SORT tracker.

As it fully relies on the bbox attributes for data association, the NvSORT's tracking accuracy is solely attributed to the detection accuracy. With a medium or high accuracy detector, NvSORT produces high quality tracking results with minimal computational resources. A sample configuration file ``config_tracker_NvSORT.yml`` is provided in DeepStream SDK package.

NvDeepSORT Tracker
--------------------------

NvDeepSORT tracker utilizes deep learning based object appearance information for accurate object matching in different frames and locations, resulting in enhanced robustness over occlusions and reduced ID switches. It applies a pre-trained re-identification (Re-ID) neural network to extract a feature vector for each object, compares the similarity between different objects using the extracted feature vector with a cosine distance metric, and combines it with a state estimator to perform the data association over frames. Before running NvDeepSORT, Re-ID model needs to be set up following `Setup Official Re-ID Model`_  and `Customize Re-ID Model`_.

Setup Official Re-ID Model
^^^^^^^^^^^^^^^^^^^^^^^^^^
The official Re-ID model is a 10-layer ResNet trained on the MARS dataset. Scripts and README file for users to setup the model are provided in ``sources/tracker_NvDeepSORT`` for the convenience of the users. The link to the pre-trained Re-ID model can be found in the **Installation** section in the original DeepSORT `GitHub <https://github.com/nwojke/deep_sort>`_. Once the model is found, users are advised to do the following:

* Download the Re-ID model ``networks/mars-small128.pb`` and place it under ``sources/tracker_NvDeepSORT``.
* Make sure TensorRT's ``uff-converter-tf`` and ``graphsurgeon-tf`` are installed. Then install ``PyYAML``, ``tensorflow-gpu`` for python3.
* Run provided script to remove nodes not supported by TensorRT and convert TensorFlow model into UFF format by ``$ python3 convert.py mars-small128.pb``.
* Set NvDeepSORT config ``config_tracker_NvDeepSORT.yml`` in gst-nvtracker plugin, and make sure ``uffFile`` to match UFF model path.

|

The official model can directly run at FP32 or FP16 precision by setting ``networkMode`` as ``0`` or ``1``. To maximize the performance, INT8 precision inference can be enabled after generating a calibration file with TensorRT. The steps are:

* Collect a list of image patches for TensorRT calibration. They should be resized to the same resolution as network input and provide a representative set of input data. For the official model, over one hundred single person patches with 128x64 resolution need to be provided under ``source/tracker_NvDeepSORT/data/``. Sample patches are like:

          +---------------------------------------------------------------+---------------------------------------------------------------+
          | **0.jpg**                                                     | **1.jpg**                                                     |
          |                                                               |                                                               |
          | .. image:: /content/NvDeepSORT_INT8_Calibration_Sample_0.jpg  | .. image:: /content/NvDeepSORT_INT8_Calibration_Sample_1.jpg  |
          +---------------------------------------------------------------+---------------------------------------------------------------+

* Install dependencies ``pip3 install numpy pycuda Pillow``.
* Update network information in ``config_tracker_NvDeepSORT.yml`` (using batch size 100 for example) as

   .. code-block:: yaml

     ReID:
       networkMode: 2
       modelEngineFile: "/opt/nvidia/deepstream/deepstream/samples/models/Tracker/mars-small128.uff_b100_gpu0_int8.engine"
       calibrationTableFile: "/opt/nvidia/deepstream/deepstream/samples/models/Tracker/calibration.cache"
       ...

* Run provided script to generate calibration table ``$ python3 calibrate.py``.

Data Association
^^^^^^^^^^^^^^^^^^^^^^^^^
For the data association in the NvDeepSORT tracker, there are two metrics are used:

* Proximity
* Re-ID based similarity

|

For the proximity score, the `Mahalanobis` distance between the `i-th` detector object and the `j-th` target is calculated using the target's predicted location and its associated uncertainty:

 .. math::
       dist_{ij}=(D_i-Y_j)^TS_j^{-1}(D_i-Y_j)

where:

* :math:`D_i` denotes the `i-th` detector object in ``{x, y, a, h}`` format.
* :math:`Y_j` denotes the predicted states ``{x', y', a', h'}`` from state estimator for the `j-th` tracker.
* :math:`S_j` denotes the predicted covariance from state estimator for the `j-th` tracker.

|

In the original DeepSORT implementation, the maximum threshold of `Mahalanobis` distance for a valid pair of detector object and target is set ``9.4877``, representing 95% confidence computed from the inverse Chi-square distribution. Note in NvDeepSORT, the value is configured by ``thresholdMahalanobis`` in tracker config to achieve higher accuracy for a particular detector model, such as the PeopleNet v2.6, so it may be different from the value in the original implementation.

After filtering out invalid pairs, the Re-ID similarity score is computed as the maximum cosine similarity between a detector object and a target. Then the cascaded data association algorithm is used for high accuracy multi-stage matching.

Customize Re-ID Model
^^^^^^^^^^^^^^^^^^^^^^^^^
Apart from the Re-ID model provided in the original DeepSORT repository, the provided NvDeepSORT implementation allows users to use a custom Re-ID model of their choice as long as it is in the UFF format and the output of the network for each object is a single vector with unit L2 norm. Then the Re-ID similarity score will be computed based on the cosine metric and used to perform the data association in the same way as the official model. The steps are:

* Train a Re-ID network using deep learning frameworks such as TensorFlow or PyTorch.
* Make sure the network layers are supported by TensorRT and convert the model into UFF format. Mixed precision inference is still supported, and a calibration cache is required for INT8 mode.
* Specify the following parameters in tracker config file based on the properties of the custom model. Then run DeepStream SDK with the new Re-ID model.

  * ``reidFeatureSize``
  * ``reidHistorySize``
  * ``inferDims``
  * ``colorFormat``
  * ``networkMode``
  * ``offsets``
  * ``netScaleFactor``
  * ``inputBlobName``
  * ``outputBlobName``
  * ``uffFile``
  * ``modelEngineFile``

Configuration Parameters
^^^^^^^^^^^^^^^^^^^^^^^^^
A sample config file ``config_tracker_NvDeepSORT.yml`` is provided in DeepStream SDK package. The following table summarizes the configuration parameters for NvDeepSORT.

.. csv-table:: Gst-nvtracker NvDeepSORT low-level tracker configuration properties
     :file: ../text/tables/Gst-nvtracker tables/DS_Plugin_gst-nvtracker_DeepSORT_config_properties.csv
     :widths: 1, 1, 100, 1, 1
     :header-rows: 1

Implementation Details and Reference
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The difference between NvDeepSORT and the original implementation includes:

* For data association, the original implementation sorts the targets in an ascending order based on the tracking age and runs the matching algorithm for each age sequentially, while NvDeepSORT applies the cascaded data association algorithm with higher performance and accuracy.
* NvDeepSORT implementation in the NvMultiObjectTracker library adopts the same target management policy as the NvDCF tracker, which is advanced to the original DeepSORT.
* The cosine distance metric for two features is :math:`score_{ij}=1-feature\_det_{i}\cdot feature\_track_{jk}`, with smaller values representing more similarity. By contrast, NvDeepSORT directly uses dot product for computational efficiency, so larger values means higher similarity.

|

**Reference**: Wojke, Nicolai, Alex Bewley, and Dietrich Paulus. "Simple online and real-time tracking with a deep association metric." `2017 IEEE international conference on image processing (ICIP).` IEEE, 2017. Check `Paper <https://arxiv.org/abs/1703.07402>`_ and `The original implementation on Github <https://github.com/nwojke/deep_sort>`_.


NvDCF Tracker
--------------------------

NvDCF tracker employs a visual tracker that is based on the discriminative correlation filter (DCF) for learning a target-specific correlation filter and for localizing the same target in the next frames using the learned correlation filter. Such correlation filter learning and localization are usually carried out on `per-object` basis in a typical MOT implementation, creating a potentially large number of small CUDA kernel launches when processed on GPU. This inherently poses challenges in maximizing GPU utilization, especially when a large number of objects from multiple video streams are expected to be tracked on a single GPU.

To address such performance issues, the GPU-accelerated operations for the NvDCF tracker are designed to be executed in the *batch processing* mode to maximize the GPU utilization despite the nature of small CUDA kernels in per-object tracking model. The batch processing mode is applied in the entire tracking operations, including the `bbox` cropping and scaling, visual feature extraction, correlation filter learning, and localization. This can be viewed as a similar model to the batched cuFFT or batched cuBLAS calls, but it differs in that the batched MOT execution model spans many operations in a higher level. The batch processing capability is extended from multi-object batching to the batching of multiple streams for even greater efficiency and scalability.

Thanks to its visual tracking capability, the NvDCF tracker can localize and keep track of the targets even when the detector in PGIE misses them (i.e., false negatives) for potentially an extended period of time caused by partial or full occlusions, resulting in more robust tracking. The enhanced robustness characteristics allow users to use a higher ``maxShadowTrackingAge`` value for longer-term object tracking and also allows PGIE's ``interval`` to be higher only at the cost of slight degradation in accuracy.

Unlike NvSORT and NvDeepSORT where the Kalman filter takes the detection bboxes as the only input, the Kalman filter in the NvDCF tracker also takes the localization results from the visual tracking module as an input as well. Once a target is being tracked, the visual tracker keeps trying to localize the same target in the next frames using the learned correlation filter, while there could be matched detector bboxes. The Kalman filter in NvDCF tracker fuses both the DCF-based localization results and the detection bboxes for better target state estimation and prediction.

Visual Tracking
^^^^^^^^^^^^^^^^^^^^^^^^^

For each tracked target, NvDCF tracker defines a search region around its `predicted` location in the next frame large enough for the same target to be detected in the search region. The location of a target on a new video frame is predicted by using the state estimator module. The ``searchRegionPaddingScale`` property determines the size of the search region as a multiple of the diagonal of the target’s bounding box. The size of the search region would be determined as:

  .. math::
       SearchRegion_{width}=w+searchRegionPaddingScale*\sqrt{w*h}

       SearchRegion_{height}=h+searchRegionPaddingScale*\sqrt{w*h}

, where :math:`w` and :math:`h` are the width and height of the target’s bounding box, respectively.


Once the search region is defined for each target at its predicted location, the image patches from each of the search regions are cropped and scaled to a predefined feature image size, from which the visual features are extracted. The ``featureImgSizeLevel`` property defines the size of the feature image, and its range is from 1 to 5. Each level between 1 and 5 corresponds to 12x12, 18x18, 24x24, 36x36, and 48x48, respectively, for each feature channel. A lower value of ``featureImgSizeLevel`` causes NvDCF to use a smaller feature size, increasing GPU performance potentially yet at the cost of accuracy and robustness. Consider the relationship between ``featureImgSizeLevel`` and ``searchRegionPaddingScale`` when configuring the parameters. If ``searchRegionPaddingScale`` is increased while ``featureImgSizeLevel`` is fixed, the number of pixels corresponding to the target itself in the feature images will be effectively decreased.

For each cropped image patch, the visual appearance features such as ColorNames and/or Histogram-of-Oriented-Gradient (HOG) are extracted. The type of visual features to be used can be configured by setting ``useColorNames`` and/or ``useHog``. The HOG features consist of 18 channels based on the number of bins for different orientations, while The ColorNames features have 10 channels. If both features are used (by setting ``useColorNames: 1`` and ``useHog: 1``), the total number of channels would then be 28. Therefore, if one uses both HOG and ColorNames with ``featureImgSizeLevel: 5``, the dimension of visual features that represents a target would be 28x48x48. The more channels of visual features are used, the higher the accuracy would be, but would increase the computational complexity and reduce the performance. The NvDCF tracker uses NVIDIA's `VPI™ <https://docs.nvidia.com/vpi/index.html>`_ library for extracting those visual features.

The correlation filters are generated with an attention window (using a Hanning window) applied at the center of the target `bbox`. Users are allowed to move the center of the attention window in the vertical direction. For example, ``featureFocusOffsetFactor_y: -0.2`` would result in the center of the attention window to be at ``y=-0.2`` in the feature map, where the relative range of the height is ``[-0.5, 0.5]``. Consider that typical surveillance or CCTV cameras are mounted at a moderately high position to monitor a wide area of the environment, say, a retail store or a traffic intersection. From those vantage points, more occlusions can occur at the lower part of the body of persons or vehicles by other persons or vehicles. Moving the attention window up a bit may improve the accuracy and robustness for those use cases.

Once a correlation filter is generated for a target, typical DCF-based trackers usually employ an exponential moving average for temporal consistency when the optimal correlation filter is created and updated over consecutive frames. The learning rate for this moving average can be configured by ``filterLr`` and ``filterChannelWeightsLr`` for the correlation filters and their channel weights, respectively. The standard deviation for Gaussian for the desired response used when creating an optimal DCF filter can also be configured by ``gaussianSigma``.

Data Association
^^^^^^^^^^^^^^^^^^^^^^^^^
The association of target IDs across frames for robust tracking typically entails visual appearance-based similarity matching, for which the visual appearance features are extracted at each candidate location. Usually, this is a computationally expensive process and often plays as a performance bottleneck in object tracking. Unlike existing approaches that extract visual features from all the candidate locations and perform feature matching among all the candidate objects, the NvDCF tracker takes advantage of the correlation response (that is already obtained during target localization stage) as the tracking confidence map of each tracker over a search region and simply looks up the confidence values at each candidate location (i.e., the location of each detector object) to get the visual similarity without any explicit computation. By comparing those confidences between trackers, we can identify which tracker has a higher visual similarity to a particular detector object and use it as a part of the matching score for data association. Therefore, the visual similarity matching in the data association process can be carried out very efficiently through a simple look-up table (LUT) operation on existing correlation responses.

In the animated figure below, the left side shows the target within its search region, while the right side shows the correlation response map (where the deep red color indicates higher confidence and deep blue indicates lower confidence). In the confidence map, the yellow cross (i.e., ``+``) around the center indicates the peak location of the correlation response, while the purple ``x`` indicate the center of nearby detector bboxes. The correlation response values at those purple ``x`` locations indicate the confidence score on how likely the same target exists at that location in terms of the visual similarity.

.. image:: /content/NvDCF_RN10_lvl5_reassoc_CorrResp[187].gif
          :align: center
          :alt: Correlation Response

If there are multiple detector bboxes (i.e., purple ``x``) around the target like the one in the figure below, the data association module will take care of the matching based on the visual similairty score and the configured weight and minimum value, which are ``matchingScoreWeight4VisualSimilarity`` and ``minMatchingScore4VisualSimilarity``, respectively.

.. image:: /content/NvDCF_RN10_lvl5_CorrResp_Target_187_Frame_985.jpg
          :align: center
          :alt: Correlation Response


Configuration Parameters
^^^^^^^^^^^^^^^^^^^^^^^^^

A few sample configuration files for the NvDCF tracker are provided as a part of DeepStream SDK package, which is named as:

* ``config_tracker_NvDCF_max_perf.yml``
* ``config_tracker_NvDCF_perf.yml``
* ``config_tracker_NvDCF_accuracy.yml``

|

The first `max_perf` config file is to configure the NvDCF tracker to consume the least amount of resources, while the second `perf` config file is for the use case where a decent balance between performance and accuracy is required. The last `accuracy` config file is to maximize the accuracy and robustness by enabling most of the features to their full capability, especially the target re-association.

The following table summarizes the configuration parameters used in the config files for the NvDCF low-level tracker (except the common modules and parameters already mentioned in an earlier section).

.. csv-table:: Gst-nvtracker NvDCF low-level tracker configuration properties
     :file: ../text/tables/Gst-nvtracker tables/DS_Plugin_gst-nvtracker_NvDCF_config_properties.csv
     :widths: 1, 1, 100, 1, 1
     :header-rows: 1

To learn more about NvDCF Parameter tuning guide, see :doc:`\DS_plugin_NvMultiObjectTracker_parameter_tuning_guide`.

See also the :ref:`NvDCF-param-troubleshooting-label` section for solutions to common problems in tracker behavior and tuning.


Setup and Visualization of Tracker Sample Pipelines
------------------------------------------------------------------------------------------------
This section describes how to setup tracking by detection pipeline with various NVIDIA® pre-trained detectors and NvDsTracker, and provides ready-to-use config files optimized for high accuracy tracking. Then the visualization of some sample outputs and internal states (such as correlation responses for a few selected targets) are presented to help users to better understand how NvDsTracker works, especially on the visual tracker module.

People Tracking
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To setup a people tracking pipeline, users can download pre-trained `PeopleNet` model from `NVIDIA NGC catalog <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/peoplenet>`__ with ResNet-34 backbone. It detects person, bag, and face classes. The following detector and tracker combinations are provided as samples.

PeopleNet + NvSORT
~~~~~~~~~~~~~~~~~~~~~~~~~

This pipeline performs high performance people tracking with reasonable accuracy. Such a ``deepstream-app`` pipeline is constructed with the following components:

* **Detector**: PeopleNet v2.6 (w/ ResNet-34 as backbone)
* **Post-processing** algorithm for object detection: Hybrid clustering (i.e., DBSCAN + NMS)
* **Tracker**: NvSORT with ``config_tracker_NvSORT.yml`` configuration

|

The detector config file used is:

.. raw:: html

   <details>
   <summary><a>config_infer_primary_PeopleNet.txt</a></summary>

.. code-block:: bash

     [property]
     ## model-specific params like paths to model, engine, label files, etc. are to be added by users

     gpu-id=0
     net-scale-factor=0.0039215697906911373
     input-dims=3;544;960;0
     uff-input-blob-name=input_1
     process-mode=1
     model-color-format=0
     ## 0=FP32, 1=INT8, 2=FP16 mode
     network-mode=1
     num-detected-classes=3
     interval=0
     gie-unique-id=1
     output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
     ## 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
     cluster-mode=3
     maintain-aspect-ratio=1

     [class-attrs-all]
     pre-cluster-threshold=0.1555
     nms-iou-threshold=0.3386
     minBoxes=2
     dbscan-min-score=1.9224
     eps=0.3596
     detected-min-w=20
     detected-min-h=20

.. raw:: html

   </details>

PeopleNet + NvDeepSORT
~~~~~~~~~~~~~~~~~~~~~~~~~

This pipeline enables people Re-ID capability during tracking. Such a ``deepstream-app`` pipeline is constructed with the following components:

* **Detector**: PeopleNet v2.6 (w/ ResNet-34 as backbone)
* **Post-processing** algorithm for object detection: Hybrid clustering (i.e., DBSCAN + NMS)
* **Tracker**: NvDeepSORT with ``config_tracker_NvDeepSORT.yml`` configuration

|

The detector config file used is:


.. raw:: html

   <details>
   <summary><a>config_infer_primary_PeopleNet.txt</a></summary>

.. code-block:: bash

     [property]
     ## model-specific params like paths to model, engine, label files, etc. are to be added by users

     gpu-id=0
     net-scale-factor=0.0039215697906911373
     input-dims=3;544;960;0
     uff-input-blob-name=input_1
     process-mode=1
     model-color-format=0
     ## 0=FP32, 1=INT8, 2=FP16 mode
     network-mode=1
     num-detected-classes=3
     interval=0
     gie-unique-id=1
     output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
     ## 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
     cluster-mode=3
     maintain-aspect-ratio=1

     [class-attrs-all]
     pre-cluster-threshold=0.1696
     nms-iou-threshold=0.5196
     minBoxes=2
     dbscan-min-score=1.4226
     eps=0.2280
     detected-min-w=20
     detected-min-h=20


.. raw:: html

   </details>

PeopleNet + NvDCF
~~~~~~~~~~~~~~~~~~~~~~~~~

This pipeline performs more accurate people tracking. For the output visualization, a ``deepstream-app`` pipeline is first constructed with the following components:

* **Detector**: PeopleNet v2.6 (w/ ResNet-34 as backbone)
* **Post-processing** algorithm for object detection: Hybrid clustering (i.e., DBSCAN + NMS)
* **Tracker**: NvDCF with ``config_tracker_NvDCF_accuracy.yml`` configuration

|

For better visualization, the following changes were also made:

* ``featureImgSizeLevel: 5`` is set under ``VisualTracker`` section in ``config_tracker_NvDCF_accuracy.yml``
* ``tracker-height=960`` and ``tracker-width=544`` under ``[tracker]`` section in the deepstream-app config file

|

The detector config file used is:


.. raw:: html

   <details>
   <summary><a>config_infer_primary_PeopleNet.txt</a></summary>

.. code-block:: bash

     [property]
     ## model-specific params like paths to model, engine, label files, etc. are to be added by users

     gpu-id=0
     net-scale-factor=0.0039215697906911373
     input-dims=3;544;960;0
     uff-input-blob-name=input_1
     process-mode=1
     model-color-format=0
     ## 0=FP32, 1=INT8, 2=FP16 mode
     network-mode=1
     num-detected-classes=3
     interval=0
     gie-unique-id=1
     output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
     ## 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
     cluster-mode=3
     maintain-aspect-ratio=1

     [class-attrs-all]
     pre-cluster-threshold=0.1037
     nms-iou-threshold=0.4842
     minBoxes=4
     dbscan-min-score=1.1845
     eps=0.3207
     detected-min-w=20
     detected-min-h=20


.. raw:: html

   </details>

|

The resulting output video of the aforementioned pipeline with (PeopleNet + Hybrid clustering + NvDCF) is shown below, but please note that only 'Person'-class objects are detected and shown in the video:

.. raw:: html

     <video width="1096" allow="autoplay" frameborder="0" controls loop>
          <source src="https://drive.google.com/uc?id=1sN0gFomxuC7a9IFXavrRHGtpchlGAO0Z&hd=1" type='video/mp4'>
     </video>

|

While the video above shows the `per-stream` output, each animated figure below shows (1) the cropped & scaled image patch used for `each target` on the left side and (2) the corresponding correlation response map for the target on the right side. As mentioned earlier, the yellow ``+`` mark shows the peak location of the correlation response map generated by using the learned correlation filter, while the puple ``x`` marks show the the center of nearby detector objects.


+---------------------------------------------------------------+---------------------------------------------------------------+
| **Person 1** (w/ Blue hat + gray backpack)                    | **Person 6** (w/ Red jacket + gray backpack)                  |
|                                                               |                                                               |
| .. image:: /content/NvDCF_PNv2.3_lvl5_reassoc_CorrResp[1].gif | .. image:: /content/NvDCF_PNv2.3_lvl5_reassoc_CorrResp[6].gif |
+---------------------------------------------------------------+---------------------------------------------------------------+
+---------------------------------------------------------------+---------------------------------------------------------------+
| **Person 4** (w/ Green jacket)                                | **Person 5** (w/ Cyan jacket)                                 |
|                                                               |                                                               |
| .. image:: /content/NvDCF_PNv2.3_lvl5_reassoc_CorrResp[4].gif | .. image:: /content/NvDCF_PNv2.3_lvl5_reassoc_CorrResp[5].gif |
+---------------------------------------------------------------+---------------------------------------------------------------+

The figures above show how the correlation responses progress over time for the cases of no occlusion, partial occlusion, and full occlusions happening. It can be seen that even when a target undergoes a full occlusion for a prolonged period, the NvDCF tracker is able to keep track of the targets in many cases.

If ``featureImgSizeLevel: 3`` is used instead for better performance, the resolution of the image patch used for each target would get lower like shown in the figure below.

+---------------------------------------------------------------+---------------------------------------------------------------+
| **Person 1** (w/ Blue hat + gray backpack)                    | **Person 6** (w/ Red jacket + gray backpack)                  |
|                                                               |                                                               |
| .. image:: /content/NvDCF_PNv2.3_lvl3_reassoc_CorrResp[1].gif | .. image:: /content/NvDCF_PNv2.3_lvl3_reassoc_CorrResp[6].gif |
+---------------------------------------------------------------+---------------------------------------------------------------+

Traffic Tracking
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To setup a traffic tracking pipeline, users can download pre-trained `DetectNet_v2` model from `NVIDIA NGC catalog <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/pretrained_detectnet_v2>`__, and also the one with ResNet-10 backbone is packaged as a part of DeepStream SDK release as well. It detects person, car, bicycle and road sign classes. The following samples demonstrate using DetectNet_v2 and NvDCF with different detection intervals for performance and accuracy tradeoff.

DetectNet_v2 + NvDCF
~~~~~~~~~~~~~~~~~~~~~~~~~

This sample performs detection at every frame. For the output visualization, a ``deepstream-app`` pipeline is first constructed with the following components:

* **Detector**: DetectNet_v2 (w/ ResNet-10 as backbone)
* **Post-processing** algorithm for object detection: Non-Maximum Suppression (NMS)
* **Tracker**: NvDCF with ``config_tracker_NvDCF_accuracy.yml`` configuration

|

For better visualization, the following changes were also made:

* ``featureImgSizeLevel: 5`` is set under ``VisualTracker`` section in ``config_tracker_NvDCF_accuracy.yml``
* ``tracker-height=960`` and ``tracker-width=544`` under ``[tracker]`` section in the deepstream-app config file

|

The detector config file used is:

.. raw:: html

   <details>
   <summary><a>config_infer_primary_DetectNet_v2.txt</a></summary>

.. code-block:: bash

     [property]
     ## model-specific params like paths to model, engine, label files, etc. are to be added by users

     gpu-id=0
     net-scale-factor=0.0039215697906911373
     process-mode=1
     model-color-format=0
     ## 0=FP32, 1=INT8, 2=FP16 mode
     network-mode=1
     num-detected-classes=4
     interval=0
     gie-unique-id=1
     output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid
     force-implicit-batch-dim=1
     ## 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
     cluster-mode=2

     [class-attrs-all]
     topk=25
     nms-iou-threshold=0.2
     pre-cluster-threshold=0.2

.. raw:: html

   </details>

|

Note that the neural net model used for this pipeline is much lighter than the PeopleNet used in the previous section, because ResNet-10 is used as the backbone of the DetectNet_v2 model for this pipeline. The resulting output video of the aforementioned pipeline with (DetectNet_v2 + NMS + NvDCF) is shown below:

.. raw:: html

     <video width="1096" allow="autoplay" frameborder="0" controls loop>
          <source src="https://drive.google.com/uc?id=1ueLC0_d2U2XHxXzg4CQemmEaE-qVaB0l&hd=1" type='video/mp4'>
     </video>

|

While the video above shows the `per-stream` output, each animated figure below shows (1) the cropped & scaled image patch used for `each target` on the left side and (2) the corresponding correlation response map for the target on the right side. Again, the yellow ``+`` mark shows the peak location of the correlation response map generated by using the learned correlation filter, while the puple ``x`` marks show the the center of nearby detector objects.

+---------------------------------------------------------------+---------------------------------------------------------------+
| **Car 40**                                                    | **Car 6**                                                     |
|                                                               |                                                               |
| .. image:: /content/NvDCF_RN10_lvl5_reassoc_CorrResp[40].gif  | .. image:: /content/NvDCF_RN10_lvl5_reassoc_CorrResp[6].gif   |
+---------------------------------------------------------------+---------------------------------------------------------------+
+---------------------------------------------------------------+---------------------------------------------------------------+
| **Car 54**                                                    | **Car 224**                                                   |
|                                                               |                                                               |
| .. image:: /content/NvDCF_RN10_lvl5_reassoc_CorrResp[54].gif  | .. image:: /content/NvDCF_RN10_lvl5_reassoc_CorrResp[224].gif |
+---------------------------------------------------------------+---------------------------------------------------------------+

Even when a target undergoes a full occlusion for a prolonged period or significant visual appearance changes over time due to the changing orientation of targets, the NvDCF tracker is able to keep track of the targets in many cases.

DetectNet_v2 (w/ interval=2) + NvDCF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The enhanced robustness in NvDCF tracker allows the users to have a detection interval higher than ``0`` to improve the performance with minimum cost on the accuracy. This section presents a sample output from a pipeline with a PGIE module that is configured with ``interval=2``, meaning that the inference for object detection takes place at *every third* frame. The sample ``deepstream-app`` pipeline is constructed with the following configuration:

* **Detector**: DetectNet_v2 (w/ ResNet-10 as backbone) (w/ ``interval=2``)
* **Post-processing** algorithm for object detection: Non-Maximum Suppression (NMS)
* **Tracker**: NvDCF with ``config_tracker_NvDCF_accuracy.yml`` configuration

|

Below is the sample output of the pipeline:

.. raw:: html

     <video width="1096" allow="autoplay" frameborder="0" controls loop>
          <source src="https://drive.google.com/uc?id=1PHc_dSENJjSxMDO9eg4kEDhvfd_FjjIj&hd=1" type='video/mp4'>
     </video>

|

Note that with ``interval=2``, the computational load for the inferencing for object detection is only *a third* compared to that with ``interval=0``, dramatically improving the overall pipeline performance. If an accurate and robust object tracker is used, the accuracy of the overall pipeline wouldn't be degraded too much, potentially yielding a well-balanced tradeoff between performance and accuracy.


Low-Level Tracker Comparisons and Tradeoffs
------------------------------------------------------
DeepStream SDK provides three reference low-level tracker libraries which have different resource requirements and performance characteristics, in terms of accuracy, robustness, and efficiency, allowing the users to choose the best tracker based on their use cases and requirements. See the following table for comparison.

.. csv-table:: Gst-nvtracker Tracker library comparison
     :file: ../text/tables/Gst-nvtracker tables/DS_Plugin_gst-nvtracker_Tracker_library_comparison.csv
     :widths: 8, 8, 8, 30, 25, 25
     :header-rows: 1


How to Implement a Custom Low-Level Tracker Library
----------------------------------------------------------
To write a custom low-level tracker library, users are expected to implement the API defined in ``sources/includes/nvdstracker.h``, which is covered in an earlier section on `NvDsTracker API` , and parts of the API refer to ``sources/includes/nvbufsurface.h``. Thus, the users would need to include ``nvdstracker.h`` to implement the API:

   .. code-block:: c++

     #include "nvdstracker.h"

Below is a sample implementation of each API. First of all, the low-level tracker library needs to implement the query function from the plugin like below:

   .. code-block:: c++

     NvMOTStatus NvMOT_Query(uint16_t customConfigFilePathSize,
                              char* pCustomConfigFilePath,
                              NvMOTQuery *pQuery)
     {
          /**
           * Users can parse the low-level config file in pCustomConfigFilePath to check
           * the low-level tracker's requirements
           */

          pQuery->computeConfig = NVMOTCOMP_GPU;       // among {NVMOTCOMP_GPU, NVMOTCOMP_CPU}
          pQuery->numTransforms = 1;                   // 0 for IOU and NvSORT tracker, 1 for NvDCF or NvDeepSORT tracker as they require the video frames
          pQuery->colorFormats[0] = NVBUF_COLOR_FORMAT_NV12; // among {NVBUF_COLOR_FORMAT_NV12, NVBUF_COLOR_FORMAT_RGBA}

          // among {NVBUF_MEM_DEFAULT, NVBUF_MEM_CUDA_DEVICE, NVBUF_MEM_CUDA_UNIFIED, NVBUF_MEM_CUDA_PINNED, ... }
          #ifdef __aarch64__
               pQuery->memType = NVBUF_MEM_DEFAULT;
          #else
               pQuery->memType = NVBUF_MEM_CUDA_DEVICE;
          #endif

          pQuery->batchMode = NvMOTBatchMode_Batch;    // set NvMOTBatchMode_Batch if the low-level tracker supports batch processing mode. Otherwise, NvMOTBatchMode_NonBatch
          pQuery->supportPastFrame = true;             // set true if the low-level tracker supports the past-frame data or not

          /**
           * return NvMOTStatus_Error if something is wrong
           * return NvMOTStatus_OK if everything went well
           */
     }

Assuming that the low-level tracker library defines and implements a custom class (e.g., ``NvMOTContext`` class in the sample code below) to perform actual operations corresponding to each API call. Below is a sample code for initialization and de-initialization APIs:

   .. note::
        The sample code below have some skeletons only. Users are expected to add proper error handling and additional codes as needed


   .. code-block:: c++

     NvMOTStatus NvMOT_Init(NvMOTConfig *pConfigIn,
                              NvMOTContextHandle *pContextHandle,
                              NvMOTConfigResponse *pConfigResponse)
     {
          if(pContextHandle != nullptr)
          {
               NvMOT_DeInit(*pContextHandle);
          }

          /// User-defined class for the context
          NvMOTContext *pContext = nullptr;

          /// Instantiate the user-defined context
          pContext = new NvMOTContext(*pConfigIn, *pConfigResponse);

          /// Pass the pointer as the context handle
          *pContextHandle = pContext;

          /**
           * return NvMOTStatus_Error if something is wrong
           * return NvMOTStatus_OK if everything went well
           */
     }

     /**
      * This is a sample code for the constructor of `NvMOTContext`
      * to show what may need to happen when NvMOTContext is instantiated in the above code for `NvMOT_Init` API
      */
     NvMOTContext::NvMOTContext(const NvMOTConfig &config, NvMOTConfigResponse& configResponse)
     {
          // Set CUDA device as needed
          cudaSetDevice(m_Config.miscConfig.gpuId)

          // Instantiate an appropriate localizer/tracker implementation
          // Load and parse the config file for the low-level tracker using the path to a config file
          m_pLocalizer = LocalizerFactory::getInstance().makeLocalizer(config.customConfigFilePath);

          // Set max # of streams to be supported
          // ex) uint32_t maxStreams = config.maxStreams;

          // Use the video frame info
          for(uint i=0; i<m_Config.numTransforms; i++)
          {
               // Use the expected color format from the input source images
               NvBufSurfaceColorFormat configColorFormat = (NvBufSurfaceColorFormat)m_Config.perTransformBatchConfig[i].colorFormat;

               // Use the frame width, height, and pitch as needed
               uint32_t frameHeight = m_Config.perTransformBatchConfig[i].maxHeight;
               uint32_t frameWidth = m_Config.perTransformBatchConfig[i].maxWidth;
               uint32_t framePitch = m_Config.perTransformBatchConfig[i].maxPitch;

               /* Add here to pass the frame info to the low-level tracker */
          }

          // Set if everything goes well
          configResponse.summaryStatus = NvMOTConfigStatus_OK;

     }

   .. code-block:: c++

     void NvMOT_DeInit(NvMOTContextHandle contextHandle)
     {
          /// Destroy the context handle
          delete contextHandle;
     }

During the initialization stage (when ``NvMOT_Init()`` is called), the context for the low-level tracker is expected to be instantiated, and its pointer is passed as the context handle (i.e., ``pContextHandle``) as the output as well as the output status in ``pConfigResponse``. Users may allocate memories based on the information about the video frames (e.g., width, height, pitch, and colorFormat) and streams (e.g., max # of streams) from the input ``NvMOTConfig *pConfigIn``, where the definition of the struct ``NvMOTConfig`` can be found in ``nvdstracker.h``. The path to the config file for the low-level tracker library in ``pConfigIn->customConfigFilePath`` can be also used to parse the config file to initialize the low-level tracker library.

Once the low-level tracker library creates the tracker context during the initialization stage, it needs to implement a function to process each frame batch, which is ``NvMOT_Process()``. Make sure to set the stream ID properly in the output so that ``pParams->frameList[i].streamID`` matches with ``pTrackedObjectsBatch->list[j].streamID`` if they are for the same stream, regardless of ``i`` and ``j``. The method ``NvMOTContext::processFrame()`` in the sample code below is expected to perform the required multi-object tracking operations with the input data of the video frames and the detector object information, while reporting the tracking outputs in ``NvMOTTrackedObjBatch *pTrackedObjectsBatch``.

Users can refer to `Accessing NvBufSurface memory in OpenCV <https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_sample_custom_gstream.html#accessing-nvbufsurface-memory-in-opencv>`_ to know more about how to access the pixel data in the video frames.

   .. code-block:: c++

     NvMOTStatus NvMOT_Process(NvMOTContextHandle contextHandle,
                              NvMOTProcessParams *pParams,
                              NvMOTTrackedObjBatch *pTrackedObjectsBatch)
     {
          /// Process the given video frame using the user-defined method in the context, and generate outputs
          contextHandle->processFrame(pParams, pTrackedObjectsBatch);

          /**
           * return NvMOTStatus_Error if something is wrong
           * return NvMOTStatus_OK if everything went well
           */
     }

     /**
      * This is a sample code for the method of `NvMOTContext::processFrame()`
      * to show what may need to happen when it is called in the above code for `NvMOT_Process` API
      */
     NvMOTStatus NvMOTContext::processFrame(const NvMOTProcessParams *params,
                                             NvMOTTrackedObjBatch *pTrackedObjectsBatch)
     {
          // Make sure the input frame is valid according to the MOT Config used to create this context
          for(uint streamInd=0; streamInd<params->numFrames; streamInd++)
          {
               NvMOTFrame *motFrame = &params->frameList[streamInd];
               for(uint i=0; i<motFrame->numBuffers; i++)
               {
                    /* Add something here to check the validity of the input using the following info*/
                    motFrame->bufferList[i]->width
                    motFrame->bufferList[i]->height
                    motFrame->bufferList[i]->pitch
                    motFrame->bufferList[i]->colorFormat
               }
          }

          // Construct the mot input frames
          std::map<NvMOTStreamId, NvMOTFrame*> nvFramesInBatch;
          for(NvMOTStreamId streamInd=0; streamInd<params->numFrames; streamInd++)
          {
               NvMOTFrame *motFrame = &params->frameList[streamInd];
               nvFramesInBatch[motFrame->streamID] = motFrame;
          }

          if(nvFramesInBatch.size() > 0)
          {
               // Perform update and construct the output data inside
               m_pLocalizer->update(nvFramesInBatch, pTrackedObjectsBatch);

               /**
                * The call m_pLocalizer->update() is expected to properly populate the ouput (i.e., `pTrackedObjectsBatch`).
                *
                * One thing to not forget is to fill `pTrackedObjectsBatch->list[i].list[j].associatedObjectIn`, where
                * `i` and `j` are indices for stream and targets in the list, respectively.
                * If the `j`th target was associated/matched with a detector object,
                * then `associatedObjectIn` is supposed to have the pointer to the associated detector object.
                * Otherwise, `associatedObjectIn` shall be set NULL.
                */
          }
     }

In case the low-level tracker has a capability of storing the past-frame data, it can be retrieved to the tracker plugin by using the ``NvMOT_ProcessPast()`` API call.

   .. code-block:: c++

     NvMOTStatus NvMOT_ProcessPast(NvMOTContextHandle contextHandle,
                                   NvMOTProcessParams *pParams,
                                   NvDsPastFrameObjBatch *pPastFrameObjBatch)
     {
          /// Retrieve the past-frame data if there are
          contextHandle->processFramePast(pParams, pPastFrameObjBatch);

          /**
           * return NvMOTStatus_Error if something is wrong
           * return NvMOTStatus_OK if everything went well
           */
     }

     /**
      * This is a sample code for the method of `NvMOTContext::processFramePast()`
      * to show what may need to happen when it is called in the above code for `NvMOT_ProcessPast` API
      */
     NvMOTStatus NvMOTContext::processFramePast(const NvMOTProcessParams *params,
                                                  NvDsPastFrameObjBatch *pPastFrameObjBatch)
     {
          std::set<NvMOTStreamId> videoStreamIdList;

          ///\ Indiate what streams we want to fetch past-frame data
          for(NvMOTStreamId streamInd=0; streamInd<params->numFrames; streamInd++)
          {
               videoStreamIdList.insert(params->frameList[streamInd].streamID);
          }

          m_pLocalizer->outputPastFrameObjs(videoStreamIdList, pPastFrameObjBatch);
     }

For the cases where the video stream sources are dynamically removed and added, the API call ``NvMOT_RemoveStreams()`` can be implemented to clean-up the resources no longer needed.

   .. code-block:: c++

     NvMOTStatus NvMOT_RemoveStreams(NvMOTContextHandle contextHandle,
                                        NvMOTStreamId streamIdMask)
     {
          /// Remove the specified video stream from the low-level tracker context
          contextHandle->removeStream(streamIdMask);

          /**
           * return NvMOTStatus_Error if something is wrong
           * return NvMOTStatus_OK if everything went well
           */
     }

     /**
      * This is a sample code for the method of `NvMOTContext::removeStream()`
      * to show what may need to happen when it is called in the above code for `NvMOT_RemoveStreams` API
      */
     NvMOTStatus NvMOTContext::removeStream(const NvMOTStreamId streamIdMask)
     {
          m_pLocalizer->deleteRemovedStreamTrackers(streamIdMask);
     }

In sum, to work with the `NvDsTracker` APIs, users may want to define ``class NvMOTContext`` like below to implement the methods in the code above. The actual implementation of each method may differ depending on the tracking algorithm the user choose to implement.

   .. code-block:: c++

     /**
     * @brief Context for input video streams
     *
     * The stream context holds all necessary state to perform multi-object tracking
     * within the stream.
     *
     */
     class NvMOTContext
     {
     public:
          NvMOTContext(const NvMOTConfig &configIn, NvMOTConfigResponse& configResponse);
          ~NvMOTContext();

          /**
           * @brief Process a batch of frames
           *
           * Internal implementation of NvMOT_Process()
           *
           * @param [in] pParam Pointer to parameters for the frame to be processed
           * @param [out] pTrackedObjectsBatch Pointer to object tracks output
           */
          NvMOTStatus processFrame(const NvMOTProcessParams *params,
                                        NvMOTTrackedObjBatch *pTrackedObjectsBatch);
          /**
           * @brief Output the past-frame data if there are
           *
           *  Internal implementation of NvMOT_ProcessPast()
           *
           * @param [in] pParam Pointer to parameters for the frame to be processed
           * @param [out] pPastFrameObjectsBatch Pointer to past frame object tracks output
           */
          NvMOTStatus processFramePast(const NvMOTProcessParams *params,
                                        NvDsPastFrameObjBatch *pPastFrameObjectsBatch);
          /**
           * @brief Terminate trackers and release resources for a stream when the stream is removed
           *
           *  Internal implementation of NvMOT_RemoveStreams()
           *
           * @param [in] streamIdMask removed stream ID
           */
          NvMOTStatus removeStream(const NvMOTStreamId streamIdMask);

     protected:

          /**
           * Users can include an actual tracker implementation here as a member
           * `IMultiObjectTracker` can be assumed to an user-defined interface class
           */
           std::shared_ptr<IMultiObjectTracker> m_pLocalizer;

     };


.. |beginfigref| raw:: latex

                     \begin{minipage}{\textwidth}


.. |endfigref| raw:: latex

                   \end{minipage}