The Kanade-Lucas-Tomasi (KLT) Feature Tracker algorithm estimates the 2D translation and scale change of image templates between their original coordinates and a given reference image, using the Inverse Compositional algorithm. For more information, see [1].
Inputs are an array of template bounding boxes, an array of predicted translation and scale changes, and a reference image. Additionally, a template image input is used to update the template patches (see details below).
Outputs are an array with the estimated translation and scale changes from the input bounding box coordinates to the reference image coordinates, and an array with the template bounding box coordinates in the reference image.
Implementation
Each template bounding box defines a template image patch stored internally with the function descriptor. These template patches are tracked in reference images based on predicted translation and scale changes. An estimated translation and scale change from the original bounding box coordinates to reference image coordinates is computed. Each such estimation includes a tracking validity flag (tracking success or failure) and whether a template update is required, based on user-defined threshold parameters.
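The estimation follows the Inverse Compositional formulation of [1]. A brief sketch in that paper's notation, where T is the template patch, I the reference image, and W(x; p) the warp (here restricted to translation plus scale):

```latex
% At each iteration, solve for the increment \Delta p that minimizes
\Delta p = \arg\min_{\Delta p} \sum_{x} \big[\, T(W(x; \Delta p)) - I(W(x; p)) \,\big]^2
% then update the warp by composing with the inverted increment:
W(x; p) \leftarrow W(x; p) \circ W(x; \Delta p)^{-1}
```

Because the linearization is done around the fixed template T rather than the moving image I, the gradient and Hessian can be precomputed once per template, which is what makes the Inverse Compositional variant efficient.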
Usage
Note
Due to PVA restrictions, the created VPI arrays' capacity must be 128.
The algorithm is currently only available via the C API. It will be exposed in Python in a future VPI release.
Initialization phase
Include the header that declares the functions and structures implementing the KLT Feature Tracker algorithm.
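A minimal sketch of the includes, assuming the header layout of recent VPI releases (verify the paths against your installed version):

```c
#include <vpi/Array.h>                    /* VPIArray creation and access     */
#include <vpi/Image.h>                    /* VPIImage wrappers for the frames */
#include <vpi/Stream.h>                   /* VPIStream for submission/sync    */
#include <vpi/algo/KLTFeatureTracker.h>   /* KLT Feature Tracker declarations */
```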
Define the input frames, the input bounding boxes and the input predictions. Refer to the VPIBoundingBox documentation for instructions on how to properly fill each bounding box given an axis-aligned bounding box.
Create the bounding box array with tracking information. For new bounding boxes, trackingStatus must be 0, indicating that the bounding box tracking is valid, and templateStatus must be 1, indicating that the template corresponding to the bounding box must be updated.
Create an array object by wrapping an existing host memory block.
Create the bounding box transformation prediction array, initially filled with identity transforms, since the template matches exactly the bounding box contents in the template image.
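The initialization of a new box and its identity prediction can be illustrated with plain C. Note that the struct layouts below are simplified stand-ins for illustration only, not the exact VPI definitions; in real code use VPIKLTTrackedBoundingBox and VPIHomographyTransform2D from the VPI headers:

```c
#include <string.h>

/* Simplified stand-ins for the VPI types (illustration only). */
typedef struct { float mat3[3][3]; } Xform2D;
typedef struct { Xform2D xform; float width, height; } BBox;
typedef struct { BBox bbox; unsigned char trackingStatus, templateStatus; } TrackedBBox;

/* Set a 3x3 homography to identity; also used to fill the prediction array. */
static void set_identity(Xform2D *x)
{
    memset(x, 0, sizeof *x);
    x->mat3[0][0] = x->mat3[1][1] = x->mat3[2][2] = 1.0f;
}

/* Initialize a brand-new tracked bounding box at (x, y) with the given size. */
static void init_new_bbox(TrackedBBox *tb, float x, float y, float w, float h)
{
    set_identity(&tb->bbox.xform);
    tb->bbox.xform.mat3[0][2] = x;   /* translation in the third column */
    tb->bbox.xform.mat3[1][2] = y;
    tb->bbox.width  = w;
    tb->bbox.height = h;
    tb->trackingStatus = 0;          /* 0: tracking is valid            */
    tb->templateStatus = 1;          /* 1: template must be updated     */
}
```

The prediction array is filled with identity transforms the same way, via set_identity on each element.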
Create the payload that will contain all temporary buffers needed for processing. It is assumed that all input frames have the same size, so the first frame's dimensions and type are used to create the payload.
Create the output tracked bounding box array. It will contain each bounding box as estimated in the current frame, based on the previous frame and the template information gathered so far, along with the bounding box's current tracking status.
Create the output estimated transforms array. It will contain the transform that maps each bounding box template onto the corresponding bounding box in the current (reference) frame.
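Put together, the creation calls can be sketched roughly as below. Treat this as pseudocode: the names are taken from the VPI C API, but the exact signatures vary between VPI releases (newer versions, for instance, take an extra creation-parameters argument), so check your version's API reference:

```c
VPIPayload klt = NULL;
VPIArray outputBoxList = NULL, outputEstimList = NULL;

/* Payload sized for the dimensions and format of the first frame. */
vpiCreateKLTFeatureTracker(VPI_BACKEND_CUDA, frameWidth, frameHeight,
                           VPI_IMAGE_FORMAT_U8, &klt);

/* Output arrays; capacity 128 satisfies the PVA restriction noted above. */
vpiArrayCreate(128, VPI_ARRAY_TYPE_KLT_TRACKED_BOUNDING_BOX, 0, &outputBoxList);
vpiArrayCreate(128, VPI_ARRAY_TYPE_HOMOGRAPHY_TRANSFORM_2D, 0, &outputEstimList);
```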
Start of the processing loop, from the second frame onward. The previous frame is where the algorithm fetches the tracked templates from; the current frame is the one these templates are matched against.
for (int idframe = 1; idframe < frame_count; ++idframe)
Submit the algorithm. The first time it's run, it will go through all input bounding boxes, crop them from the template frame and store them in the payload. Subsequent runs will either repeat the cropping and storing process for new bounding boxes added (doesn't happen in this example, but happens in the sample application), or perform the template matching on the reference frame.
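The per-frame submission might look like the following sketch. Again, treat this as pseudocode with assumed signatures; in particular, how the parameters struct is initialized differs between VPI releases:

```c
VPIKLTFeatureTrackerParams params;
vpiInitKLTFeatureTrackerParams(&params);   /* default threshold parameters */

/* imgTemplate: previous frame; imgReference: current frame. */
vpiSubmitKLTFeatureTracker(stream, VPI_BACKEND_CUDA, klt,
                           imgTemplate, inputBoxList, inputPredList,
                           imgReference, outputBoxList, outputEstimList,
                           &params);
vpiStreamSync(stream);                     /* wait for the results */
```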
Update bounding box statuses. If tracking was lost (trackingStatus==1), the input bounding box must also be marked as such, so subsequent KLT iterations ignore it. If the template needs to be updated (templateStatus==1), the next iteration will do the updating, or else it will perform the template matching.
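The lost-tracking bookkeeping can be illustrated with plain C, again using a simplified stand-in struct rather than the actual VPIKLTTrackedBoundingBox type:

```c
/* Simplified stand-in for VPIKLTTrackedBoundingBox (illustration only). */
typedef struct { unsigned char trackingStatus, templateStatus; } BoxStatus;

/* Propagate lost-tracking flags from this iteration's output back to the
   input list, so that subsequent KLT iterations ignore those boxes. */
static void propagate_tracking_status(BoxStatus *input,
                                      const BoxStatus *updated, int n)
{
    for (int b = 0; b < n; ++b)
    {
        if (updated[b].trackingStatus)   /* 1 == tracking lost */
            input[b].trackingStatus = 1;
    }
}
```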
If the template for this bounding box must be updated in the next KLT iteration, the user must redefine the bounding box. There are several ways to do this: use a feature detector such as the Harris keypoint detector to fetch a brand-new bounding box; take updated_bbox[b] and refine it through other means to avoid accumulating tracking errors; or simply use updated_bbox[b] as-is, which is less robust but still yields decent results. This example takes the last, simpler approach.
if (updated_bbox[b].templateStatus)
{
    tracked_bboxes[b] = updated_bbox[b];
}
Also reset the corresponding input predicted transform, setting it to identity, as it is now assumed that the input bounding box matches the tracked object exactly.
Since the input arrays' contents have been modified externally, invalidate them so that VPI discards the contents of any copies it might have made internally.
Limitations
Bounding box sizes must be between 4x4 and 128x128.
PVA
Only available on Jetson Xavier devices.
Input image dimensions must be between 65x65 and 3264x2448.
Maximum scale change is 0.2.
Minimum input and output array capacity is 128.
Maximum number of bounding boxes is 64.
Maximum numberOfIterationsScaling is 20.
Bounding box sizes must be between 4x4 and 64x64.
Only accepts VPI_IMAGE_FORMAT_U16 inputs whose pixel values are between 0 and 255.
VIC
Not implemented.
References
[1] Simon Baker, Iain Matthews, "Lucas-Kanade 20 Years On: A Unifying Framework", International Journal of Computer Vision, February 2004, Volume 56, Issue 3, pp. 221-255.