Overview

The Kanade-Lucas-Tomasi (KLT) Feature Tracker algorithm estimates the 2D translation and scale changes of an image template between original template coordinates and a given reference image using the Inverse Compositional algorithm. For more information, see [1].

Inputs are an array of template bounding boxes, a translation and scale changes predictions array and a reference image. Additionally, a template image input is used to update template patches (see details below).

Outputs are the translation and scale changes estimation array from the input bounding box coordinates to the reference image coordinates and the template bounding box coordinates array in the reference image.

Tracking Result

Note: Video output requires HTML5-capable browser that supports H.264 mp4 video decoding.

Implementation

Each template bounding box defines a template image patch stored internally with the function descriptor. These template patches are tracked in reference images based on predicted translation and scale changes. An estimated translation and scale change from the original bounding box coordinates to reference image coordinates is computed. Each such estimation includes a tracking validity flag (tracking success or failure) and whether a template update is required, based on user-defined threshold parameters.

C API functions

For list of limitations, constraints and backends that implements the algorithm, consult reference documentation of the following functions:

Function	Description
vpiInitKLTFeatureTrackerCreationParams	Initialize VPIKLTFeatureTrackerCreationParams with default values.
vpiCreateKLTFeatureTracker	Creates payload for vpiSubmitKLTFeatureTracker.
vpiSubmitKLTFeatureTracker	Runs KLT Feature Tracker on two frames.

Usage

Language: C++ Python

Import VPI module
import vpi
(optional) Define a function to update the input bounding boxes and predictions for the next tracking given the output bounding boxes and estimations from the previous tracking. This custom update is similar to the default update, added here as an example on how to define a custom update.
# This function reads the results of the KLT tracker (output bounding boxes and estimations) and updates the input bounding boxes and predictions for the next iteration.

def customUpdate(inBoxes, inPreds, outBoxes, outEstim):

with inBoxes.lock_cpu() as inBoxes_cpu, inPreds.lock_cpu() as inPreds_cpu, \

outBoxes.rlock_cpu() as outBoxes_cpu, outEstim.rlock_cpu() as outEstim_cpu:

inBoxes_ = inBoxes_cpu.view(np.recarray)

inPreds_ = inPreds_cpu.view(np.recarray)

outBoxes_ = outBoxes_cpu.view(np.recarray)

outEstim_ = outEstim_cpu.view(np.recarray)

for i in range(outBoxes.size):

# If the track status of a bounding box is lost, it assigns lost to the corresponding input bouding box.

if outBoxes_[i].tracking_status == vpi.KLTTrackStatus.LOST:

inBoxes_[i].tracking_status = vpi.KLTTrackStatus.LOST

continue

# If the template status is update needed, it updates the input bounding box with the corresponding output, making its prediction the identity matrix (fixing the bounding box).

if outBoxes_[i].template_status == vpi.KLTTemplateStatus.UPDATE_NEEDED:

inBoxes_[i] = outBoxes_[i]

inPreds_[i] = np.eye(3)

else:

# If the update is not needed, just update the input prediction by the corresponding output estimation.

inBoxes_[i].template_status = vpi.KLTTemplateStatus.UPDATE_NOT_NEEDED

inPreds_[i] = outEstim_[i]
Convert an input frame from OpenCV to a gray-scale OpenCV frame and a VPI image with the image data. Both are used in processing the input and output video frames.
def convertFrameImage(inputFrame):

if inputFrame.ndim == 3 and inputFrame.shape[2] == 3:

grayFrame = cv2.cvtColor(inputFrame, cv2.COLOR_BGR2GRAY)

else:

grayFrame = inputFrame

grayImage = vpi.asimage(grayFrame.copy())

return grayFrame, grayImage
Define the input bounding boxes as a VPI Array of type vpi.Type.KLT_TRACKED_BOUNDING_BOX. Its capacity is the total number of bounding boxes to be tracked in all video frames. This is done to guarantee the maximum storage in case all bounding boxes throughout the whole video are tracked.
inBoxes = vpi.Array(totalNumBoxes, vpi.Type.KLT_TRACKED_BOUNDING_BOX)
Read the first input frame from the input video and convert it to gray-scale OpenCV frame and VPI image.
validFrame, cvFrame = inVideo.read()

if not validFrame:

print("Error reading first input frame", file=sys.stderr)

exit(1)

# Convert OpenCV frame to gray returning also the VPI image

cvGray, imgTemplate = convertFrameImage(cvFrame)
Create the VPI KLTFeatureTracker object that will contain all the information needed by the algorithm. One such information is the input predictions, that can be retrieved from it via in_predictions() for other processing. The constructor receives the image template, the input bounding boxes, and the backend to execute the algorithm. It is assumed that all input frames have the same size, thus the image template may be the first frame.
klt = vpi.KLTFeatureTracker(imgTemplate, inBoxes, backend=vpi.Backend.CUDA)

inPreds = klt.in_predictions()
At each valid frame, first check if the current frame is in allBoxes, a dictionary mapping frame indices to a list of input bounding boxes to start tracking at that frame. If yes, add these bounding boxes to the klt object.
while validFrame:

if curFrame in allBoxes:

klt.add_boxes(allBoxes[curFrame])
Read the next input frame from the input video and convert it to gray-scale OpenCV frame and VPI image reference.
curFrame += 1

validFrame, cvFrame = inVideo.read()

if not validFrame:

break

cvGray, imgReference = convertFrameImage(cvFrame)
Then execute the algorithm on the input image frame using the CUDA backend, defined in the klt creation. The update function passed is the one defined in the beginning. When this argument is not present, the klt call runs the default update. The output bounding boxes are returned with the tracked information.
outBoxes = klt(imgReference, update=customUpdate)

Initialization phase
1. Include the header that defines the needed functions and structures.
  #include <vpi/algo/KLTFeatureTracker.h>
  
  KLTFeatureTracker.h
  Declares functions that implement the KLT Feature Tracker algorithm.
2. Define the input frames and input bounding boxes. Refer to VPIBoundingBox documentation for instructions on how to properly fill each bounding box given an axis-aligned bounding box, the reference frames, the input boxes and input predictions.
  int frame_count = /*... */;
  
  VPIImage *frames = /* ... */;
  
  int bbox_count = /* ... */;
  
  VPIBoundingBox *bboxes = /* ... */;
  
  VPIImage
  struct VPIImageImpl * VPIImage
  A handle to an image.
  Definition: Types.h:256
  
  VPIBoundingBox
  Stores a generic 2D bounding box.
  Definition: Types.h:424
3. Create the bounding box array with tracking information. For new bounding boxes, trackingStatus must be 0, indicating that bounding box tracking is valid. templateStatus must be 1, indicating that the template corresponding to this bounding box must be updated.
  VPIKLTTrackedBoundingBox tracked_bboxes[128];
  
  int b;
  
  for (b = 0; b < bbox_count; ++b)
  
  {
  
  tracked_bboxes[b].bbox = bboxes[b];
  
  tracked_bboxes[b].trackingStatus = 0; /* valid tracking */
  
  tracked_bboxes[b].templateStatus = 1; /* must update */
  
  }
  
  VPIKLTTrackedBoundingBox::templateStatus
  int8_t templateStatus
  Status of the template related to this bounding box.
  Definition: Types.h:504
  
  VPIKLTTrackedBoundingBox::trackingStatus
  int8_t trackingStatus
  Tracking status of this bounding box.
  Definition: Types.h:497
  
  VPIKLTTrackedBoundingBox::bbox
  VPIBoundingBox bbox
  Bounding box being tracked.
  Definition: Types.h:490
  
  VPIKLTTrackedBoundingBox
  Stores a bounding box that is being tracked by KLT Tracker.
  Definition: Types.h:488
4. Wrap the tracked bounding box into a VPIArray. The array type must be VPI_ARRAY_TYPE_KLT_TRACKED_BOUNDING_BOX
  
  VPIArrayData data_bboxes;
  
  memset(&data_bboxes, 0, sizeof(data_bboxes));
  
  data_bboxes.bufferType = VPI_ARRAY_BUFFER_HOST_AOS;
  
  data_bboxes.buffer.aos.type = VPI_ARRAY_TYPE_KLT_TRACKED_BOUNDING_BOX;
  
  data_bboxes.buffer.aos.capacity = 128;
  
  data_bboxes.buffer.aos.sizePointer = &bbox_count;
  
  data_bboxes.buffer.aos.data = tracked_bboxes;
  
  VPIArray inputBoxList;
  
  vpiArrayCreateWrapper(&data_bboxes, 0, &inputBoxList);
  
  VPIArrayData::bufferType
  VPIArrayBufferType bufferType
  Type of array buffer.
  Definition: Array.h:172
  
  VPIArrayBufferAOS::data
  void * data
  Points to the first element of the array.
  Definition: Array.h:135
  
  VPIArrayData::buffer
  VPIArrayBuffer buffer
  Stores the array contents.
  Definition: Array.h:175
  
  VPIArrayBufferAOS::sizePointer
  int32_t * sizePointer
  Points to the number of elements in the array.
  Definition: Array.h:122
  
  VPIArrayBuffer::aos
  VPIArrayBufferAOS aos
  Array stored in array-of-structures layout.
  Definition: Array.h:162
  
  VPIArrayBufferAOS::capacity
  int32_t capacity
  Maximum number of elements that the array can hold.
  Definition: Array.h:126
  
  VPIArrayBufferAOS::type
  VPIArrayType type
  Type of each array element.
  Definition: Array.h:118
  
  vpiArrayCreateWrapper
  VPIStatus vpiArrayCreateWrapper(const VPIArrayData *data, uint64_t flags, VPIArray *array)
  Create an array object by wrapping an existing host memory block.
  
  VPIArray
  struct VPIArrayImpl * VPIArray
  A handle to an array.
  Definition: Types.h:232
  
  VPI_ARRAY_TYPE_KLT_TRACKED_BOUNDING_BOX
  @ VPI_ARRAY_TYPE_KLT_TRACKED_BOUNDING_BOX
  VPIKLTTrackedBoundingBox element.
  Definition: ArrayType.h:79
  
  VPI_ARRAY_BUFFER_HOST_AOS
  @ VPI_ARRAY_BUFFER_HOST_AOS
  Host-accessible array-of-structures.
  Definition: Array.h:146
  
  VPIArrayData
  Stores information about array characteristics and contents.
  Definition: Array.h:168
5. Create the bounding box transformation prediction array, initially filled with identity transforms, since the template matches exactly the bounding box contents in the template image.
  VPIHomographyTransform2D preds[128];
  
  int i;
  
  for (i = 0; i < bbox_count; ++i)
  
  {
  
  VPIHomographyTransform2D *xform = preds + i;
  
  /* Identity transform. */
  
  memset(xform, 0, sizeof(*xform));
  
  xform->mat3[0][0] = 1;
  
  xform->mat3[1][1] = 1;
  
  xform->mat3[2][2] = 1;
  
  }
  
  VPIHomographyTransform2D::mat3
  float mat3[3][3]
  3x3 homogeneous matrix that defines the homography.
  Definition: Types.h:405
  
  VPIHomographyTransform2D
  Stores a generic 2D homography transform.
  Definition: Types.h:404
6. Wrap this array into a VPIArray. The array type must be VPI_ARRAY_TYPE_HOMOGRAPHY_TRANSFORM_2D.
  VPIArrayData data_preds;
  
  memset(&data_preds, 0, sizeof(data_preds));
  
  data_preds.bufferType = VPI_ARRAY_BUFFER_HOST_AOS;
  
  data_preds.buffer.aos.type = VPI_ARRAY_TYPE_HOMOGRAPHY_TRANSFORM_2D;
  
  data_preds.buffer.aos.capacity = 128;
  
  int32_t data_preds_size = bbox_count;
  
  data_preds.buffer.aos.sizePointer = &data_preds_size;
  
  data_preds.buffer.aos.data = preds;
  
  VPIArray inputPredList;
  
  vpiArrayCreateWrapper(&data_preds, 0, &inputPredList);
  
  VPI_ARRAY_TYPE_HOMOGRAPHY_TRANSFORM_2D
  @ VPI_ARRAY_TYPE_HOMOGRAPHY_TRANSFORM_2D
  VPIHomographyTransform2D element.
  Definition: ArrayType.h:78
7. Create the payload that will contain all temporary buffers needed for processing. It is assumed that all input frames have the same size, so the first frame dimensions and type are used to create the payload.
  VPIImageFormat imgFormat;
  
  vpiImageGetFormat(frames[0], &imgFormat);
  
  int width, height;
  
  vpiImageGetSize(frames[0], &width, &height);
  
  VPIPayload klt;
  
  vpiCreateKLTFeatureTracker(VPI_BACKEND_CUDA, width, height, imgFormat, NULL, &klt);
  
  VPIImageFormat
  uint64_t VPIImageFormat
  Pre-defined image formats.
  Definition: ImageFormat.h:94
  
  vpiImageGetFormat
  VPIStatus vpiImageGetFormat(VPIImage img, VPIImageFormat *format)
  Get the image format.
  
  vpiImageGetSize
  VPIStatus vpiImageGetSize(VPIImage img, int32_t *width, int32_t *height)
  Get the image dimensions in pixels.
  
  vpiCreateKLTFeatureTracker
  VPIStatus vpiCreateKLTFeatureTracker(uint64_t backends, int32_t imageWidth, int32_t imageHeight, VPIImageFormat imageFormat, const VPIKLTFeatureTrackerCreationParams *params, VPIPayload *payload)
  Creates payload for vpiSubmitKLTFeatureTracker.
  
  VPIPayload
  struct VPIPayloadImpl * VPIPayload
  A handle to an algorithm payload.
  Definition: Types.h:268
  
  VPI_BACKEND_CUDA
  @ VPI_BACKEND_CUDA
  CUDA backend.
  Definition: Types.h:93
8. Define the configuration parameters that guide the KLT tracking process.
  VPIKLTFeatureTrackerParams params;
  
  vpiInitKLTFeatureTrackerParams(&params);
  
  vpiInitKLTFeatureTrackerParams
  VPIStatus vpiInitKLTFeatureTrackerParams(VPIKLTFeatureTrackerParams *params)
  Initialize VPIKLTFeatureTrackerParams with default values.
  
  VPIKLTFeatureTrackerParams
  Structure that defines the parameters for vpiCreateKLTFeatureTracker.
  Definition: KLTFeatureTracker.h:174
9. Create the output tracked bounding box array. It will contain the estimated current frame's bounding box based on previous frame and the template information gathered so far. It also contains the bounding box current tracking status.
  VPIArray outputBoxList;
  
  vpiArrayCreate(128, VPI_ARRAY_TYPE_KLT_TRACKED_BOUNDING_BOX, 0, &outputBoxList);
  
  vpiArrayCreate
  VPIStatus vpiArrayCreate(int32_t capacity, VPIArrayType type, uint64_t flags, VPIArray *array)
  Create an empty array instance.
10. Create the output estimated transforms. It will contain the transform that makes the bounding box template match the corresponding bounding box on the current (reference) frame.
  VPIArray outputEstimList;
  
  vpiArrayCreate(128, VPI_ARRAY_TYPE_HOMOGRAPHY_TRANSFORM_2D, 0, &outputEstimList);
11. Create the stream where the algorithm will be submitted for execution.
  VPIStream stream;
  
  vpiStreamCreate(0, &stream);
  
  VPIStream
  struct VPIStreamImpl * VPIStream
  A handle to a stream.
  Definition: Types.h:250
  
  vpiStreamCreate
  VPIStatus vpiStreamCreate(uint64_t flags, VPIStream *stream)
  Create a stream instance.
Processing phase
1. Start of the processing loop from the second frame. The previous frame is where the algorithm fetches the tracked templates from, the current frame is where these templates are matched against.
  for (int idframe = 1; idframe < frame_count; ++idframe)
  
  {
  
  VPIImage imgTemplate = frames[idframe - 1];
  
  VPIImage imgReference = frames[idframe];
2. Submit the algorithm. The first time it's run, it will go through all input bounding boxes, crop them from the template frame and store them in the payload. Subsequent runs will either repeat the cropping and storing process for new bounding boxes added (doesn't happen in this example, but happens in the sample application), or perform the template matching on the reference frame.
  VPI_CHECK_STATUS(vpiSubmitKLTFeatureTracker(stream, VPI_BACKEND_CUDA, klt, imgTemplate, inputBoxList,
  
  inputPredList, imgReference, outputBoxList, outputEstimList,
  
  &params));
  
  vpiSubmitKLTFeatureTracker
  VPIStatus vpiSubmitKLTFeatureTracker(VPIStream stream, uint64_t backend, VPIPayload payload, VPIImage templateImage, VPIArray inputBoxList, VPIArray inputPredictionList, VPIImage referenceImage, VPIArray outputBoxList, VPIArray outputEstimationList, const VPIKLTFeatureTrackerParams *params)
  Runs KLT Feature Tracker on two frames.
3. Wait until the processing is done.
  vpiStreamSync(stream);
  
  vpiStreamSync
  VPIStatus vpiStreamSync(VPIStream stream)
  Blocks the calling thread until all submitted commands in this stream queue are done (queue is empty)...
4. Lock the output arrays to retrieve the updated bounding boxes and the estimated transforms.
  VPIArrayData updatedBBoxData;
  
  vpiArrayLockData(outputBoxList, VPI_LOCK_READ, VPI_ARRAY_BUFFER_HOST_AOS, &updatedBBoxData);
  
  VPIArrayData estimData;
  
  vpiArrayLockData(outputEstimList, VPI_LOCK_READ, VPI_ARRAY_BUFFER_HOST_AOS, &estimData);
  
  VPIKLTTrackedBoundingBox *updated_bbox = (VPIKLTTrackedBoundingBox *)updatedBBoxData.buffer.aos.data;
  
  VPIHomographyTransform2D *estim = (VPIHomographyTransform2D *)estimData.buffer.aos.data;
  
  vpiArrayLockData
  VPIStatus vpiArrayLockData(VPIArray array, VPILockMode mode, VPIArrayBufferType bufType, VPIArrayData *data)
  Acquires the lock on an array object and returns the array contents.
  
  VPI_LOCK_READ
  @ VPI_LOCK_READ
  Lock memory only for reading.
  Definition: Types.h:617
5. Lock the input arrays so that their state for the next iteration can be updated. Since they are actually wrappers, the wrapped data will be updated directly. In order to do that, the corresponding VPI array must be locked for writing.
  vpiArrayLock(inputBoxList, VPI_LOCK_READ_WRITE);
  
  vpiArrayLock(inputPredList, VPI_LOCK_READ_WRITE);
  
  vpiArrayLock
  VPIStatus vpiArrayLock(VPIArray array, VPILockMode mode)
  Acquires the lock on an array object.
  
  VPI_LOCK_READ_WRITE
  @ VPI_LOCK_READ_WRITE
  Lock memory for reading and writing.
  Definition: Types.h:631
6. Loop through all bounding boxes.
  int b;
  
  for (b = 0; b < bbox_count; ++b)
  
  {
7. Update bounding box statuses. If tracking was lost (trackingStatus==1), the input bounding box must also be marked as such, so subsequent KLT iterations ignore it. If the template needs to be updated (templateStatus==1), the next iteration will do the updating, or else it will perform the template matching.
  tracked_bboxes[b].trackingStatus = updated_bbox[b].trackingStatus;
  
  tracked_bboxes[b].templateStatus = updated_bbox[b].templateStatus;
8. Skip bounding boxes that aren't being tracked.
  if (updated_bbox[b].trackingStatus)
  
  {
  
  continue;
  
  }
9. If template for this bounding box must be updated in next KLT iteration, the user must re-define the bounding box. There are several ways to do it. One can use a feature detector such as Harris keypoint detector to help fetch a brand-new bounding box, use updated_bbox[b] and either refine it through other means to avoid accumulating tracking errors, or simply use it as-is, which is a less robust approach, but still yields decent results. This example chooses this last, simpler approach.
  if (updated_bbox[b].templateStatus)
  
  {
  
  tracked_bboxes[b] = updated_bbox[b];
10. Also reset the corresponding input predicted transforms, setting it to identity, as it's now assumed that the input bounding box matches exactly the object being tracked.
  memset(&preds[b], 0, sizeof(preds[b]));
  
  preds[b].mat3[0][0] = 1;
  
  preds[b].mat3[1][1] = 1;
  
  preds[b].mat3[2][2] = 1;
  
  }
11. If the template doesn't need to be updated, set the input predicted transform to the one estimated by this KLT iteration.
  else
  
  {
  
  preds[b] = estim[b];
  
  }
  
  }
12. Once all bounding boxes are updated, unlock the input and output arrays as they aren't needed by this iteration anymore.
  vpiArrayUnlock(inputBoxList);
  
  vpiArrayUnlock(inputPredList);
  
  vpiArrayUnlock(outputBoxList);
  
  vpiArrayUnlock(outputEstimList);
  
  vpiArrayUnlock
  VPIStatus vpiArrayUnlock(VPIArray array)
  Releases the lock on array object.
13. End of this iteration.
  }
Cleanup phase
1. Free resources held by the stream, the payload, and the input and output arrays.
  vpiStreamDestroy(stream);
  
  vpiPayloadDestroy(klt);
  
  vpiArrayDestroy(inputBoxList);
  
  vpiArrayDestroy(inputPredList);
  
  vpiArrayDestroy(outputBoxList);
  
  vpiArrayDestroy(outputEstimList);
  
  vpiArrayDestroy
  void vpiArrayDestroy(VPIArray array)
  Destroy an array instance.
  
  vpiPayloadDestroy
  void vpiPayloadDestroy(VPIPayload payload)
  Deallocates the payload object and all associated resources.
  
  vpiStreamDestroy
  void vpiStreamDestroy(VPIStream stream)
  Destroy a stream instance and deallocate all HW resources.

For more information, see KLT Feature Tracker in the "C API Reference" section of VPI - Vision Programming Interface.

References

Simon Baker, Iain Matthews, "Lucas-Kanade 20 Years On: A Unified Framework".
International Journal of Computer Vision, February 2004, Volume 56, issue 3, pp 221-255.

VPI - Vision Programming Interface

3.2 Release

Overview

Implementation

C API functions

Usage

References