VPI - Vision Programming Interface

3.2 Release

KLT Bounding Box Tracker

Overview

This application tracks bounding boxes on an input video, draws them on each frame, and saves the result to a video file. You can choose which backend is used for processing.

Note
The output will be in grayscale as the algorithm currently doesn't support color inputs.

Instructions

The command line parameters are:

<backend> <input video> <input bboxes>

where

  • backend: either cpu, cuda or pva; it defines the backend that will perform the processing.
  • input video: input video file name; any video type that OpenCV's cv::VideoCapture accepts is supported.
  • input bboxes: file with the input bounding boxes and the frame in which each one first appears. The file is composed of multiple lines with the following format:
       <frame> <bbox_x> <bbox_y> <bbox_width> <bbox_height>
    It's important that the lines are sorted with frames in ascending order.
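The file format above can be read with a few lines of plain Python. The following is only a minimal sketch, not the sample's own parser: the helper name parse_bboxes and the first two sample lines are made up for illustration, and it rejects out-of-order frames, which the samples themselves merely assume.

```python
import io


def parse_bboxes(lines):
    """Group bounding boxes by the frame in which they first appear.

    `lines` is any iterable of text lines in the
    "<frame> <bbox_x> <bbox_y> <bbox_width> <bbox_height>" format.
    Returns a dict mapping frame number -> list of (x, y, w, h) tuples.
    """
    boxes_by_frame = {}
    last_frame = None
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 5:
            raise ValueError("line {}: expected 5 fields, got {}".format(line_no, len(fields)))
        frame = int(fields[0])
        x, y, w, h = (float(v) for v in fields[1:])
        # The samples assume ascending frame order; this sketch enforces it.
        if last_frame is not None and frame < last_frame:
            raise ValueError("line {}: frames must be in ascending order".format(line_no))
        last_frame = frame
        boxes_by_frame.setdefault(frame, []).append((x, y, w, h))
    return boxes_by_frame


# Hypothetical input: two boxes starting at frame 0, one starting at frame 61.
sample = io.StringIO("0 100 50 20 10\n0 300 80 16 12\n61 547 337 14 11\n")
boxes = parse_bboxes(sample)
print(boxes[61])  # prints [(547.0, 337.0, 14.0, 11.0)]
```

Grouping by frame mirrors how both samples decide when new boxes enter the tracker.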

Here's one example:

  • C++
    ./vpi_sample_06_klt_tracker cuda ../assets/dashcam.mp4 ../assets/dashcam_bboxes.txt
  • Python
    python3 main.py cuda ../assets/dashcam.mp4 ../assets/dashcam_bboxes.txt

Both commands use the CUDA backend with one of the provided sample videos and its bounding boxes. The C++ sample renders the tracked bounding boxes into klt_cuda.mp4; the Python sample writes them to klt_python3_cuda.mp4 when run with Python 3.

Results

Tracking Result
Note
Video output requires an HTML5-capable browser that supports H.264 MP4 video decoding.

Source Code

For convenience, here's the code that is also installed in the samples directory.

Python
from __future__ import print_function

import sys
from argparse import ArgumentParser
import numpy as np
import vpi
import cv2


# Convert a colored input frame to grayscale (if needed)
# and then, if using PVA backend, convert it to 16-bit unsigned pixels;
# The converted frame is copied before wrapping it as a VPI image so
# later draws in the gray frame do not change the reference VPI image.
def convertFrameImage(inputFrame, backend):
    if inputFrame.ndim == 3 and inputFrame.shape[2] == 3:
        grayFrame = cv2.cvtColor(inputFrame, cv2.COLOR_BGR2GRAY)
    else:
        grayFrame = inputFrame
    if backend == vpi.Backend.PVA:
        # PVA only supports 16-bit unsigned inputs,
        # where each element is in 0-255 range, so
        # no rescaling is needed.
        grayFrame = grayFrame.astype(np.uint16)
    grayImage = vpi.asimage(grayFrame.copy())
    return grayFrame, grayImage


# Write the input gray frame to output video with
# input bounding boxes and predictions
def writeOutput(outVideo, cvGray, inBoxes, inPreds, colors, backend):
    try:
        if cvGray.dtype == np.uint16:
            cvGray = cvGray.astype(np.uint8)
        if cvGray.dtype != np.uint8:
            raise Exception('Input frame format must be grayscale, 8-bit unsigned')
        cvGrayBGR = cv2.cvtColor(cvGray, cv2.COLOR_GRAY2BGR)

        # Tracking the number of valid bounding boxes in the current frame
        numValidBoxes = 0

        # Draw the input bounding boxes considering the input predictions
        with inBoxes.rlock_cpu(), inPreds.rlock_cpu() as pred:
            # Array of bounding boxes (bbox) and predictions (pred)
            bbox = inBoxes.cpu().view(np.recarray)

            for i in range(inBoxes.size):
                if bbox[i].tracking_status == vpi.KLTTrackStatus.LOST:
                    # If the tracking status of the current bounding box is lost, skip it
                    continue

                # Gather information of the current (i) bounding box and prediction
                # Prediction scaling width, height and x, y
                predScaleWidth = pred[i][0, 0]
                predScaleHeight = pred[i][1, 1]
                predX = pred[i][0, 2]
                predY = pred[i][1, 2]

                # Bounding box scaling width, height and x, y and bbox width, height
                bboxScaleWidth = bbox[i].bbox.xform.mat3[0, 0]
                bboxScaleHeight = bbox[i].bbox.xform.mat3[1, 1]
                bboxX = bbox[i].bbox.xform.mat3[0, 2]
                bboxY = bbox[i].bbox.xform.mat3[1, 2]
                bboxWidth = bbox[i].bbox.width
                bboxHeight = bbox[i].bbox.height

                # Compute corrected x, y and width, height (w, h) by proper adding
                # bounding box and prediction x, y and by proper multiplying
                # bounding box w, h with its own scaling and prediction scaling
                x = bboxX + predX
                y = bboxY + predY
                w = bboxWidth * bboxScaleWidth * predScaleWidth
                h = bboxHeight * bboxScaleHeight * predScaleHeight

                # Start point and end point of the bounding box for OpenCV drawing
                startPoint = tuple(np.array([x, y], dtype=int))
                endPoint = tuple(np.array([x, y], dtype=int) + np.array([w, h], dtype=int))

                # The color of the bounding box to be drawn
                bboxColor = tuple([ int(c) for c in colors[0, i] ])
                cv2.rectangle(cvGrayBGR, startPoint, endPoint, bboxColor, 2)

                # Incrementing the number of valid bounding boxes in the current frame
                numValidBoxes += 1

        print(' Valid: {:02d} boxes'.format(numValidBoxes))

        outVideo.write(cvGrayBGR)
    except Exception as e:
        print('Error while writing output video:\n', e, file=sys.stderr)
        exit(1)


# ----------------------------
# Parse command line arguments

parser = ArgumentParser()
parser.add_argument('backend', choices=['cpu','cuda','pva'],
                    help='Backend to be used for processing')

parser.add_argument('input',
                    help='Input video')

parser.add_argument('boxes',
                    help='Text file with bounding boxes description')

args = parser.parse_args()

if args.backend == 'cpu':
    backend = vpi.Backend.CPU
elif args.backend == 'cuda':
    backend = vpi.Backend.CUDA
else:
    assert args.backend == 'pva'
    backend = vpi.Backend.PVA

# -----------------------------
# Open input and output videos

inVideo = cv2.VideoCapture(args.input)

fourcc = cv2.VideoWriter_fourcc(*'MPEG')
inSize = (int(inVideo.get(cv2.CAP_PROP_FRAME_WIDTH)), int(inVideo.get(cv2.CAP_PROP_FRAME_HEIGHT)))
fps = inVideo.get(cv2.CAP_PROP_FPS)

outVideo = cv2.VideoWriter('klt_python'+str(sys.version_info[0])+'_'+args.backend+'.mp4',
                           fourcc, fps, inSize)

if not outVideo.isOpened():
    print("Error creating output video", file=sys.stderr)
    exit(1)

# -----------------------------
# Reading input bounding boxes

# All boxes is a dictionary of all bounding boxes to be tracked in the input video,
# where each value is a list of new bounding boxes to track at the frame indicated by its key
allBoxes = {}
totalNumBoxes = 0

# Array capacity 0 means no restricted maximum number of bounding boxes
arrayCapacity = 0

if backend == vpi.Backend.PVA:
    # PVA requires 128 array capacity or maximum number of bounding boxes
    arrayCapacity = 128

with open(args.boxes) as f:
    # The input file (f) should have one bounding box per line as:
    # "startFrame bboxX bboxY bboxWidth bboxHeight"; e.g.: "61 547 337 14 11"
    for line in f.readlines():
        line = line.replace('\n', '').replace('\r', '')
        startFrame, x, y, w, h = [ float(v) for v in line.split(' ') ]
        bb = (x, y, w, h)
        if startFrame not in allBoxes:
            allBoxes[startFrame] = [bb]
        else:
            allBoxes[startFrame].append(bb)
        totalNumBoxes += 1
        if totalNumBoxes == arrayCapacity:
            # Stop adding boxes if its total reached the array capacity
            break

curFrame = 0
curNumBoxes = len(allBoxes[curFrame])

# ------------------------------------------------------------------------------
# Initialize VPI array with all input bounding boxes (same as C++ KLT sample)

if arrayCapacity == 0:
    arrayCapacity = totalNumBoxes

inBoxes = vpi.Array(arrayCapacity, vpi.Type.KLT_TRACKED_BOUNDING_BOX)

inBoxes.size = totalNumBoxes
with inBoxes.wlock_cpu():
    data = inBoxes.cpu().view(np.recarray)

    # Global index i of all bounding boxes data, starting at 0
    i = 0

    for f in sorted(allBoxes.keys()):
        for bb in allBoxes[f]:
            # Each bounding box bb is a tuple of (x, y, w, h)
            x, y, w, h = bb

            # The bounding box data is the identity for the scaling part,
            # meaning no scaling, and the offset part is its position x, y
            data[i].bbox.xform.mat3[0, 0] = 1
            data[i].bbox.xform.mat3[1, 1] = 1
            data[i].bbox.xform.mat3[2, 2] = 1
            data[i].bbox.xform.mat3[0, 2] = x
            data[i].bbox.xform.mat3[1, 2] = y

            # The bounding box data stores its width and height w, h
            data[i].bbox.width = w
            data[i].bbox.height = h

            # Initially all boxes have status tracked and update needed
            data[i].tracking_status = vpi.KLTTrackStatus.TRACKED
            data[i].template_status = vpi.KLTTemplateStatus.UPDATE_NEEDED

            # Incrementing the global index for the next bounding box
            i += 1

#-------------------------------------------------------------------------------
# Generate random colors for bounding boxes equal to the C++ KLT sample

hues = np.zeros((totalNumBoxes,), dtype=np.uint8)

if int(cv2.__version__.split('.')[0]) >= 3:
    cv2.setRNGSeed(1)
    hues = cv2.randu(hues, 0, 180)
else:
    # Random differs in OpenCV-2.4
    rng = cv2.cv.RNG(1)
    hues = cv2.cv.fromarray(np.array([[ h for h in hues ]], dtype=np.uint8))
    cv2.cv.RandArr(rng, hues, cv2.cv.CV_RAND_UNI, 0, 180)
    hues = [ hues[0, i] for i in range(totalNumBoxes) ]

colors = np.array([[ [int(h), 255, 255] for h in hues ]], dtype=np.uint8)
colors = cv2.cvtColor(colors, cv2.COLOR_HSV2BGR)

#-------------------------------------------------------------------------------
# Initialize the KLT Feature Tracker algorithm

# Load up first frame
validFrame, cvFrame = inVideo.read()
if not validFrame:
    print("Error reading first input frame", file=sys.stderr)
    exit(1)

# Convert OpenCV frame to gray returning also the VPI image for given backend
cvGray, imgTemplate = convertFrameImage(cvFrame, backend)

# Create the KLT Feature Tracker object using the backend specified by the user
klt = vpi.KLTFeatureTracker(imgTemplate, inBoxes, backend=backend)

#-------------------------------------------------------------------------------
# Main processing loop

while validFrame:
    print('Frame: {:04d} ; Total: {:02d} boxes ;'.format(curFrame, curNumBoxes), end='')

    # Adjust input boxes and predictions to the current number of boxes
    inPreds = klt.in_predictions()

    inPreds.size = curNumBoxes
    inBoxes.size = curNumBoxes

    # Write current frame to the output video
    writeOutput(outVideo, cvGray, inBoxes, inPreds, colors, backend)

    # Read next input frame
    curFrame += 1
    validFrame, cvFrame = inVideo.read()
    if not validFrame:
        break

    cvGray, imgReference = convertFrameImage(cvFrame, backend)

    outBoxes = klt(imgReference)

    if curFrame in allBoxes:
        curNumBoxes += len(allBoxes[curFrame])

outVideo.release()

# vim: ts=8:sw=4:sts=4:et:ai
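
The drawing code in both samples composes each box with its predicted transform using only the scale and translation entries of the 3x3 matrices. The sketch below restates that arithmetic in plain Python so it can be checked without VPI or OpenCV. The function names tracked_box_rect and identity and all numeric values are made up for illustration.

```python
def identity():
    """Return a 3x3 identity matrix as nested lists."""
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def tracked_box_rect(box_xform, box_width, box_height, pred_xform):
    """Return (x, y, w, h) of a tracked box, as the samples draw it.

    Only the scale terms [0][0], [1][1] and the translation terms
    [0][2], [1][2] of each matrix are used, matching the drawing code.
    """
    x = box_xform[0][2] + pred_xform[0][2]
    y = box_xform[1][2] + pred_xform[1][2]
    w = box_width * box_xform[0][0] * pred_xform[0][0]
    h = box_height * box_xform[1][1] * pred_xform[1][1]
    return x, y, w, h


# Hypothetical box: 14x11 pixels at (547, 337), no scaling of its own.
box = identity()
box[0][2], box[1][2] = 547.0, 337.0

# Hypothetical prediction: shift by (3, -2) and scale by (1.5, 2.0).
pred = identity()
pred[0][0], pred[1][1] = 1.5, 2.0
pred[0][2], pred[1][2] = 3.0, -2.0

rect = tracked_box_rect(box, 14.0, 11.0, pred)
print(rect)  # prints (550.0, 335.0, 21.0, 22.0)
```

With an identity prediction the rectangle reduces to the box's own position and size, which is the state every box starts in.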

C++
#include <opencv2/core/version.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/videoio.hpp>
#include <vpi/OpenCVInterop.hpp>

#include <vpi/Array.h>
#include <vpi/Image.h>
#include <vpi/Status.h>
#include <vpi/Stream.h>
#include <vpi/algo/KLTFeatureTracker.h>

#include <cassert>
#include <cstring> // for memset
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

#define CHECK_STATUS(STMT) \
    do \
    { \
        VPIStatus status = (STMT); \
        if (status != VPI_SUCCESS) \
        { \
            char buffer[VPI_MAX_STATUS_MESSAGE_LENGTH]; \
            vpiGetLastStatusMessage(buffer, sizeof(buffer)); \
            std::ostringstream ss; \
            ss << vpiStatusGetName(status) << ": " << buffer; \
            throw std::runtime_error(ss.str()); \
        } \
    } while (0)

// Utility to draw the bounding boxes into an image and return it,
// so that the caller can save it to disk.
static cv::Mat WriteKLTBoxes(VPIImage img, VPIArray boxes, VPIArray preds)
{
    // Convert img into a cv::Mat
    cv::Mat out;
    {
        VPIImageData imgdata;
        CHECK_STATUS(vpiImageLockData(img, VPI_LOCK_READ, VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR, &imgdata));

        assert(imgdata.bufferType == VPI_IMAGE_BUFFER_HOST_PITCH_LINEAR);
        VPIImageBufferPitchLinear &imgPitch = imgdata.buffer.pitch;

        int cvtype;
        switch (imgPitch.format)
        {
        case VPI_IMAGE_FORMAT_U8:
            cvtype = CV_8U;
            break;

        case VPI_IMAGE_FORMAT_S8:
            cvtype = CV_8S;
            break;

        case VPI_IMAGE_FORMAT_U16:
            cvtype = CV_16UC1;
            break;

        case VPI_IMAGE_FORMAT_S16:
            cvtype = CV_16SC1;
            break;

        default:
            throw std::runtime_error("Image type not supported");
        }

        cv::Mat cvimg(imgPitch.planes[0].height, imgPitch.planes[0].width, cvtype, imgPitch.planes[0].data,
                      imgPitch.planes[0].pitchBytes);

        if (cvimg.type() == CV_16U)
        {
            cvimg.convertTo(out, CV_8U);
            cvimg = out;
            out = cv::Mat();
        }

        cvtColor(cvimg, out, cv::COLOR_GRAY2BGR);

        CHECK_STATUS(vpiImageUnlock(img));
    }

    // Now draw the bounding boxes.
    VPIArrayData boxdata;
    CHECK_STATUS(vpiArrayLockData(boxes, VPI_LOCK_READ, VPI_ARRAY_BUFFER_HOST_AOS, &boxdata));

    VPIArrayData preddata;
    CHECK_STATUS(vpiArrayLockData(preds, VPI_LOCK_READ, VPI_ARRAY_BUFFER_HOST_AOS, &preddata));

    auto *pboxes = reinterpret_cast<VPIKLTTrackedBoundingBox *>(boxdata.buffer.aos.data);
    auto *ppreds = reinterpret_cast<VPIHomographyTransform2D *>(preddata.buffer.aos.data);

    // Use random high-saturated colors
    static std::vector<cv::Vec3b> colors;
    if ((int)colors.size() != *boxdata.buffer.aos.sizePointer)
    {
        colors.resize(*boxdata.buffer.aos.sizePointer);

        cv::RNG rand(1);
        for (size_t i = 0; i < colors.size(); ++i)
        {
            colors[i] = cv::Vec3b(rand.uniform(0, 180), 255, 255);
        }
        cvtColor(colors, colors, cv::COLOR_HSV2BGR);
    }

    // For each tracked bounding box...
    for (int i = 0; i < *boxdata.buffer.aos.sizePointer; ++i)
    {
        // Skip boxes whose tracking was lost (status 1).
        if (pboxes[i].trackingStatus == 1)
        {
            continue;
        }

        float x, y, w, h;
        x = pboxes[i].bbox.xform.mat3[0][2] + ppreds[i].mat3[0][2];
        y = pboxes[i].bbox.xform.mat3[1][2] + ppreds[i].mat3[1][2];
        w = pboxes[i].bbox.width * pboxes[i].bbox.xform.mat3[0][0] * ppreds[i].mat3[0][0];
        h = pboxes[i].bbox.height * pboxes[i].bbox.xform.mat3[1][1] * ppreds[i].mat3[1][1];

        rectangle(out, cv::Rect(x, y, w, h), cv::Scalar(colors[i][0], colors[i][1], colors[i][2]), 2);
    }

    CHECK_STATUS(vpiArrayUnlock(preds));
    CHECK_STATUS(vpiArrayUnlock(boxes));

    return out;
}

int main(int argc, char *argv[])
{
    // OpenCV image that will be wrapped by a VPIImage.
    // Define it here so that it's destroyed *after* wrapper is destroyed
    cv::Mat cvTemplate, cvReference;

    // Arrays that will store our input bboxes and predicted transform.
    VPIArray inputBoxList = NULL, inputPredList = NULL;

    // Other VPI objects that will be used
    VPIStream stream = NULL;
    VPIArray outputBoxList = NULL;
    VPIArray outputEstimList = NULL;
    VPIPayload klt = NULL;
    VPIImage imgReference = NULL;
    VPIImage imgTemplate = NULL;

    int retval = 0;
    try
    {
        if (argc != 4)
        {
            throw std::runtime_error(std::string("Usage: ") + argv[0] + " <cpu|pva|cuda> <input_video> <bbox descr>");
        }

        std::string strBackend = argv[1];
        std::string strInputVideo = argv[2];
        std::string strInputBBoxes = argv[3];

        // Load the input video
        cv::VideoCapture invid;
        if (!invid.open(strInputVideo))
        {
            throw std::runtime_error("Can't open '" + strInputVideo + "'");
        }

        // Open the output video for writing using input's characteristics
        int w = invid.get(cv::CAP_PROP_FRAME_WIDTH);
        int h = invid.get(cv::CAP_PROP_FRAME_HEIGHT);
        int fourcc = cv::VideoWriter::fourcc('M', 'P', 'E', 'G');
        double fps = invid.get(cv::CAP_PROP_FPS);

        cv::VideoWriter outVideo("klt_" + strBackend + ".mp4", fourcc, fps, cv::Size(w, h));
        if (!outVideo.isOpened())
        {
            throw std::runtime_error("Can't create output video");
        }

        // Load the bounding boxes
        // Format is: <frame number> <bbox_x> <bbox_y> <bbox_width> <bbox_height>
        // Important assumption: bboxes must be sorted with increasing frame numbers.

        // These arrays will actually wrap these vectors.
        std::vector<VPIKLTTrackedBoundingBox> bboxes;
        int32_t bboxesSize = 0;
        std::vector<VPIHomographyTransform2D> preds;
        int32_t predsSize = 0;

        // Stores how many bboxes there are in each frame. Only
        // stores when the bboxes count change.
        std::map<int, size_t> bboxes_size_at_frame; // frame -> bbox count

        // PVA requires that array capacity is 128.
        bboxes.reserve(128);
        preds.reserve(128);

        // Read bounding boxes
        {
            std::ifstream in(strInputBBoxes);
            if (!in)
            {
                throw std::runtime_error("Can't open '" + strInputBBoxes + "'");
            }

            // For each bounding box,
            int frame, x, y, w, h;
            while (in >> frame >> x >> y >> w >> h)
            {
                if (bboxes.size() == 64)
                {
                    throw std::runtime_error("Too many bounding boxes");
                }

                // Convert the axis-aligned bounding box into our tracking
                // structure.

                VPIKLTTrackedBoundingBox track = {};
                // scale
                track.bbox.xform.mat3[0][0] = 1;
                track.bbox.xform.mat3[1][1] = 1;
                // position
                track.bbox.xform.mat3[0][2] = x;
                track.bbox.xform.mat3[1][2] = y;
                // must be 1
                track.bbox.xform.mat3[2][2] = 1;

                track.bbox.width = w;
                track.bbox.height = h;
                track.trackingStatus = 0; // valid tracking
                track.templateStatus = 1; // must update

                bboxes.push_back(track);

                // Identity predicted transform.
                VPIHomographyTransform2D xform = {};
                xform.mat3[0][0] = 1;
                xform.mat3[1][1] = 1;
                xform.mat3[2][2] = 1;
                preds.push_back(xform);

                bboxes_size_at_frame[frame] = bboxes.size();
            }

            if (!in && !in.eof())
            {
                throw std::runtime_error("Can't parse bounding boxes, stopped at bbox #" +
                                         std::to_string(bboxes.size()));
            }

            // Wrap the input arrays into VPIArray's
            VPIArrayData data = {};
            data.bufferType = VPI_ARRAY_BUFFER_HOST_AOS;
            data.buffer.aos.type = VPI_ARRAY_TYPE_KLT_TRACKED_BOUNDING_BOX;
            data.buffer.aos.capacity = bboxes.capacity();
            data.buffer.aos.sizePointer = &bboxesSize;
            data.buffer.aos.data = &bboxes[0];
            CHECK_STATUS(vpiArrayCreateWrapper(&data, 0, &inputBoxList));

            data.buffer.aos.type = VPI_ARRAY_TYPE_HOMOGRAPHY_TRANSFORM_2D;
            data.buffer.aos.sizePointer = &predsSize;
            data.buffer.aos.data = &preds[0];
            CHECK_STATUS(vpiArrayCreateWrapper(&data, 0, &inputPredList));
        }

        // Now parse the backend
        VPIBackend backend;

        if (strBackend == "cpu")
        {
            backend = VPI_BACKEND_CPU;
        }
        else if (strBackend == "cuda")
        {
            backend = VPI_BACKEND_CUDA;
        }
        else if (strBackend == "pva")
        {
            backend = VPI_BACKEND_PVA;
        }
        else
        {
            throw std::runtime_error("Backend '" + strBackend +
                                     "' not recognized, it must be either cpu, cuda or pva.");
        }

        // Create the stream for the given backend.
        CHECK_STATUS(vpiStreamCreate(backend, &stream));

        // Helper function to fetch a frame from input
        int nextFrame = 0;
        auto fetchFrame = [&invid, &nextFrame, backend]() {
            cv::Mat frame;
            if (!invid.read(frame))
            {
                return cv::Mat();
            }

            // We only support grayscale inputs
            if (frame.channels() == 3)
            {
                cvtColor(frame, frame, cv::COLOR_BGR2GRAY);
            }

            if (backend == VPI_BACKEND_PVA)
            {
                // PVA only supports 16-bit unsigned inputs,
                // where each element is in 0-255 range, so
                // no rescaling needed.
                cv::Mat aux;
                frame.convertTo(aux, CV_16U);
                frame = aux;
            }
            else
            {
                assert(frame.type() == CV_8U);
            }

            ++nextFrame;
            return frame;
        };

        // Fetch the first frame and wrap it into a VPIImage.
        // Templates will be based on this frame.
        cvTemplate = fetchFrame();
        CHECK_STATUS(vpiImageCreateWrapperOpenCVMat(cvTemplate, 0, &imgTemplate));

        // Create the reference image wrapper. Let's wrap the cvTemplate for now just
        // to create the wrapper. Later we'll set it to wrap the actual reference image.
        CHECK_STATUS(vpiImageCreateWrapperOpenCVMat(cvTemplate, 0, &imgReference));

        VPIImageFormat imgFormat;
        CHECK_STATUS(vpiImageGetFormat(imgTemplate, &imgFormat));

        // Using this first frame's characteristics, create a KLT Bounding Box Tracker payload.
        // We're limiting the template dimensions to 64x64.
        CHECK_STATUS(vpiCreateKLTFeatureTracker(backend, cvTemplate.cols, cvTemplate.rows, imgFormat, NULL, &klt));

        // Parameters we'll use. No need to change them on the fly, so just define them here.
        VPIKLTFeatureTrackerParams params;
        CHECK_STATUS(vpiInitKLTFeatureTrackerParams(&params));

        // Output array with estimated bbox for current frame.
        CHECK_STATUS(vpiArrayCreate(128, VPI_ARRAY_TYPE_KLT_TRACKED_BOUNDING_BOX, 0, &outputBoxList));

        // Output array with estimated transform of input bbox to match output bbox.
        CHECK_STATUS(vpiArrayCreate(128, VPI_ARRAY_TYPE_HOMOGRAPHY_TRANSFORM_2D, 0, &outputEstimList));

        size_t curNumBoxes = 0;

        do
        {
            size_t curFrame = nextFrame - 1;

            // Get the number of bounding boxes in current frame.
            auto tmp = --bboxes_size_at_frame.upper_bound(curFrame);
            size_t bbox_count = tmp->second;

            assert(bbox_count >= curNumBoxes && "input bounding boxes must be sorted by frame");

            // Does current frame have new bounding boxes?
            if (curNumBoxes != bbox_count)
            {
                // Update the input array sizes, the new frame is already there as we populated
                // these arrays with all input bounding boxes.
                CHECK_STATUS(vpiArraySetSize(inputBoxList, bbox_count));
                CHECK_STATUS(vpiArraySetSize(inputPredList, bbox_count));

                for (size_t i = 0; i < bbox_count - curNumBoxes; ++i)
                {
                    std::cout << curFrame << " -> new " << curNumBoxes + i << std::endl;
                }
                assert(bbox_count <= bboxes.capacity());
                assert(bbox_count <= preds.capacity());

                curNumBoxes = bbox_count;
            }

            // Save this frame to disk.
            outVideo << WriteKLTBoxes(imgTemplate, inputBoxList, inputPredList);

            // Fetch a new frame
            cvReference = fetchFrame();

            // Video ended?
            if (cvReference.data == NULL)
            {
                // Just end gracefully.
                break;
            }

            // Make the reference wrapper point to the reference frame
            CHECK_STATUS(vpiImageSetWrappedOpenCVMat(imgReference, cvReference));

            // Estimate the bounding boxes in current frame (reference) given their position in previous
            // frame (template).
            CHECK_STATUS(vpiSubmitKLTFeatureTracker(stream, backend, klt, imgTemplate, inputBoxList, inputPredList,
                                                    imgReference, outputBoxList, outputEstimList, &params));

            // Wait for processing to finish.
            CHECK_STATUS(vpiStreamSync(stream));

            // Now the input and output arrays are locked to properly set up the input for the next iteration.
            // Input arrays will be updated based on tracking information produced in this iteration.
            VPIArrayData updatedBBoxData;
            CHECK_STATUS(vpiArrayLockData(outputBoxList, VPI_LOCK_READ, VPI_ARRAY_BUFFER_HOST_AOS, &updatedBBoxData));

            VPIArrayData estimData;
            CHECK_STATUS(vpiArrayLockData(outputEstimList, VPI_LOCK_READ, VPI_ARRAY_BUFFER_HOST_AOS, &estimData));

            // Since these arrays are actually wrappers of external data, we don't need to retrieve
            // the VPI array contents, the wrapped buffers will be updated directly. The arrays must
            // be locked for read/write anyway.
            CHECK_STATUS(vpiArrayLock(inputBoxList, VPI_LOCK_READ_WRITE));
            CHECK_STATUS(vpiArrayLock(inputPredList, VPI_LOCK_READ_WRITE));

            auto *updated_bbox = reinterpret_cast<VPIKLTTrackedBoundingBox *>(updatedBBoxData.buffer.aos.data);
            auto *estim = reinterpret_cast<VPIHomographyTransform2D *>(estimData.buffer.aos.data);

            // For each bounding box,
            for (size_t b = 0; b < curNumBoxes; ++b)
            {
                // Did tracking fail?
                if (updated_bbox[b].trackingStatus)
                {
                    // Do we have to update the input bbox's tracking status too?
                    if (bboxes[b].trackingStatus == 0)
                    {
                        std::cout << curFrame << " -> dropped " << b << std::endl;
                        bboxes[b].trackingStatus = 1;
                    }

                    continue;
                }

                // Must the template for this bounding box be updated?
                if (updated_bbox[b].templateStatus)
                {
                    std::cout << curFrame << " -> update " << b << std::endl;

                    // There are usually two approaches here:
                    // 1. Redefine the bounding box using a feature detector such as
                    //    the Harris keypoint detector, or
                    // 2. Use updated_bbox[b], which is still valid, although tracking
                    //    errors might accumulate over time.
                    //
                    // We'll go with the second option, less robust, but simple enough
                    // to implement.
                    bboxes[b] = updated_bbox[b];

                    // Signal the input that the template for this bounding box must be updated.
                    bboxes[b].templateStatus = 1;

                    // Predicted transform is now identity as we reset the tracking.
                    preds[b] = VPIHomographyTransform2D{};
                    preds[b].mat3[0][0] = 1;
                    preds[b].mat3[1][1] = 1;
                    preds[b].mat3[2][2] = 1;
                }
                else
                {
                    // Inform that the template for this bounding box doesn't need to be updated.
                    bboxes[b].templateStatus = 0;

                    // We just update the input transform with the estimated one.
                    preds[b] = estim[b];
                }
            }

            // We're finished working with the input and output arrays.
            CHECK_STATUS(vpiArrayUnlock(inputBoxList));
            CHECK_STATUS(vpiArrayUnlock(inputPredList));

            CHECK_STATUS(vpiArrayUnlock(outputBoxList));
            CHECK_STATUS(vpiArrayUnlock(outputEstimList));

            // Next's reference frame is current's template.
            std::swap(imgTemplate, imgReference);
            std::swap(cvTemplate, cvReference);
        } while (true);
    }
    catch (std::exception &e)
    {
        std::cerr << e.what() << std::endl;
        retval = 1;
    }

    vpiStreamDestroy(stream);
    vpiPayloadDestroy(klt);
    vpiArrayDestroy(inputBoxList);
    vpiArrayDestroy(inputPredList);
    vpiArrayDestroy(outputBoxList);
    vpiArrayDestroy(outputEstimList);
    vpiImageDestroy(imgReference);
    vpiImageDestroy(imgTemplate);

    return retval;
}
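
The per-box update at the end of the C++ loop follows a three-way policy: a lost box is marked as dropped, a box whose template must be refreshed restarts from the tracker's output with an identity prediction, and any other box simply carries the estimated transform forward. The sketch below models that policy in plain Python, with dicts standing in for VPIKLTTrackedBoundingBox and nested lists for the transforms. The names update_inputs and identity, the "x" field and all values are made up for illustration and are not part of the VPI API.

```python
def identity():
    """Return a 3x3 identity matrix as nested lists."""
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def update_inputs(in_boxes, in_preds, out_boxes, out_estims):
    """Prepare the tracker inputs for the next frame, in place.

    Each box is a dict with 'tracking_status' and 'template_status'
    (0 or 1, as in the C++ sample). Returns (index, event) tuples.
    """
    events = []
    for i, out in enumerate(out_boxes):
        if out["tracking_status"]:
            # Tracking failed: mark the input box as lost, once.
            if in_boxes[i]["tracking_status"] == 0:
                in_boxes[i]["tracking_status"] = 1
                events.append((i, "dropped"))
            continue
        if out["template_status"]:
            # Restart from the tracker's output and reset the prediction.
            in_boxes[i] = dict(out)
            in_boxes[i]["template_status"] = 1
            in_preds[i] = identity()
            events.append((i, "update"))
        else:
            # Keep the template; carry the estimated transform forward.
            in_boxes[i]["template_status"] = 0
            in_preds[i] = out_estims[i]
    return events


# Hypothetical state: three boxes, one outcome of each kind.
in_boxes = [
    {"tracking_status": 0, "template_status": 0, "x": 10.0},
    {"tracking_status": 0, "template_status": 0, "x": 20.0},
    {"tracking_status": 0, "template_status": 1, "x": 30.0},
]
in_preds = [identity(), identity(), identity()]
out_boxes = [
    {"tracking_status": 1, "template_status": 0, "x": 10.0},  # lost
    {"tracking_status": 0, "template_status": 1, "x": 22.0},  # needs refresh
    {"tracking_status": 0, "template_status": 0, "x": 30.0},  # still tracked
]
shift = identity()
shift[0][2] = 2.5
out_estims = [identity(), identity(), shift]

events = update_inputs(in_boxes, in_preds, out_boxes, out_estims)
print(events)  # prints [(0, 'dropped'), (1, 'update')]
```

As the comments in the C++ sample note, restarting from the tracker's output is the simpler of two options and lets tracking errors accumulate over time.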