VPI - Vision Programming Interface

1.2 Release

KLT Bounding Box Tracker

Overview

This application tracks bounding boxes on an input video, draws them on each frame, and saves the result to a video file. The user can choose which backend will be used for processing.

Note
The output will be in grayscale as the algorithm currently doesn't support color inputs.
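In a nutshell, the sample wraps each grayscale frame as a VPI image and feeds it to a KLT tracker object created from the first frame and the input boxes. Below is a minimal sketch of that flow using the same Python API calls as the full sample further down; the synthetic frames and the box position are illustrative placeholders, not part of the sample itself.

import numpy as np
import vpi

# Two synthetic 8-bit grayscale frames standing in for real video frames.
frames = [np.zeros((480, 640), dtype=np.uint8) for _ in range(2)]

# One bounding box to track, set up the same way as in the full sample.
inBoxes = vpi.Array(1, vpi.Type.KLT_TRACKED_BOUNDING_BOX)
inBoxes.size = 1
with inBoxes.lock(vpi.Lock.WRITE):
    data = inBoxes.cpu().view(np.recarray)
    data[0].bbox.xform.mat3[0, 0] = 1    # identity scaling
    data[0].bbox.xform.mat3[1, 1] = 1
    data[0].bbox.xform.mat3[2, 2] = 1
    data[0].bbox.xform.mat3[0, 2] = 100  # top-left x (illustrative)
    data[0].bbox.xform.mat3[1, 2] = 80   # top-left y (illustrative)
    data[0].bbox.width = 32
    data[0].bbox.height = 32
    data[0].tracking_status = vpi.KLTTrackStatus.TRACKED
    data[0].template_status = vpi.KLTUpdateStatus.NEEDED

# The first frame is the template; each following frame is tracked against it.
klt = vpi.KLTFeatureTracker(vpi.asimage(frames[0]), inBoxes, backend=vpi.Backend.CUDA)

for frame in frames[1:]:
    outBoxes = klt(vpi.asimage(frame))  # returns the updated boxes for this frame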

Instructions

The usage is:

./vpi_sample_06_klt_tracker <backend> <input video> <input bboxes>

where

  • backend: either cpu, cuda or pva; it defines the backend that will perform the processing.
  • input video: input video file name; it accepts all video types that OpenCV's cv::VideoCapture accepts.
  • input bboxes: file with the input bounding boxes and the frame in which each first appears. The file is composed of multiple lines with the following format (see the example after this list):
       <frame> <bbox_x> <bbox_y> <bbox_width> <bbox_height>
    It's important that the lines are sorted with frames in ascending order.
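For example, a file with the following two lines starts tracking one box at frame 0 and another at frame 61. The values in the first line are illustrative; the second line is the example used in the sample's own comments:

0 319 229 50 42
61 547 337 14 11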

Here's one example:

./vpi_sample_06_klt_tracker cuda ../assets/dashcam.mp4 ../assets/dashcam_bboxes.txt

This uses the CUDA backend with one of the provided sample videos and its bounding boxes. It renders the tracked bounding boxes into klt_cuda.mp4.

Note
If using OpenCV-2.4 or older (e.g. on Ubuntu 16.04), the output file is klt_cuda.avi.

Results

Tracking Result
Note
Video output requires an HTML5-capable browser that supports H.264 MP4 video decoding.

Source Code

For convenience, here's the code that is also installed in the samples directory.

Language: Python

from __future__ import print_function

import sys
from argparse import ArgumentParser
import numpy as np
import cv2
import vpi


# Convert a colored input frame to grayscale (if needed)
# and then, if using the PVA backend, convert it to 16-bit unsigned pixels.
# The converted frame is copied before wrapping it as a VPI image so
# later drawing on the gray frame does not change the referenced VPI image.
def convertFrameImage(inputFrame, backend):
    if inputFrame.ndim == 3 and inputFrame.shape[2] == 3:
        grayFrame = cv2.cvtColor(inputFrame, cv2.COLOR_BGR2GRAY)
    else:
        grayFrame = inputFrame
    if backend == vpi.Backend.PVA:
        # PVA only supports 16-bit unsigned inputs,
        # where each element is in the 0-255 range, so
        # no rescaling is needed.
        grayFrame = grayFrame.astype(np.uint16)
    grayImage = vpi.asimage(grayFrame.copy())
    return grayFrame, grayImage


# Write the input gray frame to the output video with the
# input bounding boxes and predictions drawn on it
def writeOutput(outVideo, cvGray, inBoxes, inPreds, colors, backend):
    try:
        if cvGray.dtype == np.uint16:
            cvGray = cvGray.astype(np.uint8)
        if cvGray.dtype != np.uint8:
            raise Exception('Input frame format must be grayscale, 8-bit unsigned')
        cvGrayBGR = cv2.cvtColor(cvGray, cv2.COLOR_GRAY2BGR)

        # Tracks the number of valid bounding boxes in the current frame
        numValidBoxes = 0

        # Draw the input bounding boxes considering the input predictions
        with inBoxes.lock(vpi.Lock.READ), inPreds.lock(vpi.Lock.READ):
            # Arrays of bounding boxes (bbox) and predictions (pred)
            bbox = inBoxes.cpu().view(np.recarray)
            pred = inPreds.cpu()

            for i in range(inBoxes.size):
                if bbox[i].tracking_status == vpi.KLTTrackStatus.LOST:
                    # If the tracking status of the current bounding box is lost, skip it
                    continue

                # Gather information of the current (i) bounding box and prediction:
                # prediction scaling width, height and x, y
                predScaleWidth = pred[i][0, 0]
                predScaleHeight = pred[i][1, 1]
                predX = pred[i][0, 2]
                predY = pred[i][1, 2]

                # Bounding box scaling width, height, its x, y, and its width, height
                bboxScaleWidth = bbox[i].bbox.xform.mat3[0, 0]
                bboxScaleHeight = bbox[i].bbox.xform.mat3[1, 1]
                bboxX = bbox[i].bbox.xform.mat3[0, 2]
                bboxY = bbox[i].bbox.xform.mat3[1, 2]
                bboxWidth = bbox[i].bbox.width
                bboxHeight = bbox[i].bbox.height

                # Compute the corrected x, y by adding the bounding box and
                # prediction offsets, and the corrected width, height (w, h) by
                # multiplying the bounding box w, h by its own scaling and the
                # prediction scaling
                x = bboxX + predX
                y = bboxY + predY
                w = bboxWidth * bboxScaleWidth * predScaleWidth
                h = bboxHeight * bboxScaleHeight * predScaleHeight

                # Start point and end point of the bounding box for OpenCV drawing
                startPoint = tuple(np.array([x, y], dtype=int))
                endPoint = tuple(np.array([x, y], dtype=int) + np.array([w, h], dtype=int))

                # The color of the bounding box to be drawn
                bboxColor = tuple([ int(c) for c in colors[0, i] ])
                cv2.rectangle(cvGrayBGR, startPoint, endPoint, bboxColor, 2)

                # Increment the number of valid bounding boxes in the current frame
                numValidBoxes += 1

        print(' Valid: {:02d} boxes'.format(numValidBoxes))

        outVideo.write(cvGrayBGR)
    except Exception as e:
        print('Error while writing output video:\n', e, file=sys.stderr)
        exit(1)


# ----------------------------
# Parse command line arguments

parser = ArgumentParser()
parser.add_argument('backend', choices=['cpu','cuda','pva'],
                    help='Backend to be used for processing')

parser.add_argument('input',
                    help='Input video to be processed')

parser.add_argument('boxes',
                    help='Text file with bounding boxes description')

args = parser.parse_args()

if args.backend == 'cpu':
    backend = vpi.Backend.CPU
elif args.backend == 'cuda':
    backend = vpi.Backend.CUDA
else:
    assert args.backend == 'pva'
    backend = vpi.Backend.PVA

# -----------------------------
# Open input and output videos

inVideo = cv2.VideoCapture(args.input)

if int(cv2.__version__.split('.')[0]) >= 3:
    extOutputVideo = '.mp4'
    fourcc = cv2.VideoWriter_fourcc(*'avc1')
    inSize = (int(inVideo.get(cv2.CAP_PROP_FRAME_WIDTH)), int(inVideo.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    fps = inVideo.get(cv2.CAP_PROP_FPS)
else:
    # MP4 support with OpenCV-2.4 has issues, we'll use
    # avi/mpeg instead.
    extOutputVideo = '.avi'
    fourcc = cv2.cv.CV_FOURCC('M','P','E','G')
    inSize = (int(inVideo.get(cv2.cv.CV_CAP_PROP_FRAME_WIDTH)), int(inVideo.get(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT)))
    fps = inVideo.get(cv2.cv.CV_CAP_PROP_FPS)

outVideo = cv2.VideoWriter('klt_python'+str(sys.version_info[0])+'_'+args.backend+extOutputVideo,
                           fourcc, fps, inSize)

if not outVideo.isOpened():
    print("Error creating output video", file=sys.stderr)
    exit(1)

# -----------------------------
# Read input bounding boxes

# allBoxes is a dictionary of all bounding boxes to be tracked in the input video,
# where each value is a list of new bounding boxes to track at the frame indicated by its key
allBoxes = {}
totalNumBoxes = 0

# Array capacity 0 means the maximum number of bounding boxes is not restricted
arrayCapacity = 0

if backend == vpi.Backend.PVA:
    # PVA requires an array capacity (maximum number of bounding boxes) of 128
    arrayCapacity = 128

with open(args.boxes) as f:
    # The input file (f) should have one bounding box per line as:
    # "startFrame bboxX bboxY bboxWidth bboxHeight"; e.g.:
    # "61 547 337 14 11"
    for line in f.readlines():
        line = line.replace('\n', '').replace('\r', '')
        startFrame, x, y, w, h = [ float(v) for v in line.split(' ') ]
        bb = (x, y, w, h)
        if startFrame not in allBoxes:
            allBoxes[startFrame] = [bb]
        else:
            allBoxes[startFrame].append(bb)
        totalNumBoxes += 1
        if totalNumBoxes == arrayCapacity:
            # Stop adding boxes when their total number reaches the array capacity
            break

curFrame = 0
curNumBoxes = len(allBoxes[curFrame])

# ------------------------------------------------------------------------------
# Initialize the VPI array with all input bounding boxes (same as the C++ KLT sample)

if arrayCapacity == 0:
    arrayCapacity = totalNumBoxes

inBoxes = vpi.Array(arrayCapacity, vpi.Type.KLT_TRACKED_BOUNDING_BOX)

inBoxes.size = totalNumBoxes
with inBoxes.lock(vpi.Lock.WRITE):
    data = inBoxes.cpu().view(np.recarray)

    # Global index i over all bounding boxes data, starting at 0
    i = 0

    for f in sorted(allBoxes.keys()):
        for bb in allBoxes[f]:
            # Each bounding box bb is a tuple of (x, y, w, h) that can be unpacked
            x, y, w, h = bb

            # The scaling part of the bounding box transform is the identity,
            # meaning no scaling, and the offset part is its position x, y
            data[i].bbox.xform.mat3[0, 0] = 1
            data[i].bbox.xform.mat3[1, 1] = 1
            data[i].bbox.xform.mat3[2, 2] = 1
            data[i].bbox.xform.mat3[0, 2] = x
            data[i].bbox.xform.mat3[1, 2] = y

            # The bounding box data stores its width and height w, h
            data[i].bbox.width = w
            data[i].bbox.height = h

            # Initially all boxes have status tracked and update needed
            data[i].tracking_status = vpi.KLTTrackStatus.TRACKED
            data[i].template_status = vpi.KLTUpdateStatus.NEEDED

            # Increment the global index for the next bounding box
            i += 1

#-------------------------------------------------------------------------------
# Generate random colors for the bounding boxes, equal to the C++ KLT sample

hues = np.zeros((totalNumBoxes,), dtype=np.uint8)

if int(cv2.__version__.split('.')[0]) >= 3:
    cv2.setRNGSeed(1)
    hues = cv2.randu(hues, 0, 180)
else:
    # Random number generation differs in OpenCV-2.4
    rng = cv2.cv.RNG(1)
    hues = cv2.cv.fromarray(np.array([[ h for h in hues ]], dtype=np.uint8))
    cv2.cv.RandArr(rng, hues, cv2.cv.CV_RAND_UNI, 0, 180)
    hues = [ hues[0, i] for i in range(totalNumBoxes) ]

colors = np.array([[ [int(h), 255, 255] for h in hues ]], dtype=np.uint8)
colors = cv2.cvtColor(colors, cv2.COLOR_HSV2BGR)

#-------------------------------------------------------------------------------
# Initialize the KLT Feature Tracker algorithm

# Load the first frame
validFrame, cvFrame = inVideo.read()
if not validFrame:
    print("Error reading first input frame", file=sys.stderr)
    exit(1)

# Convert the OpenCV frame to gray, also returning the VPI image for the given backend
cvGray, imgTemplate = convertFrameImage(cvFrame, backend)

# Create the KLT Feature Tracker object using the backend specified by the user
klt = vpi.KLTFeatureTracker(imgTemplate, inBoxes, backend=backend)

#-------------------------------------------------------------------------------
# Main processing loop

while validFrame:
    print('Frame: {:04d} ; Total: {:02d} boxes ;'.format(curFrame, curNumBoxes), end='')

    # Adjust input boxes and predictions to the current number of boxes
    inPreds = klt.in_predictions()

    inPreds.size = curNumBoxes
    inBoxes.size = curNumBoxes

    # Write the current frame to the output video
    writeOutput(outVideo, cvGray, inBoxes, inPreds, colors, backend)

    # Read the next input frame
    curFrame += 1
    validFrame, cvFrame = inVideo.read()
    if not validFrame:
        break

    cvGray, imgReference = convertFrameImage(cvFrame, backend)

    outBoxes = klt(imgReference)

    if curFrame in allBoxes:
        curNumBoxes += len(allBoxes[curFrame])

outVideo.release()

# vim: ts=8:sw=4:sts=4:et:ai
Language: C++

#include <opencv2/core/version.hpp>
#if CV_MAJOR_VERSION >= 3
#    include <opencv2/imgcodecs.hpp>
#    include <opencv2/videoio.hpp>
#else
#    include <opencv2/highgui/highgui.hpp>
#endif

#include <opencv2/imgproc/imgproc.hpp>
#include <vpi/OpenCVInterop.hpp>

#include <vpi/Array.h>
#include <vpi/Image.h>
#include <vpi/Status.h>
#include <vpi/Stream.h>
#include <vpi/algo/KLTFeatureTracker.h>

#include <cassert>
#include <cstring> // for memset
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <vector>

#define CHECK_STATUS(STMT)                                    \
    do                                                        \
    {                                                         \
        VPIStatus status = (STMT);                            \
        if (status != VPI_SUCCESS)                            \
        {                                                     \
            char buffer[VPI_MAX_STATUS_MESSAGE_LENGTH];       \
            vpiGetLastStatusMessage(buffer, sizeof(buffer));  \
            std::ostringstream ss;                            \
            ss << vpiStatusGetName(status) << ": " << buffer; \
            throw std::runtime_error(ss.str());               \
        }                                                     \
    } while (0);

// Utility to draw the bounding boxes into an image; the caller writes the
// result to the output video.
static cv::Mat WriteKLTBoxes(VPIImage img, VPIArray boxes, VPIArray preds)
{
    // Convert img into a cv::Mat
    cv::Mat out;
    {
        VPIImageData imgdata;
        CHECK_STATUS(vpiImageLock(img, VPI_LOCK_READ, &imgdata));

        int cvtype;
        switch (imgdata.format)
        {
        case VPI_IMAGE_FORMAT_U8:
            cvtype = CV_8U;
            break;

        case VPI_IMAGE_FORMAT_S8:
            cvtype = CV_8S;
            break;

        case VPI_IMAGE_FORMAT_U16:
            cvtype = CV_16UC1;
            break;

        case VPI_IMAGE_FORMAT_S16:
            cvtype = CV_16SC1;
            break;

        default:
            throw std::runtime_error("Image type not supported");
        }

        cv::Mat cvimg(imgdata.planes[0].height, imgdata.planes[0].width, cvtype, imgdata.planes[0].data,
                      imgdata.planes[0].pitchBytes);

        if (cvimg.type() == CV_16U)
        {
            cvimg.convertTo(out, CV_8U);
            cvimg = out;
            out   = cv::Mat();
        }

        cvtColor(cvimg, out, cv::COLOR_GRAY2BGR);

        CHECK_STATUS(vpiImageUnlock(img));
    }

    // Now draw the bounding boxes.
    VPIArrayData boxdata;
    CHECK_STATUS(vpiArrayLock(boxes, VPI_LOCK_READ, &boxdata));

    VPIArrayData preddata;
    CHECK_STATUS(vpiArrayLock(preds, VPI_LOCK_READ, &preddata));

    auto *pboxes = reinterpret_cast<VPIKLTTrackedBoundingBox *>(boxdata.data);
    auto *ppreds = reinterpret_cast<VPIHomographyTransform2D *>(preddata.data);

    // Use random high-saturated colors
    static std::vector<cv::Vec3b> colors;
    if ((int)colors.size() != *boxdata.sizePointer)
    {
        colors.resize(*boxdata.sizePointer);

        cv::RNG rand(1);
        for (size_t i = 0; i < colors.size(); ++i)
        {
            colors[i] = cv::Vec3b(rand.uniform(0, 180), 255, 255);
        }
        cvtColor(colors, colors, cv::COLOR_HSV2BGR);
    }

    // For each tracked bounding box...
    for (int i = 0; i < *boxdata.sizePointer; ++i)
    {
        if (pboxes[i].trackingStatus == 1)
        {
            continue;
        }

        float x, y, w, h;
        x = pboxes[i].bbox.xform.mat3[0][2] + ppreds[i].mat3[0][2];
        y = pboxes[i].bbox.xform.mat3[1][2] + ppreds[i].mat3[1][2];
        w = pboxes[i].bbox.width * pboxes[i].bbox.xform.mat3[0][0] * ppreds[i].mat3[0][0];
        h = pboxes[i].bbox.height * pboxes[i].bbox.xform.mat3[1][1] * ppreds[i].mat3[1][1];

        rectangle(out, cv::Rect(x, y, w, h), cv::Scalar(colors[i][0], colors[i][1], colors[i][2]), 2);
    }

    CHECK_STATUS(vpiArrayUnlock(preds));
    CHECK_STATUS(vpiArrayUnlock(boxes));

    return out;
}

int main(int argc, char *argv[])
{
    // OpenCV images that will be wrapped by VPIImages.
    // Define them here so that they're destroyed *after* the wrappers are destroyed
    cv::Mat cvTemplate, cvReference;

    // Arrays that will store our input bboxes and predicted transforms.
    VPIArray inputBoxList = NULL, inputPredList = NULL;

    // Other VPI objects that will be used
    VPIStream stream         = NULL;
    VPIArray outputBoxList   = NULL;
    VPIArray outputEstimList = NULL;
    VPIPayload klt           = NULL;
    VPIImage imgReference    = NULL;
    VPIImage imgTemplate     = NULL;

    int retval = 0;
    try
    {
        if (argc != 4)
        {
            throw std::runtime_error(std::string("Usage: ") + argv[0] + " <cpu|pva|cuda> <input_video> <bbox descr>");
        }

        std::string strBackend     = argv[1];
        std::string strInputVideo  = argv[2];
        std::string strInputBBoxes = argv[3];

        // Load the input video
        cv::VideoCapture invid;
        if (!invid.open(strInputVideo))
        {
            throw std::runtime_error("Can't open '" + strInputVideo + "'");
        }

        // Open the output video for writing using the input's characteristics
#if CV_MAJOR_VERSION >= 3
        int w                      = invid.get(cv::CAP_PROP_FRAME_WIDTH);
        int h                      = invid.get(cv::CAP_PROP_FRAME_HEIGHT);
        int fourcc                 = cv::VideoWriter::fourcc('a', 'v', 'c', '1');
        double fps                 = invid.get(cv::CAP_PROP_FPS);
        std::string extOutputVideo = ".mp4";
#else
        // MP4 support with OpenCV-2.4 has issues, we'll use
        // avi/mpeg instead.
        int w                      = invid.get(CV_CAP_PROP_FRAME_WIDTH);
        int h                      = invid.get(CV_CAP_PROP_FRAME_HEIGHT);
        int fourcc                 = CV_FOURCC('M', 'P', 'E', 'G');
        double fps                 = invid.get(CV_CAP_PROP_FPS);
        std::string extOutputVideo = ".avi";
#endif

        cv::VideoWriter outVideo("klt_" + strBackend + extOutputVideo, fourcc, fps, cv::Size(w, h));
        if (!outVideo.isOpened())
        {
            throw std::runtime_error("Can't create output video");
        }

        // Load the bounding boxes
        // Format is: <frame number> <bbox_x> <bbox_y> <bbox_width> <bbox_height>
        // Important assumption: bboxes must be sorted with increasing frame numbers.

        // The VPI arrays will actually wrap these vectors.
        std::vector<VPIKLTTrackedBoundingBox> bboxes;
        int32_t bboxesSize = 0;
        std::vector<VPIHomographyTransform2D> preds;
        int32_t predsSize = 0;

        // Stores how many bboxes there are in each frame. Only
        // stores an entry when the bbox count changes.
        std::map<int, size_t> bboxes_size_at_frame; // frame -> bbox count

        // PVA requires that the array capacity is 128.
        bboxes.reserve(128);
        preds.reserve(128);

        // Read bounding boxes
        {
            std::ifstream in(strInputBBoxes);
            if (!in)
            {
                throw std::runtime_error("Can't open '" + strInputBBoxes + "'");
            }

            // For each bounding box,
            int frame, x, y, w, h;
            while (in >> frame >> x >> y >> w >> h)
            {
                if (bboxes.size() == 64)
                {
                    throw std::runtime_error("Too many bounding boxes");
                }

                // Convert the axis-aligned bounding box into our tracking
                // structure.

                VPIKLTTrackedBoundingBox track = {};
                // scale
                track.bbox.xform.mat3[0][0] = 1;
                track.bbox.xform.mat3[1][1] = 1;
                // position
                track.bbox.xform.mat3[0][2] = x;
                track.bbox.xform.mat3[1][2] = y;
                // must be 1
                track.bbox.xform.mat3[2][2] = 1;

                track.bbox.width     = w;
                track.bbox.height    = h;
                track.trackingStatus = 0; // valid tracking
                track.templateStatus = 1; // must update

                bboxes.push_back(track);

                // Identity predicted transform.
                VPIHomographyTransform2D xform = {};
                xform.mat3[0][0]               = 1;
                xform.mat3[1][1]               = 1;
                xform.mat3[2][2]               = 1;
                preds.push_back(xform);

                bboxes_size_at_frame[frame] = bboxes.size();
            }

            if (!in && !in.eof())
            {
                throw std::runtime_error("Can't parse bounding boxes, stopped at bbox #" +
                                         std::to_string(bboxes.size()));
            }

            // Wrap the input arrays into VPIArray's
            VPIArrayData data = {};
            data.type         = VPI_ARRAY_TYPE_KLT_TRACKED_BOUNDING_BOX;
            data.capacity     = bboxes.capacity();
            data.sizePointer  = &bboxesSize;
            data.data         = &bboxes[0];
            CHECK_STATUS(vpiArrayCreateHostMemWrapper(&data, 0, &inputBoxList));

            data.type        = VPI_ARRAY_TYPE_HOMOGRAPHY_TRANSFORM_2D;
            data.sizePointer = &predsSize;
            data.data        = &preds[0];
            CHECK_STATUS(vpiArrayCreateHostMemWrapper(&data, 0, &inputPredList));
        }

        // Now parse the backend
        VPIBackend backend;

        if (strBackend == "cpu")
        {
            backend = VPI_BACKEND_CPU;
        }
        else if (strBackend == "cuda")
        {
            backend = VPI_BACKEND_CUDA;
        }
        else if (strBackend == "pva")
        {
            backend = VPI_BACKEND_PVA;
        }
        else
        {
            throw std::runtime_error("Backend '" + strBackend +
                                     "' not recognized, it must be either cpu, cuda or pva.");
        }

        // Create the stream for the given backend.
        CHECK_STATUS(vpiStreamCreate(backend, &stream));

        // Helper function to fetch a frame from the input
        int nextFrame   = 0;
        auto fetchFrame = [&invid, &nextFrame, backend]() {
            cv::Mat frame;
            if (!invid.read(frame))
            {
                return cv::Mat();
            }

            // We only support grayscale inputs
            if (frame.channels() == 3)
            {
                cvtColor(frame, frame, cv::COLOR_BGR2GRAY);
            }

            if (backend == VPI_BACKEND_PVA)
            {
                // PVA only supports 16-bit unsigned inputs,
                // where each element is in the 0-255 range, so
                // no rescaling is needed.
                cv::Mat aux;
                frame.convertTo(aux, CV_16U);
                frame = aux;
            }
            else
            {
                assert(frame.type() == CV_8U);
            }

            ++nextFrame;
            return frame;
        };

        // Fetch the first frame and wrap it into a VPIImage.
        // Templates will be based on this frame.
        cvTemplate = fetchFrame();
        CHECK_STATUS(vpiImageCreateOpenCVMatWrapper(cvTemplate, 0, &imgTemplate));

        // Create the reference image wrapper. Let's wrap cvTemplate for now just
        // to create the wrapper. Later we'll set it to wrap the actual reference image.
        CHECK_STATUS(vpiImageCreateOpenCVMatWrapper(cvTemplate, 0, &imgReference));

        VPIImageFormat imgFormat;
        CHECK_STATUS(vpiImageGetFormat(imgTemplate, &imgFormat));

        // Using this first frame's characteristics, create a KLT Bounding Box Tracker payload.
        // We're limiting the template dimensions to 64x64.
        CHECK_STATUS(vpiCreateKLTFeatureTracker(backend, cvTemplate.cols, cvTemplate.rows, imgFormat, NULL, &klt));

        // Parameters we'll use. No need to change them on the fly, so just define them here.
        VPIKLTFeatureTrackerParams params = {};
        CHECK_STATUS(vpiInitKLTFeatureTrackerParams(&params));

        // Output array with the estimated bbox for the current frame.
        CHECK_STATUS(vpiArrayCreate(128, VPI_ARRAY_TYPE_KLT_TRACKED_BOUNDING_BOX, 0, &outputBoxList));

        // Output array with the estimated transform of the input bbox to match the output bbox.
        CHECK_STATUS(vpiArrayCreate(128, VPI_ARRAY_TYPE_HOMOGRAPHY_TRANSFORM_2D, 0, &outputEstimList));

        size_t curNumBoxes = 0;

        do
        {
            size_t curFrame = nextFrame - 1;

            // Get the number of bounding boxes in the current frame.
            auto tmp          = --bboxes_size_at_frame.upper_bound(curFrame);
            size_t bbox_count = tmp->second;

            assert(bbox_count >= curNumBoxes && "input bounding boxes must be sorted by frame");

            // Does the current frame have new bounding boxes?
            if (curNumBoxes != bbox_count)
            {
                // Update the input array sizes; the new boxes are already there, as we populated
                // these arrays with all input bounding boxes.
                CHECK_STATUS(vpiArrayLock(inputBoxList, VPI_LOCK_READ_WRITE, NULL));
                CHECK_STATUS(vpiArraySetSize(inputBoxList, bbox_count));
                CHECK_STATUS(vpiArrayUnlock(inputBoxList));

                CHECK_STATUS(vpiArrayLock(inputPredList, VPI_LOCK_READ_WRITE, NULL));
                CHECK_STATUS(vpiArraySetSize(inputPredList, bbox_count));
                CHECK_STATUS(vpiArrayUnlock(inputPredList));

                for (size_t i = 0; i < bbox_count - curNumBoxes; ++i)
                {
                    std::cout << curFrame << " -> new " << curNumBoxes + i << std::endl;
                }
                assert(bbox_count <= bboxes.capacity());
                assert(bbox_count <= preds.capacity());

                curNumBoxes = bbox_count;
            }

            // Write this frame to the output video.
            outVideo << WriteKLTBoxes(imgTemplate, inputBoxList, inputPredList);

            // Fetch a new frame
            cvReference = fetchFrame();

            // Video ended?
            if (cvReference.data == NULL)
            {
                // Just end gracefully.
                break;
            }

            // Make the reference wrapper point to the reference frame
            CHECK_STATUS(vpiImageSetWrappedOpenCVMat(imgReference, cvReference));

            // Estimate the bounding boxes in the current frame (reference) given their
            // position in the previous frame (template).
            CHECK_STATUS(vpiSubmitKLTFeatureTracker(stream, backend, klt, imgTemplate, inputBoxList, inputPredList,
                                                    imgReference, outputBoxList, outputEstimList, &params));

            // Wait for processing to finish.
            CHECK_STATUS(vpiStreamSync(stream));

            // Now lock the output arrays to properly set up the input for the next iteration.
            VPIArrayData updatedBBoxData;
            CHECK_STATUS(vpiArrayLock(outputBoxList, VPI_LOCK_READ, &updatedBBoxData));

            VPIArrayData estimData;
            CHECK_STATUS(vpiArrayLock(outputEstimList, VPI_LOCK_READ, &estimData));

            auto *updated_bbox = reinterpret_cast<VPIKLTTrackedBoundingBox *>(updatedBBoxData.data);
            auto *estim        = reinterpret_cast<VPIHomographyTransform2D *>(estimData.data);

            // For each bounding box,
            for (size_t b = 0; b < curNumBoxes; ++b)
            {
                // Did tracking fail?
                if (updated_bbox[b].trackingStatus)
                {
                    // Do we have to update the input bbox's tracking status too?
                    if (bboxes[b].trackingStatus == 0)
                    {
                        std::cout << curFrame << " -> dropped " << b << std::endl;
                        bboxes[b].trackingStatus = 1;
                    }

                    continue;
                }

                // Must the template for this bounding box be updated?
                if (updated_bbox[b].templateStatus)
                {
                    std::cout << curFrame << " -> update " << b << std::endl;

                    // There are usually two approaches here:
                    // 1. Redefine the bounding box using a feature detector such as
                    //    \ref algo_harris_corners "Harris keypoint detector", or
                    // 2. Use updated_bbox[b], which is still valid, although tracking
                    //    errors might accumulate over time.
                    //
                    // We'll go with the second option: less robust, but simple enough
                    // to implement.
                    bboxes[b] = updated_bbox[b];

                    // Signal the input that the template for this bounding box must be updated.
                    bboxes[b].templateStatus = 1;

                    // The predicted transform is now identity, as we reset the tracking.
                    preds[b]            = VPIHomographyTransform2D{};
                    preds[b].mat3[0][0] = 1;
                    preds[b].mat3[1][1] = 1;
                    preds[b].mat3[2][2] = 1;
                }
                else
                {
                    // Inform that the template for this bounding box doesn't need to be updated.
                    bboxes[b].templateStatus = 0;

                    // We just update the input transform with the estimated one.
                    preds[b] = estim[b];
                }
            }

            // We're finished working with the output arrays.
            CHECK_STATUS(vpiArrayUnlock(outputBoxList));
            CHECK_STATUS(vpiArrayUnlock(outputEstimList));

            // Since we've updated the input arrays, tell VPI to invalidate
            // any internal buffers that might still refer to the old data.
            CHECK_STATUS(vpiArrayInvalidate(inputBoxList));
            CHECK_STATUS(vpiArrayInvalidate(inputPredList));

            // The current reference frame becomes the next iteration's template.
            std::swap(imgTemplate, imgReference);
            std::swap(cvTemplate, cvReference);
        } while (true);
    }
    catch (std::exception &e)
    {
        std::cerr << e.what() << std::endl;
        retval = 1;
    }

    vpiStreamDestroy(stream);
    vpiPayloadDestroy(klt);
    vpiArrayDestroy(inputBoxList);
    vpiArrayDestroy(inputPredList);
    vpiArrayDestroy(outputBoxList);
    vpiArrayDestroy(outputEstimList);
    vpiImageDestroy(imgReference);
    vpiImageDestroy(imgTemplate);

    return retval;
}

// vim: ts=8:sw=4:sts=4:et:ai