This application tracks bounding boxes on an input video, draws the boxes on each frame, and saves the frames to disk. The user can define which backend will be used for processing.
This is using the CUDA backend and one of the provided sample videos and bounding boxes.
For convenience, here's the code that is also installed in the samples directory.
#include <opencv2/core/version.hpp>
#if CV_MAJOR_VERSION >= 3
# include <opencv2/imgcodecs.hpp>
# include <opencv2/videoio.hpp>
#else
# include <opencv2/highgui/highgui.hpp>
#endif
#include <opencv2/imgproc/imgproc.hpp>
#include <cstring>
#include <fstream>
#include <iostream>
#include <map>
#include <vector>
// Evaluates a VPI call and throws std::runtime_error (carrying the status
// name) if it did not succeed.
// The body is wrapped in do { } while (0) WITHOUT a trailing semicolon so
// that `CHECK_STATUS(x);` expands to exactly one statement; the original
// trailing ';' produced an extra empty statement, which breaks uses such as
// `if (cond) CHECK_STATUS(x); else ...`.
#define CHECK_STATUS(STMT) \
do \
{ \
VPIStatus status = (STMT); \
if (status != VPI_SUCCESS) \
{ \
throw std::runtime_error(vpiStatusGetName(status)); \
} \
} while (0)
// Wraps an OpenCV Mat into a VPIImage handle, validating the pixel format.
// NOTE(review): this listing is abridged — the declarations of `imgData`
// and `img` (and the per-format VPIImageData setup / image-wrap call) live
// on lines elided from this excerpt, so the function does not compile as
// shown. Presumably no pixel data is copied — confirm against full source.
static VPIImage ToVPIImage(
const cv::Mat &frame)
{
// Zero the VPI image descriptor before filling it in (fields set on
// elided lines).
memset(&imgData, 0, sizeof(imgData));
// Only single-channel 8-bit and 16-bit unsigned frames are accepted; the
// per-format configuration was elided from this excerpt.
switch (frame.type())
{
case CV_16U:
break;
case CV_8U:
break;
default:
throw std::runtime_error("Frame type not supported");
}
return img;
};
// Draws the currently tracked bounding boxes onto a copy of the frame and
// writes it to disk as "<output-prefix>_NNNN<ext>".
// NOTE(review): this listing is abridged — the function signature (the call
// site names it SaveKLTBoxes), the header of the switch below, its `case`
// labels, and the VPI image lock/unlock that yield `cvimg`, `boxdata`,
// `pboxes`, `ppreds`, `filename` and `frame` are on elided lines; the body
// does not compile as shown. Confirm details against the full sample.
{
cv::Mat out;
{
// Map the VPI image format to an OpenCV type. The switch header and the
// `case` labels were elided from this excerpt.
int cvtype;
{
cvtype = CV_8U;
break;
cvtype = CV_8S;
break;
cvtype = CV_16UC1;
break;
cvtype = CV_16SC1;
break;
default:
throw std::runtime_error("Image type not supported");
}
// 16-bit frames are narrowed to 8 bits so they can be color-converted
// and JPEG-encoded below.
if (cvimg.type() == CV_16U)
{
cvimg.convertTo(out, CV_8U);
cvimg = out;
out = cv::Mat();
}
// Promote grayscale to BGR so the boxes can be drawn in color.
cvtColor(cvimg, out, cv::COLOR_GRAY2BGR);
}
// Fixed seed: box b always gets the same color on every frame/run.
srand(0);
for (
size_t i = 0; i < boxdata.
size; ++i)
{
// trackingStatus == 1 means the box was dropped; still consume three
// rand() values so the remaining boxes keep their stable colors.
if (pboxes[i].trackingStatus == 1)
{
rand();
rand();
rand();
continue;
}
// Compose the box's own transform with the predicted transform to get
// the on-screen position and size.
float x, y, w, h;
x = pboxes[i].bbox.xform.mat3[0][2] + ppreds[i].mat3[0][2];
y = pboxes[i].bbox.xform.mat3[1][2] + ppreds[i].mat3[1][2];
w = pboxes[i].bbox.width * pboxes[i].bbox.xform.mat3[0][0] * ppreds[i].mat3[0][0];
h = pboxes[i].bbox.height * pboxes[i].bbox.xform.mat3[1][1] * ppreds[i].mat3[1][1];
rectangle(out, cv::Rect(x, y, w, h), cv::Scalar(rand() % 256, rand() % 256, rand() % 256), 2);
}
// Build "<name>_NNNN<ext>" by splitting the output name at its extension.
std::string fname = filename;
int ext = fname.rfind('.');
char buffer[512] = {};
snprintf(buffer, sizeof(buffer) - 1, "%s_%04d%s", fname.substr(0, ext).c_str(), frame, fname.substr(ext).c_str());
if (!imwrite(buffer, out, {cv::IMWRITE_JPEG_QUALITY, 70}))
{
throw std::runtime_error("Can't write to " + std::string(buffer));
}
}
// Entry point: parses the command line, loads the input video and the
// bounding-box description file, runs the VPI KLT tracker frame by frame,
// and saves each annotated frame to disk. Returns 0 on success, 1 on error.
// NOTE(review): this listing is abridged — the declarations of `track`,
// `xform`, `devType`, the VPI stream/array/payload setup, the
// vpiSubmitKLTFeatureTracker call, and the `updated_bbox`/`estim` array
// mappings are on elided lines, so it does not compile as shown. The only
// code change here is repairing the mojibake `¶ms` (a corrupted HTML
// entity for `&params`) on the tracker-submission fragment.
int main(int argc, char *argv[])
{
    int retval = 0;
    try
    {
        // ---- Command-line parsing ------------------------------------
        if (argc != 5)
        {
            throw std::runtime_error(std::string("Usage: ") + argv[0] +
                                     " <cpu|pva|cuda> <input_video> <bbox descr> <output>");
        }
        std::string strDevType = argv[1];
        std::string strInputVideo = argv[2];
        std::string strInputBBoxes = argv[3];
        std::string strOutputFiles = argv[4];

        // ---- Open the input video ------------------------------------
        cv::VideoCapture invid;
        if (!invid.open(strInputVideo))
        {
            throw std::runtime_error("Can't open '" + strInputVideo + "'");
        }

        // ---- Load the bounding boxes ---------------------------------
        // Capacity is reserved up front; the asserts in the main loop rely
        // on the vectors never reallocating (presumably because VPI arrays
        // created on elided lines wrap this storage — confirm).
        std::vector<VPIKLTTrackedBoundingBox> bboxes;
        std::vector<VPIHomographyTransform2D> preds;
        // frame number -> number of boxes active once that frame is reached
        // (input must be sorted by frame; asserted in the loop below).
        std::map<int, size_t> bboxes_size_at_frame;
        bboxes.reserve(128);
        preds.reserve(128);
        {
            std::ifstream in(strInputBBoxes);
            if (!in)
            {
                throw std::runtime_error("Can't open '" + strInputBBoxes + "'");
            }
            // Each record: <frame> <x> <y> <w> <h>.
            int frame, x, y, w, h;
            while (in >> frame >> x >> y >> w >> h)
            {
                if (bboxes.size() == 64)
                {
                    throw std::runtime_error("Too many bounding boxes");
                }
                // NOTE(review): `track` and `xform` are built from
                // x/y/w/h on lines elided from this excerpt.
                bboxes.push_back(track);
                preds.push_back(xform);
                bboxes_size_at_frame[frame] = bboxes.size();
            }
            // A stream that failed but is not at EOF means a parse error,
            // not just end of input.
            if (!in && !in.eof())
            {
                throw std::runtime_error("Can't parse bounding boxes, stopped at bbox #" +
                                         std::to_string(bboxes.size()));
            }
        }

        // ---- Backend selection ---------------------------------------
        // NOTE(review): the assignments to `devType` inside these branches
        // were elided from this excerpt.
        if (strDevType == "cpu")
        {
        }
        else if (strDevType == "cuda")
        {
        }
        else if (strDevType == "pva")
        {
        }
        else
        {
            throw std::runtime_error("Backend '" + strDevType +
                                     "' not recognized, it must be either cpu, cuda or pva.");
        }

        // ---- Frame fetcher -------------------------------------------
        // Reads the next frame, converts it to grayscale, and (on a branch
        // whose `if` header was elided) widens it to 16-bit for backends
        // that need it. Returns an empty Mat at end of stream.
        int nextFrame = 0;
        auto fetchFrame = [&invid, &nextFrame, devType]() {
            cv::Mat frame;
            if (!invid.read(frame))
            {
                return cv::Mat(); // end of stream
            }
            if (frame.channels() == 3)
            {
                cvtColor(frame, frame, cv::COLOR_BGR2GRAY);
            }
            // NOTE(review): the `if` guarding this conversion was elided;
            // as shown the `else` below does not parse.
            {
                cv::Mat aux;
                frame.convertTo(aux, CV_16U);
                frame = aux;
            }
            else
            {
                assert(frame.type() == CV_8U);
            }
            ++nextFrame;
            return frame;
        };

        // ---- Main tracking loop --------------------------------------
        cv::Mat cvTemplate = fetchFrame(), cvReference;
        VPIImage imgTemplate = ToVPIImage(cvTemplate);
        size_t curNumBoxes = 0;
        do
        {
            size_t curFrame = nextFrame - 1;

            // Activate every box whose start frame has been reached.
            auto tmp = --bboxes_size_at_frame.upper_bound(curFrame);
            size_t bbox_count = tmp->second;
            assert(bbox_count >= curNumBoxes && "input bounding boxes must be sorted by frame");
            if (curNumBoxes != bbox_count)
            {
                for (size_t i = 0; i < bbox_count - curNumBoxes; ++i)
                {
                    std::cout << curFrame << " -> new " << curNumBoxes + i << std::endl;
                }
                // Growing past the reserved capacity would reallocate and
                // invalidate the externally wrapped storage.
                assert(bbox_count <= bboxes.capacity());
                assert(bbox_count <= preds.capacity());
                curNumBoxes = bbox_count;
            }

            // Save the current template frame with its boxes drawn on it.
            SaveKLTBoxes(imgTemplate, inputBoxList, inputPredList, strOutputFiles, curFrame);

            cvReference = fetchFrame();
            if (cvReference.data == nullptr)
            {
                // No more frames; the last one was already saved above.
                break;
            }
            imgReference = ToVPIImage(cvReference);

            // Run the tracker. NOTE(review): this statement is abridged —
            // presumably CHECK_STATUS(vpiSubmitKLTFeatureTracker(...,
            // outputBoxList, outputEstimList, &params)); confirm against
            // the full sample. The corrupted `¶ms` token is restored
            // to `&params` here.
            outputBoxList, outputEstimList, &params));

            // Fold the tracker output back into bboxes/preds.
            for (size_t b = 0; b < curNumBoxes; ++b)
            {
                if (updated_bbox[b].trackingStatus)
                {
                    // Box lost; log only on the 0 -> 1 transition.
                    if (bboxes[b].trackingStatus == 0)
                    {
                        std::cout << curFrame << " -> dropped " << b << std::endl;
                        bboxes[b].trackingStatus = 1;
                    }
                    continue;
                }
                if (updated_bbox[b].templateStatus)
                {
                    // Tracker requests a template refresh: adopt the
                    // updated box and reset the prediction (remaining
                    // identity fields presumably set on elided lines).
                    std::cout << curFrame << " -> update " << b << std::endl;
                    bboxes[b] = updated_bbox[b];
                    bboxes[b].templateStatus = 1;
                    preds[b].mat3[1][1] = 1;
                    preds[b].mat3[2][2] = 1;
                }
                else
                {
                    // Keep the current template; carry the estimated
                    // motion forward as the next prediction.
                    bboxes[b].templateStatus = 0;
                    preds[b] = estim[b];
                }
            }

            // The reference becomes the next iteration's template.
            std::swap(imgTemplate, imgReference);
            std::swap(cvTemplate, cvReference);
        } while (true);
    }
    catch (std::exception &e)
    {
        std::cerr << e.what() << std::endl;
        retval = 1;
    }
    return retval;
}