NVIDIA Tegra
NVIDIA DRIVE OS 5.1 Linux SDK

Developer Guide
5.1.9.0 Release


 
Vision Programming
 
Sequence of Tasks
Algorithm APIs
Box Filter
Harris Corner Detector and Tracker
Convert Motion Vectors
Stereo Preprocess
Stereo Postprocess
FAST Corners Detector
Lucas-Kanade Feature Tracker
Kanade-Lucas-Tomasi Object Template Tracker
2D Convolution
Get Image Pyramid
Pitch Linear and Block Linear Layout Support
Supported Surface Format Packing Types
Supported Resolutions
References
The NvMedia VPI component provides a set of APIs to run certain Computer Vision (CV) algorithms on the NVIDIA® Tegra® PVA (Programmable Vision Accelerator).
Sequence of Tasks
The following diagram illustrates the sequence of tasks that applications should follow while running CV algorithms when using NvMedia VPI.
NvMedia VPI Sequence
The following sections describe the sequence of tasks in the above flow diagram.
Setup
Allocate, configure, and set up all resources before attempting any other VPI calls.
1. Create Instance
Configure and create an instance of a specified PVA hardware engine using NvMediaVPICreate(). For specific information on the NvMedia VPI API, see Vision Programming Interface in the API Reference for this PDK.
2. Register Buffers
Register all buffers that will be used with a particular instance. The registration APIs are NvMediaVPIImageRegister, NvMediaVPIArrayRegister, and NvMediaVPIPyramidRegister.
This is required only once per buffer per instance.
3. Create Descriptor
Certain algorithms require a descriptor to be created before the algorithm can be run. This is required only once per set of descriptor parameters per instance.
For example, NvMediaVPICreateProcessStereoPairDescriptor.
Run-Time
Run the CV algorithm. For information about the algorithms, see Algorithm APIs in this topic.
1. Queue
Every time an algorithm is to be run for an instance, it must be queued. Every algorithm has its own queue API.
For example, NvMediaVPIProcessStereoPairDesc or NvMediaVPIConvolveImage.
2. Flush
After operations are queued, they must be flushed with NvMediaVPIFlush before they can run on the hardware engine. This applies to single or multiple operations.
This is a non-blocking call. The application can choose to block while waiting for an operation on a particular buffer to complete. To do so, use one of the following functions, as applicable:
NvMediaImageGetStatus
NvMediaArrayGetStatus
NvMediaImagePyramidGetStatus
Destroy
Free all resources.
1. Destroy Descriptor
Destroy all created descriptors using NvMediaVPIDestroyDescriptor.
2. Unregister Buffers
Unregister all registered buffers using NvMediaVPIImageUnregister or NvMediaVPIPyramidUnregister.
3. Destroy Instance
Destroy the hardware engine instance using NvMediaVPIDestroy to free all resources.
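As a minimal sketch of the run-time phase, the following code queues a single Box Filter task and submits it to the engine. It assumes the instance was created with NvMediaVPICreate() and both images were registered with NvMediaVPIImageRegister() during Setup, and that NvMediaVPIFlush() takes only the instance pointer; the status codes are the standard NvMedia ones. See the API Reference for the exact signatures.

NvMediaStatus
QueueAndRunBoxFilter(
    NvMediaVPI *vpi,       /* created with NvMediaVPICreate() during Setup */
    NvMediaImage *input,   /* registered with NvMediaVPIImageRegister() */
    NvMediaImage *output)  /* registered with NvMediaVPIImageRegister() */
{
    /* Run-Time step 1: queue the operation. */
    NvMediaStatus status = NvMediaVPIFilterImageBox(vpi, input, 3u, 3u, output);
    if (status != NVMEDIA_STATUS_OK) {
        return status;
    }
    /* Run-Time step 2: flush so the queued work runs on the PVA engine.
       The single-argument form is an assumption; see the API Reference. */
    status = NvMediaVPIFlush(vpi);
    /* To block until the result is ready, call NvMediaImageGetStatus() on
       the output image (argument list documented in the API Reference). */
    return status;
}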
Algorithm APIs
This section describes the algorithms you can call during runtime to perform CV processing.
Box Filter
The Box Filter algorithm blurs an image using a windowWidth × windowHeight normalized averaging kernel.
The blur kernel is defined by:
K = \frac{1}{windowWidth \cdot windowHeight}\begin{bmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{bmatrix}
The filter anchor point is the center of the kernel.
Queues the task.
NvMediaStatus
NvMediaVPIFilterImageBox(
NvMediaVPI *vpi,
NvMediaImage *input,
const uint32_t windowWidth,
const uint32_t windowHeight,
NvMediaImage *output
);
For more information about this function, see Vision Programming Interface in the API Reference.
The output image dimension and format must match the input image dimension and format.
Limitations and Constraints:
Currently, the only supported window size is:
windowWidth = windowHeight = 3
Layout and Format
For information on the supported layouts and formats, see:
Pitch Linear and Block Linear Layout Support in this topic.
Supported Surface Format Packing Types in this topic.
Supported Resolutions in this topic.
Harris Corner Detector and Tracker
The Harris Corner Detector algorithm detects Harris corners on an input image [4].
The standard Harris corner detection process is applied first. After that, a non-max suppression pruning process is applied, during which a preference is given to tracked keypoints.
The user may mask image coordinates for corner detection and tracking.
Corner Detection Process
Computes the spatial gradient, using a separable gradient filter. The user may choose between default Sobel filters and user-defined separable gradient filter coefficients.
Gradient filter size is 3x3, 5x5 or 7x7.
For default Sobel filters (gradientFilterType is NVMEDIA_VPI_HARRIS_FILTER_TYPE_DEFAULT):
For gradientSize = 3:
For gradientSize = 5:
For gradientSize = 7:
For a user-defined filter (gradientFilterType is NVMEDIA_VPI_HARRIS_FILTER_TYPE_USER), the following constraints apply:
The verticalFilterCoefficients array constraints:
Array size: gradientSize
verticalFilterCoefficients[0] = verticalFilterCoefficients[gradientSize - 1] = 1
Symmetric coefficients
The horizontalFilterCoefficients array constraints:
Array size: gradientSize
horizontalFilterCoefficients[0] = -1
horizontalFilterCoefficients[gradientSize/2] = 0
Anti-symmetric coefficients
Computes a gradient covariance matrix (structure tensor) for each pixel within a block window, as described by:
M = \sum_{p \in B} \begin{bmatrix} I_x(p)^2 & I_x(p) I_y(p) \\ I_x(p) I_y(p) & I_y(p)^2 \end{bmatrix}
Where:
p is a pixel coordinate within B, a block window of size 3x3, 5x5, or 7x7.
Computes a Harris response score using a sensitivity factor:
R = \det(M) - k \cdot \operatorname{trace}(M)^2
Where k is the sensitivity factor.
Applies a threshold-strength criterion.
Non-Max Suppression Pruning Process
This process performs non-max suppression on all detected corners, which removes multiple or spurious responses.
2D Grid Cell-Based Non-Maximum Suppression
This process splits the input image into a 2D cell grid. It selects a single corner with the highest response score inside the cell. If several corners within the cell have the same response score, it selects the bottom-right corner.
If tracked keypoints are provided, any tracked keypoint inside a cell suppresses all other detected keypoints inside that cell. The tracked keypoint and the corresponding score are directly copied into the output keypoints and scores arrays, respectively.
Important Functions
For a complete description of the following functions and structs, see Vision Programming Interface in the API Reference for this PDK.
Creates a Harris points descriptor.
NvMediaVPIGetKeyPointsHarrisDescriptor *
NvMediaVPICreateGetKeyPointsHarrisDescriptor(
NvMediaVPI *vpi,
const NvMediaSurfaceType type
);
Queues the task.
NvMediaStatus
NvMediaVPIGetKeyPointsHarrisDesc(
NvMediaVPI *vpi,
NvMediaVPIGetKeyPointsHarrisDescriptor *descriptor,
NvMediaImage *input,
const NvMediaVPIGetKeyPointsHarrisParams *params,
NvMediaImage *mask,
NvMediaArray *trackedKeypoints,
NvMediaArray *trackedScores,
NvMediaArray *keypoints,
NvMediaArray *scores
);
Important Structures
Defines type of gradient filter.
typedef enum {
/*! Built-in normalized Sobel filter */
NVMEDIA_VPI_HARRIS_FILTER_TYPE_DEFAULT,
/*! User supplied gradient filter. The filter is automatically normalized */
NVMEDIA_VPI_HARRIS_FILTER_TYPE_USER,
} NvMediaVPIHarrisGradientFilterType;
Holds task parameters.
typedef struct {
/*! Gradient window size. Must be 3, 5 or 7. */
const uint32_t gradientSize;
/*! Filter type. */
const NvMediaVPIHarrisGradientFilterType gradientFilterType;
/*! Horizontal filter coefficients */
const int16_t *horizontalFilterCoefficients;
/*! Vertical filter coefficients */
const uint16_t *verticalFilterCoefficients;
/*! Block window size used to compute the Harris Corner score. Must be 3, 5 or 7. */
const uint32_t blockSize;
/*! Specifies the minimum threshold with which to eliminate Harris Corner scores */
const float_t strengthThresh;
/*! Specifies sensitivity threshold from the Harris-Stephens equation. */
const float_t sensitivity;
/*! Specifies the post-process non-maximum suppression type. */
const NvMediaVPINonMaxSuppressionType nonMaxSuppressionType;
/*! Specifies the radial Euclidean distance for non-maximum suppression */
const float_t minDistance;
} NvMediaVPIGetKeyPointsHarrisParams;
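As an illustration of how the descriptor, the parameter structure, and the queue call fit together, the following sketch queues one detection task with the default Sobel gradients. The parameter values are examples only, the mask and tracked-keypoint arguments are passed as NULL on the assumption that this is allowed when they are unused, and the input image and the keypoints and scores arrays are assumed to have been created and registered during Setup.

NvMediaStatus
QueueHarrisDetection(
    NvMediaVPI *vpi,
    NvMediaSurfaceType surfaceType,
    NvMediaImage *input,
    NvMediaArray *keypoints,
    NvMediaArray *scores)
{
    /* Example parameters: default 3x3 Sobel gradients, 5x5 block window,
       8x8-cell grid non-max suppression (minDistance is then ignored).
       strengthThresh and sensitivity values are illustrative only. */
    const NvMediaVPIGetKeyPointsHarrisParams params = {
        .gradientSize = 3u,
        .gradientFilterType = NVMEDIA_VPI_HARRIS_FILTER_TYPE_DEFAULT,
        .horizontalFilterCoefficients = NULL,  /* unused with the default filter */
        .verticalFilterCoefficients = NULL,    /* unused with the default filter */
        .blockSize = 5u,
        .strengthThresh = 20.0f,
        .sensitivity = 0.04f,
        .nonMaxSuppressionType = NVMEDIA_VPI_2D_GRID_8x8_CELL_NO_MIN_DISTANCE,
        .minDistance = 0.0f,
    };
    /* Created once per parameter set per instance; destroy it later with
       NvMediaVPIDestroyDescriptor() during the Destroy phase. */
    NvMediaVPIGetKeyPointsHarrisDescriptor *desc =
        NvMediaVPICreateGetKeyPointsHarrisDescriptor(vpi, surfaceType);
    if (desc == NULL) {
        return NVMEDIA_STATUS_ERROR;
    }
    /* mask = NULL: detect over the whole image. Tracked keypoints and scores
       are passed as NULL because this release ignores input keypoints. */
    return NvMediaVPIGetKeyPointsHarrisDesc(vpi, desc, input, &params,
                                            NULL, NULL, NULL,
                                            keypoints, scores);
}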
Limitations and Constraints
The maximum number of tracked keypoints plus detected keypoints is 8192. The maximum number of tracked keypoints and scores is 2048. Beyond these limits, the algorithm drops keypoints.
trackedScores holds the scores of trackedKeypoints; both arrays must be the same size.
mask must point to a surface with the same size as the input surface, or be NULL.
If nonMaxSuppressionType is set to NVMEDIA_VPI_2D_GRID_8x8_CELL_NO_MIN_DISTANCE, the minDistance value is ignored.
Currently, this implementation supports only the detection of new keypoints. Input keypoints and scores are ignored.
Layout and Format
For information on the supported layouts and formats, see:
Pitch Linear and Block Linear Layout Support in this topic.
Supported Surface Format Packing Types in this topic.
Supported Resolutions in this topic.
Convert Motion Vectors
The Convert Motion Vectors algorithm consumes a noisy motion vector image, applies a noise reduction filter, and up-samples the smoothed motion vector image.
The filtering operation is robust to outliers.
The up-sampling operation uses a bilinear interpolation with a fixed scale factor of 4.
The motion vector surface format type is interleaved signed 16 bits for X and Y.
Important Functions
For a complete description of the following functions and structs, see Vision Programming Interface in the API Reference for this PDK.
Creates a descriptor.
NvMediaVPIConvertMVDescriptor *
NvMediaVPICreateConvertMVDescriptor(
NvMediaVPI *vpi,
const uint32_t width,
const uint32_t height,
const NvMediaSurfaceType type,
const float strength,
const float scale
);
Queues a task.
NvMediaStatus
NvMediaVPIConvertMVDesc(
NvMediaVPI *vpi,
NvMediaVPIConvertMVDescriptor *descriptor,
NvMediaImage *inputMVImage,
NvMediaImage *inputColor,
NvMediaImage *output,
const NvMediaVPIOFSTOutputType outputType
);
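A minimal usage sketch follows. It assumes the images were created and registered during Setup, that scale = 4.0f selects the x4 up-sampled output configuration, and that strength and inputColor can be passed as 0.0f and NULL because they are currently ignored (see Limitations and Constraints below).

NvMediaStatus
QueueConvertMV(
    NvMediaVPI *vpi,
    NvMediaImage *inputMV,    /* interleaved signed 16-bit X/Y motion vectors */
    NvMediaImage *output,     /* x4 the input dimensions in this configuration */
    uint32_t width,
    uint32_t height,
    NvMediaSurfaceType type)
{
    /* strength is currently ignored; scale = 4.0f is assumed to request the
       x4 up-sampled output. Destroy the descriptor later with
       NvMediaVPIDestroyDescriptor(). */
    NvMediaVPIConvertMVDescriptor *desc =
        NvMediaVPICreateConvertMVDescriptor(vpi, width, height, type, 0.0f, 4.0f);
    if (desc == NULL) {
        return NVMEDIA_STATUS_ERROR;
    }
    /* inputColor and outputType are currently ignored; NVMEDIA_VPI_MV is
       generated by default. */
    return NvMediaVPIConvertMVDesc(vpi, desc, inputMV, NULL, output,
                                   NVMEDIA_VPI_MV);
}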
Limitations and Constraints
This algorithm currently has the following limitations and constraints:
Supports only these configurations:
output size matches input size
output size is 4x the input size.
Ignores the strength parameter.
Ignores the inputColor parameter.
Ignores NvMediaVPIOFSTOutputType. By default, the NvMediaVPIConvertMVDesc function generates NVMEDIA_VPI_MV only.
Layout and Format
For information on the supported layouts and formats, see:
Pitch Linear and Block Linear Layout Support in this topic.
Supported Surface Format Packing Types in this topic.
Supported Resolutions in this topic.
Stereo Preprocess
Given a pair of rectified images from a stereo camera, the Stereo Preprocess algorithm uses high-quality dense stereo matching to produce an output image of the same resolution as the input.
You may use the outputType in the task descriptor to run this algorithm as a standalone left and right disparity computation. Alternatively, you may compute hints for the NvEnc engine as part of a more complex pipeline that may also include the Stereo Postprocess stage.
Task Output Flavors
For the standalone disparity computation, the algorithm produces a left image-based and a right image-based disparity map on output0 and output1, respectively. It provides disparity values as fixed-point integers with 5 fraction bits:
float32_t disparity = output[i]/32.0f;
For the hint generation computation, output0 consists of the hints map and output1 consists of a confidence map. That confidence map must be provided as an input to the Stereo Postprocess stage.
For this mode, user-defined parameters control the confidence map according to the following process:
delta = leftDisparity[i] - rightDisparity[i + leftDisparity[i]];
key = delta >> disparityDiffToConfidence;
if (key > NVPVA_STEREO_CONF_GEN_MAX_CONF_TABLE_LEN) {
    confidence[i] = 0;
} else {
    confidence[i] = confidenceTable[key];
}
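For illustration only (this is not an NvMedia VPI API), the per-pixel rule above can be expanded over one row of disparities as follows. The buffer types and names are assumptions, and edge and negative-key handling is omitted exactly as in the pseudocode above.

#include <stdint.h>

/* Expands the confidence-generation rule over one row of the disparity maps.
   confidenceTable is assumed to have NVPVA_STEREO_CONF_GEN_MAX_CONF_TABLE_LEN + 1 entries. */
void ComputeConfidenceRow(
    const int16_t *leftDisparity,
    const int16_t *rightDisparity,
    const uint8_t *confidenceTable,
    uint32_t disparityDiffToConfidence,
    uint32_t rowWidth,
    uint8_t *confidence)
{
    for (uint32_t i = 0u; i < rowWidth; i++) {
        /* Compare the left disparity against the right disparity at the
           position the left disparity points to (left-right consistency). */
        int32_t j = (int32_t)i + (int32_t)leftDisparity[i];
        int32_t delta = (int32_t)leftDisparity[i] - (int32_t)rightDisparity[j];
        int32_t key = delta >> disparityDiffToConfidence;
        if (key > NVPVA_STEREO_CONF_GEN_MAX_CONF_TABLE_LEN) {
            confidence[i] = 0u;               /* inconsistent match */
        } else {
            confidence[i] = confidenceTable[key];
        }
    }
}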
You must also set the NVENC hints density by setting the numberOfHintsPerMB parameter to a value between 1 and 8. The higher the number of hints, the more accurate the output the NvEnc engine produces.
The output image contents depend on the user parameter of the NvMediaVPIOFSTOutputType enum type. Supported values are:
NVMEDIA_VPI_STEREO_DISPARITY: Computes left and right disparity maps.
NVMEDIA_VPI_MV_HINTS: A proprietary format used as input to the NVENC engine.
NVMEDIA_VPI_MV: Not supported.
The element NvMediaVPIStereoPreprocessParams::confidenceThresh filters out disparity coordinate hints that have poor confidence.
Important Functions
For complete descriptions of the following functions and structs, see Vision Programming Interface in the API Reference.
Layout and Format
For information on the supported layouts and formats, see:
Pitch Linear and Block Linear Layout Support in this topic
Supported Surface Format Packing Types in this topic
Supported Resolutions in this topic
Stereo Postprocess
The Stereo Postprocess algorithm applies a noise reduction filter to the disparity map output by the NVENC engine, using the confidence map generated by the Stereo Preprocess algorithm. You may choose to up-sample the pruned disparity image.
The filtering operation is robust to outliers.
The up-sampling operation uses a bilinear interpolation with a fixed scale factor of 4.
Up-sampling is applied to both the disparity and confidence maps.
Up-sampling is applied only if scale is set to 4.0f.
NvMediaVPIStereoPostprocessParams::windowDimension is used to set the noise reduction filter strength.
Confidence image values smaller than the Stereo Postprocess value of e are ignored.
Important Functions
For complete descriptions of the following functions and structs, see Vision Programming Interface in the API Reference.
To create a descriptor:
NvMediaVPIStereoPostprocessDescriptor *
NvMediaVPICreateStereoPostprocessDescriptor(
NvMediaVPI *vpi,
const uint32_t width,
const uint32_t height,
const float scale
);
To queue a task:
NvMediaStatus
NvMediaVPIStereoPostprocessDesc(
NvMediaVPI *vpi,
NvMediaVPIStereoPostprocessDescriptor *descriptor,
NvMediaVPIStereoPostprocessParams *params,
NvMediaImage *disparityInput,
NvMediaImage *confidenceInput,
NvMediaImage *disparityOutput,
NvMediaImage *confidenceOutput
);
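A minimal queueing sketch follows. It assumes the descriptor dimensions match the 480 x 270 input, that scale = 4.0f requests the up-sampled outputs, and that windowDimension is the only NvMediaVPIStereoPostprocessParams field that must be set (the value shown is illustrative; see the API Reference for the full structure).

NvMediaStatus
QueueStereoPostprocess(
    NvMediaVPI *vpi,
    NvMediaImage *disparityInput,
    NvMediaImage *confidenceInput,
    NvMediaImage *disparityOutput,
    NvMediaImage *confidenceOutput)
{
    /* scale = 4.0f enables up-sampling of both the disparity and confidence
       maps. Destroy the descriptor later with NvMediaVPIDestroyDescriptor(). */
    NvMediaVPIStereoPostprocessDescriptor *desc =
        NvMediaVPICreateStereoPostprocessDescriptor(vpi, 480u, 270u, 4.0f);
    if (desc == NULL) {
        return NVMEDIA_STATUS_ERROR;
    }
    /* windowDimension sets the noise reduction filter strength; the value
       here is an illustrative choice. */
    NvMediaVPIStereoPostprocessParams params = { .windowDimension = 5u };
    return NvMediaVPIStereoPostprocessDesc(vpi, desc, &params,
                                           disparityInput, confidenceInput,
                                           disparityOutput, confidenceOutput);
}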
Layout and Format
For information on supported layouts and formats, see:
Pitch Linear and Block Linear Layout Support in this topic
Supported Surface Format Packing Types in this topic
Supported Resolutions in this topic
FAST Corners Detector
The FAST Corners Detector algorithm computes the FAST corner response and reports the strongest keypoint in each cell of a regularly spaced 2D grid. The algorithm is based on the approaches described in [1] and [2], with the modifications described below.
To classify whether the candidate point p is actually a corner, the FAST corner detector uses pixels on the Bresenham circle around a candidate point. If N contiguous pixels are brighter than the candidate point by at least a threshold value t (an input parameter strength) or darker by at least t, then the candidate point is considered to be a corner. If the non-maximum suppression flag is turned on, the algorithm:
Computes corner strengths for the detected corners.
Uses those strengths to remove multiple or spurious corner responses.
In FAST corner detection, a candidate pixel p is detected as a corner or not based on its value and the pixel values on its radius-3 Bresenham circle (16 pixels), given the following conditions:
C1: A set S of N contiguous pixels exists such that I(x) > I(p) + t for every pixel x in S.
C2: A set S of N contiguous pixels exists such that I(x) < I(p) - t for every pixel x in S.
If either of the two conditions is true, the candidate pixel p is detected as a corner.
In this implementation, the minimum number of contiguous pixels N is set to 9 (FAST9).
The value of the intensity difference threshold strengthThresh (of type uint32_t) must be within:
0 < t < 65535
where the range corresponds to an unsigned 16-bit (uint16) value.
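The classification rule can be illustrated with the following self-contained sketch (it is not part of the NvMedia VPI API), which applies the FAST9 test to a single pixel of an 8-bit grayscale buffer. The Bresenham circle offsets follow the standard algorithm in [1]; the hardware implementation operates on the surface formats listed later in this topic.

#include <stdint.h>

/* Returns 1 if pixel (x, y) is a FAST9 corner for threshold t, 0 otherwise.
   img is 8-bit grayscale with the given row stride; the caller must keep
   (x, y) at least 3 pixels away from every image border. */
int IsFast9Corner(const uint8_t *img, int stride, int x, int y, int t)
{
    /* The 16 offsets of the radius-3 Bresenham circle, in circular order. */
    static const int dx[16] = { 0, 1, 2, 3, 3, 3, 2, 1, 0, -1, -2, -3, -3, -3, -2, -1 };
    static const int dy[16] = { -3, -3, -2, -1, 0, 1, 2, 3, 3, 3, 2, 1, 0, -1, -2, -3 };
    const int p = img[y * stride + x];
    int brighter = 0;   /* current run of pixels brighter than p + t (C1) */
    int darker = 0;     /* current run of pixels darker than p - t (C2) */

    /* Walk the circle almost twice so runs that wrap around the start of the
       circle are still counted as contiguous. */
    for (int k = 0; k < 31; k++) {
        const int idx = k % 16;
        const int q = img[(y + dy[idx]) * stride + (x + dx[idx])];
        if (q > p + t) {
            brighter++;
            darker = 0;
        } else if (q < p - t) {
            darker++;
            brighter = 0;
        } else {
            brighter = 0;
            darker = 0;
        }
        if (brighter >= 9 || darker >= 9) {
            return 1;   /* condition C1 or C2 holds for N = 9 contiguous pixels */
        }
    }
    return 0;
}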
Corner Strength Computation
If the nonmax_suppression input parameter is true, the algorithm computes the corner strengths (responses) for the detected corners. Otherwise, the corner strength is undefined. The corner strength (response) Cp function is defined as the largest threshold t for which the pixel p remains a corner.
Non-Maximum Suppression
If the nonmax_suppression parameter is true, the algorithm filters the detected corners by a non-maxima suppression step. The corner with coordinates (x,y) is kept if and only if the following conditions are satisfied:
See also:
http://www.edwardrosten.com/work/fast.htm
http://en.wikipedia.org/wiki/Features_from_accelerated_segment_test
Important Functions
For a complete description of the following functions and structs, see Vision Programming Interface in the API Reference for this PDK.
Creates a descriptor.
NvMediaVPIGetKeyPointsFastDescriptor *
NvMediaVPICreateGetKeyPointsFastDescriptor(
NvMediaVPI *vpi,
const uint32_t width,
const uint32_t height,
const NvMediaSurfaceType type
);
Queues the task.
NvMediaStatus
NvMediaVPIGetKeyPointsFastDesc(
NvMediaVPI *vpi,
NvMediaVPIGetKeyPointsFastDescriptor *descriptor,
NvMediaImage *input,
const uint32_t strengthThresh,
const NvMediaBool nonmaxSupression,
NvMediaArray *keypoints,
NvMediaArray *scores
);
Limitations and Constraints
The keypoint output array element type is 32 bits, interpreted as interleaved 16-bit x,y coordinates.
The maximum number of keypoints and scores is 2,048. Beyond 2,048 keypoints, the algorithm drops corners. To avoid dropping points of interest, adjust these fields: sensitivity, strengthThresh, and nonMaxSuppressionType.
strengthThresh is limited to <min value> - <max value>.
Currently, this API supports only the detection of new keypoints. It ignores input keypoints and scores.
Layout and Format
For information on the supported layouts and formats, see:
Pitch Linear and Block Linear Layout Support in this topic.
Supported Surface Format Packing Types in this topic.
Supported Resolutions in this topic.
Lucas-Kanade Feature Tracker
The Lucas-Kanade Feature Tracker algorithm computes the optic-flow of input keypoints between two image pyramids. For more information, see [3].
The function inputs are an old image pyramid, an array of keypoints detected in the bottom level of the old image pyramid, and a new image pyramid. It produces a new array of keypoints in the bottom level of the new image pyramid. Each keypoint in the new keypoints array has a status indicating whether tracking failed (i.e., the tracking process completed successfully if and only if the status is 0).
Important Functions
For a complete description of the following functions, see Vision Programming Interface in the API Reference for this PDK.
Creates a descriptor.
NvMediaVPIGetSparseFlowPyrLKDescriptor *
NvMediaVPICreateGetSparseFlowPyrLKDescriptor(
NvMediaVPI *vpi,
const NvMediaVPIGetSparseFlowPyrLKParams *params
);
Queues the task.
NvMediaStatus NvMediaVPIGetSparseFlowPyrLKDesc(
NvMediaVPI *vpi,
NvMediaVPIGetSparseFlowPyrLKDescriptor *descriptor,
const NvMediaVPIGetSparseFlowPyrLKParams *params,
NvMediaImagePyramid *oldImages,
NvMediaImagePyramid *newImages,
NvMediaArray *oldPoints,
NvMediaArray *newPoints,
NvMediaArray *newStatus
);
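A minimal queueing sketch follows. It assumes the descriptor was created with a parameter structure consistent with the one passed here, that the pyramids and point arrays were created and registered during Setup, and that NvMediaVPIFlush() takes only the instance pointer.

NvMediaStatus
QueueSparseFlowLK(
    NvMediaVPI *vpi,
    NvMediaVPIGetSparseFlowPyrLKDescriptor *descriptor,
    const NvMediaVPIGetSparseFlowPyrLKParams *params,
    NvMediaImagePyramid *oldImages,
    NvMediaImagePyramid *newImages,
    NvMediaArray *oldPoints,
    NvMediaArray *newPoints,
    NvMediaArray *newStatus)
{
    /* Queue the tracking task. */
    NvMediaStatus status = NvMediaVPIGetSparseFlowPyrLKDesc(
        vpi, descriptor, params, oldImages, newImages,
        oldPoints, newPoints, newStatus);
    if (status != NVMEDIA_STATUS_OK) {
        return status;
    }
    /* Submit the queued work to the engine. After completion, an entry of 0
       in newStatus means the corresponding keypoint was tracked successfully. */
    return NvMediaVPIFlush(vpi);
}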
Limitations and Constraints
The Lucas-Kanade Feature Tracker has the following limitations:
The maximum supported windowDimension is 15 (and not 32).
A scaleFactor (from one level of the pyramid to the next) other than 0.5 is supported but not tested.
scaleFactor^numLevels must be greater than or equal to 2^-7 (e.g., if scaleFactor = 0.5, then numLevels <= 7).
Currently, the user-defined termination condition is ignored; NVMEDIA_VPI_TERMINATION_CRITERION_ITERATIONS is used as the default. As a result, the user-defined epsilon is not used.
The maximum number of points per task is 1,024.
Currently, the user-defined calcError is not used.
Layout and Format
For information on the supported layouts and formats, see:
Pitch Linear and Block Linear Layout Support in this topic.
Supported Surface Format Packing Types in this topic.
Supported Resolutions in this topic.
Detailed Explanation
The Lucas-Kanade method finds the affine motion vector V for each keypoint in the old image array by solving the following equation:
\begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix} V = \begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}
where the sums are taken over the window I defined below.
Where Ix and Iy are computed using the Scharr gradients on the old image.
Where I is defined as the window of pixels adjacent to the point p(x,y) under consideration. With a given window size of M, I has M^2 points, and the pixel p(x,y) is centered in the window. The temporal difference It is obtained by a simple difference between the same pixels in both images.
In practice, to get an accurate solution, it is necessary to iterate multiple times on this scheme (in a Newton-Raphson fashion) until:
The delta of the affine motion vector between two iterations is smaller than a threshold
And/or it reaches the maximum number of iterations.
At each iteration, It is updated to be the difference between the old image and the pixels shifted by the estimated V in the new image. The function must also check at each iteration whether the tracking point was lost. The criteria for lost tracking are:
The matrix above is not invertible (the determinant of the matrix is less than a threshold of 10^{-7}).
The minimum eigenvalue of the matrix is smaller than a threshold (10^{-4}).
The tracked point coordinate is outside the image coordinates.
Currently, clients must update the output array of keypoints before applying it as the input to the next frame. They can do that by zeroing out the whole output array and then running Harris Corner Detector to re-initialize the keypoints. For more information, see Harris Corner Detector in this topic.
Kanade-Lucas-Tomasi Object Template Tracker
The Kanade-Lucas-Tomasi (KLT) Tracker algorithm estimates the 2D translation and scale changes of an image template between original template coordinates and a given reference image using the Inverse Compositional algorithm. For more information, see [5].
Inputs are an array of template bounding boxes, an array of translation and scale change predictions, and a reference image. Additionally, a template image input is used to update template patches (see details below). Outputs are the array of translation and scale change estimations from the input bounding box coordinates to the reference image coordinates, and the array of template bounding box coordinates in the reference image.
Refer to Bounding Boxes and 2D Transformation Alternatives below for further details on the supported variants and on update and retrieval functions for the bounding boxes and motion arrays.
Refer to the parameter definitions below for further details on configuration and tuning.
Important Functions
For a complete description of the following functions, see Vision Programming Interface in the API Reference for this PDK.
Creates a descriptor.
NvMediaVPIKLTDescriptor *
NvMediaVPICreateKLTDescriptor(
NvMediaVPI *vpi,
const uint32_t width,
const uint32_t height,
const NvMediaSurfaceType type
);
Queues the task.
NvMediaStatus
NvMediaVPIKLTDesc(
NvMediaVPI *vpi,
NvMediaVPIKLTDescriptor *descriptor,
const NvMediaVPIKLTParams *params,
NvMediaImage *referenceImage,
NvMediaImage *templateImage,
NvMediaArray *inputBoxList,
NvMediaArray *predictedBoxList,
NvMediaArray *outputBoxList,
NvMediaArray *estimationList
);
Limitations and Constraints
The Kanade-Lucas-Tomasi Object Template Tracker has the following limitations:
Maximum number of bounding boxes / templates is 128.
Bounding box size cannot exceed 64x64 pixels. In a future version, prediction and estimation bounding boxes will be limited to 2x the input bounding box.
Maximum supported scale change is 0.2.
Layout and Format
For information on the supported layouts and formats, see:
Pitch Linear and Block Linear Layout Support in this topic.
Supported Surface Format Packing Types in this topic.
Supported Resolutions in this topic.
Detailed Explanation
Each template bounding box defines a template image patch stored internally with the function descriptor. Those template patches are tracked in reference images based on predicted translation and scale changes. An estimated translation and scale change from the original bounding box coordinates to reference image coordinates is computed. Each such estimation includes a tracking validity flag (tracking success or failure) and whether a template update is required, based on user-defined threshold parameters.
Stored Template Patches and Template Patch Update
In many use cases the user may want to capture an object template patch in frame t and then estimate the translation and scale changes of that template patch in some other reference frame t+k.
To support this use case, the user indicates for each template bounding box whether to use a previously cropped and stored image patch associated with the bounding box or to crop and store a new image patch from the input template image. The cropped image patches are stored with the task descriptor.
Use Case Example
Assume object obj1 is detected in frame t using some object detection functionality. The obj1 bounding box in frame t coordinates is denoted as bbox1_t.
To estimate the translation and scale changes of obj1 in frame t+1 from the input bbox1_t, the update flag (updateTemplate in inputBoxList) is set (see the bounding box and 2D transform detailed definitions below).
The input template image (templateImage) is frame t, and the reference image (referenceImage) is frame t+1.
The KLT function crops the bbox1_t image patch from the template image, stores the cropped patch with the function descriptor, and computes the estimated translation and scale changes of the image patch in the reference image. Assume the obj1 translation and scale changes estimation status (trackingStatus flag in outputBoxList) between frame t and frame t+1 was valid and the function indicates no template update is required (updateTemplate flag in outputBoxList). Further assume a second object obj2 was also detected in frame t+1 with a bounding box bbox2_t+1.
For the next frame, frame t+2, the template image is frame t+1 and the reference image is frame t+2. The input box list now consists of bbox1_t with an unset template update flag and bbox2_t+1 with a set template update flag. For obj1 the tracker uses the already stored image patch, thus effectively estimating cumulative translation and scale changes between frame t and frame t+2. For obj2, first bbox2_t+1 is cropped and stored. Then the translation and scale changes are estimated for that image patch between frame t+1 and frame t+2.
Assume that for both obj1 and obj2 the tracking status is valid. Further assume that while the obj2 template update is unset, the obj1 template update indicates that a new template is needed. (See the detailed parameter definitions below for an explanation of what might trigger a template update recommendation.) One possible approach to the template update for obj1 is to use the latest successfully tracked bounding box. For obj1 that would be the tracked bounding box provided in frame t+2 coordinates from the output bounding box list (outputBoxList), denoted as bbox1_t+2.
From the next frame, the template image is frame t+2 and the reference image is frame t+3. The input box list consists of bbox1_t+2 and bbox2_t+1 for obj1 and obj2, respectively. The bbox1_t+2 template update flag is set, triggering the tracker to crop the bbox1_t+2 image patch from the template image (from frame t+2) and replace the existing image patch (from frame t).
Bounding Boxes and 2D Transformation Alternatives
The KLT functionality supports two flavors of bounding boxes: (1) an axis-aligned bounding box (AABB) with a dedicated translation and scale struct, and (2) a more generalized 2D 3x3 homogeneous transformation with object width and height.
For both variants, scale change is assumed with respect to the center of the bounding box. Both types of bounding boxes also include a trackingStatus flag indicating the tracking validity (1 for invalid) and a templateStatus flag indicating that the template needs an update (1 indicates update).
Axis-Aligned Bounding Box (AABB) and Translation with Scale
An AABB takes this format:
float x and float y: bounding box top-left corner
float width and float height: object width and height
A translation with scale takes this format:
float deltaScale: scale change with respect to the center of the bounding box
float deltaX and float deltaY: the translation, defined as {∆x,∆y}
Generalized 2D 3x3 Homogeneous Transformation with Width and Height
The 3x3 transform takes the following format:
The bounding box with transform takes the following format:
NvMediaVPI2DTransform transform: a 3x3 transform as described above
float width: the object width in pixels
float height: the object height in pixels
KLT Config and Tuning Parameters (NvMediaVPIKLTParams)
uint32_t numIterationsScaling: Maximum number of gradient descent iterations
uint32_t numIterationsTranslation: Not used; reserved
uint32_t numIterationsCoarse: Not used; reserved
uint32_t numIterationsFine: Not used; reserved
float maxPixelTolerance: Not used; reserved
float thresholdKill: If after numIterationsScaling steps the gradient descent estimation error is greater than thresholdKill, trackingStatus is set to invalid and the template tracking fails.
float thresholdUpdate: If after numIterationsScaling steps the gradient descent estimation error is greater than thresholdUpdate but less than or equal to thresholdKill, templateStatus is set to require a template update.
float thresholdStop: If after any iteration the gradient descent estimation error is less than thresholdStop, tracking is complete. No further iterations are attempted and both trackingStatus and templateStatus are unset, indicating valid tracking with no need for a template update.
thresholdStop ≤ thresholdUpdate ≤ thresholdKill
float maxScaleChange: If scale change is greater than maxScaleChange, trackingStatus is set to invalid. Maximum value for maxScaleChange is 0.2f.
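A sketch of a parameter set that respects these rules follows; the values are illustrative only, and only the field names come from the list above.

/* Illustrative KLT parameter set. The thresholds honor
   thresholdStop <= thresholdUpdate <= thresholdKill, and maxScaleChange stays
   within the supported 0.2f maximum. */
const NvMediaVPIKLTParams kltParams = {
    .numIterationsScaling = 20u,     /* maximum gradient descent iterations */
    .numIterationsTranslation = 0u,  /* not used; reserved */
    .numIterationsCoarse = 0u,       /* not used; reserved */
    .numIterationsFine = 0u,         /* not used; reserved */
    .maxPixelTolerance = 0.0f,       /* not used; reserved */
    .thresholdKill = 0.6f,           /* give up tracking above this error */
    .thresholdUpdate = 0.3f,         /* request a template update above this error */
    .thresholdStop = 0.05f,          /* stop iterating below this error */
    .maxScaleChange = 0.2f,          /* maximum supported scale change */
};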
2D Convolution
The 2D Convolution algorithm performs a 2D convolution operation on the input image with the provided kernel.
The functionality supports both 2D kernels and separable 1D kernels.
Important Functions
For a complete description of the following functions and structs, see Vision Programming Interface in the API Reference for this PDK.
Queues the task to convolve an input image with a client-supplied convolution kernel.
NvMediaStatus
NvMediaVPIConvolveImage(
NvMediaVPI *vpi,
NvMediaImage *input,
const float_t *kernelData,
const uint32_t kernelWidth,
const uint32_t kernelHeight,
NvMediaImage *output
);
Creates a descriptor for Separable Convolution.
NvMediaVPIConvolveImageSeparableDescriptor *
NvMediaVPICreateConvolveImageSeparableDescriptor(
NvMediaVPI *vpi,
NvMediaSurfaceType type,
const float_t *kernelX,
const uint32_t kernelXSize,
const float_t *kernelY,
const uint32_t kernelYSize
);
Queues the task for Separable Convolution.
NvMediaStatus
NvMediaVPIConvolveImageSeparable(
NvMediaVPI *vpi,
NvMediaImage *input,
const float_t *kernelX,
const uint32_t kernelXSize,
const float_t *kernelY,
const uint32_t kernelYSize,
NvMediaImage *output
);
Limitations and Constraints
The maximum kernel size is 11x11.
The kernel must be normalized such that each element has an absolute value less than 1.0.
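For example, a normalized 3x3 binomial (Gaussian-like) kernel satisfies both constraints and can be queued as follows, assuming the input and output images were created and registered during Setup.

NvMediaStatus
QueueGaussian3x3(NvMediaVPI *vpi, NvMediaImage *input, NvMediaImage *output)
{
    /* 3x3 binomial kernel normalized to sum to 1.0, so every element has an
       absolute value below 1.0 as required. */
    static const float_t kernel[9] = {
        1.0f / 16.0f, 2.0f / 16.0f, 1.0f / 16.0f,
        2.0f / 16.0f, 4.0f / 16.0f, 2.0f / 16.0f,
        1.0f / 16.0f, 2.0f / 16.0f, 1.0f / 16.0f,
    };
    return NvMediaVPIConvolveImage(vpi, input, kernel, 3u, 3u, output);
}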
Layout and Format
For information on the supported layouts and formats, see:
Pitch Linear and Block Linear Layout Support in this topic.
Supported Surface Format Packing Types in this topic.
Supported Resolutions in this topic.
Get Image Pyramid
The Get Image Pyramid algorithm computes an image pyramid from the input image.
For a complete description of the following function, see Vision Programming Interface in the API Reference for this PDK.
Queues the task.
NvMediaStatus
NvMediaVPIGetImagePyramid(
NvMediaVPI *vpi,
NvMediaImage *input,
NvMediaImagePyramid *output
);
Limitations and Constraints
The maximum number of levels is 10.
The effective number of levels in an image pyramid also depends on the input image size. The image size of the highest level must be greater than or equal to the smallest supported image size. For related information, see Supported Resolutions in this topic.
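For illustration (this is not an NvMedia VPI API), and assuming each pyramid level halves the previous level's dimensions, the effective number of levels can be estimated as follows. The 65 x 65 minimum in the example comes from the Supported Resolutions entry for Build Image Pyramid.

#include <stdint.h>

/* Counts how many pyramid levels fit before a level falls below the smallest
   supported image size, assuming each level halves both dimensions; capped at
   the documented maximum of 10 levels. */
uint32_t EffectivePyramidLevels(uint32_t width, uint32_t height,
                                uint32_t minWidth, uint32_t minHeight)
{
    uint32_t levels = 0u;
    while ((levels < 10u) && (width >= minWidth) && (height >= minHeight)) {
        levels++;
        width /= 2u;
        height /= 2u;
    }
    return levels;
}

/* Example: EffectivePyramidLevels(1920u, 1080u, 65u, 65u) returns 5. */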
Layout and Format
For information on the supported layouts and formats, see:
Pitch Linear and Block Linear Layout Support in this topic.
Supported Surface Format Packing Types in this topic.
Supported Resolutions in this topic.
Pitch Linear and Block Linear Layout Support
Pitch linear is supported by all functionalities for both input and output surfaces.
Block linear is partially supported, depending on whether used for input or output. The following table provides more details.
 
Algorithm                       Input Supported    Output Supported
Box Filter                      Yes                Yes
Harris Corner Detector          Yes                N/A
Convert Motion Vector           Yes                -
Fast Corner Detector            Yes                Yes
KLT Tracker                     Yes                N/A
Lucas-Kanade Feature Tracker    Yes                N/A
2D Convolution                  Yes                Yes
Build Image Pyramid             Yes                Yes
StereoPreProcess                -                  -
StereoPostProcess               Yes                Yes
Supported Surface Format Packing Types
The following table describes VPI support for NvMedia surface format packing types. Support depends on the VPI algorithm. Surface types not listed in this table are not supported.
Algorithm                       Surf Type, Comp Order    uint8    int8    uint16              int16              uint32               int32
Box Filter                      RGBA, Alpha              Yes      Yes     Yes                 Yes                Yes                  Yes
Harris Corner Detector          RGBA, Alpha              -        -       -                   Yes                -                    -
Convert Motion Vector           RGBA, RG                 -        -       -                   Yes                -                    -
Fast Corner Detector            RGBA, Alpha              -        -       Yes                 -                  -                    -
KLT Tracker                     RGBA, Alpha              -        -       Yes                 -                  -                    -
Lucas-Kanade Feature Tracker    RGBA, Alpha              -        -       Yes                 -                  -                    -
2D Convolution                  RGBA, Alpha              Yes      Yes     Yes                 Yes                Yes                  Yes
Build Image Pyramid             RGBA, Alpha              -        -       Yes                 -                  -                    -
StereoPreProcess                RGBA, Alpha              -        -       Yes                 -                  Hints output only    -
StereoPostProcess               RGBA, Alpha              -        -       Yes (Confidence)    Yes (Disparity)    -                    -
Supported Resolutions
Algorithm                                                          Minimum Image Resolution      Maximum Image Resolution
Convert Motion Vector (NvMediaVPIConvertMVDesc)                    31 x 17                       3264 x 2448
Harris Corner Detector (NvMediaVPIGetKeyPointsHarrisDesc)          65 x 65                       3264 x 2448
Fast Corner Detector (NvMediaVPIGetKeyPointsFastDesc)              65 x 65                       3264 x 2448
Lucas-Kanade Feature Tracker (NvMediaVPIGetSparseFlowPyrLKDesc)    17 x 17                       3264 x 2448
Build Image Pyramid (NvMediaVPIGetImagePyramid)                    65 x 65                       3264 x 2448
2D Convolution (NvMediaVPIConvolveImage)                           65 x 33                       3264 x 2448
2D Convolution with separable filter
  (NvMediaVPIConvolveImageSeparableDesc)                           65 x 33                       3264 x 2448
Box Filter (NvMediaVPIFilterImageBox)                              65 x 33                       3264 x 2448
StereoPreProcess (NvMediaVPIStereoPreprocessDescEx)                480 x 270                     480 x 270
StereoPostProcess (NvMediaVPIStereoPostprocessDesc)                Input: 480 x 270              Input: 480 x 270
                                                                   Output: 480 x 270 only        Output: 1920 x 1080 only
When an output image is in BL format, its width must be a multiple of 64 pixels.
References
[1] Edward Rosten and Tom Drummond. “Machine learning for high-speed corner detection.” European Conference on Computer Vision, volume 1, pages 430-443, May 2006.
[2] Edward Rosten, Reid Porter, and Tom Drummond. “Faster and better: A machine learning approach to corner detection.” IEEE Trans. Pattern Analysis and Machine Intelligence, 32:105-119, October 2010.
[3] Jean-Yves Bouguet. Pyramidal Implementation of the Lucas Kanade Feature Tracker: Description of the Algorithm, 2000.
[4] C. Harris, M. Stephens, “A Combined Corner and Edge Detector,” Proc. Alvey Vision Conf., pp. 147-151, 1988.
[5] Simon Baker, Iain Matthews, "Lucas-Kanade 20 Years On: A Unifying Framework," International Journal of Computer Vision, February 2004, Volume 56, Issue 3, pp. 221-255.