VPI - Vision Programming Interface

3.0 Release

ORB feature detector

Overview

Oriented FAST and rBRIEF (ORB) [1] is a feature detection and description algorithm. It detects features, or corners, across an input pyramid and computes a descriptor for each one. It returns the coordinates of each feature in the base (highest-resolution) image of the input, along with its associated bitstring descriptor. The advantage of ORB over other detection and description algorithms, such as SIFT [2], is its relative simplicity and computational efficiency, which matters in real-time video processing and machine learning pipelines. The main disadvantage of ORB is that its descriptors are less robust on images that change in angle and scale. Although ORB is designed to handle such cases, it is not as effective in this respect as other algorithms, such as SIFT or SURF [3].

The example below shows an input image on the left with its corresponding feature locations on the right.

[Images: Input | Output]

Implementation

The ORB algorithm detects features, or corners, on each level of the input pyramid using the FAST algorithm. The input is normally a Gaussian pyramid, which allows corners to be detected at multiple scales of the base image.

For each level of the input pyramid, the ORB algorithm runs FAST to detect a potentially large number of corners. ORB then assigns a cornerness score to each FAST corner detected. One way to assign scores is via the Harris algorithm, using the Harris response score with a 3x3 block window and sensitivity factor equal to 1. The cornerness score is then used to sort all FAST corners from highest to lowest score and keep only the top N, where N is potentially much smaller than the total number of corners found by FAST. Another way to assign scores is via FAST itself, effectively skipping the Harris response score assignment and sorting, which trades the quality of the corners detected by ORB for performance.
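The filtering step can be illustrated with a short sketch. This is plain Python, not VPI's implementation: `select_top_n` and the sample data are hypothetical, and the scores stand in for whichever cornerness measure (Harris or FAST) was chosen.

```python
def select_top_n(corners, scores, n):
    """Keep the n corners with the highest cornerness score."""
    # Pair each corner with its score and sort from highest to lowest score.
    ranked = sorted(zip(scores, corners), key=lambda pair: pair[0], reverse=True)
    return [corner for _, corner in ranked[:n]]


# Hypothetical FAST detections on one pyramid level, with made-up scores.
fast_corners = [(10, 12), (40, 8), (25, 30), (5, 5)]
cornerness = [0.2, 0.9, 0.5, 0.1]

top = select_top_n(fast_corners, cornerness, n=2)
print(top)
```

With N much smaller than the number of FAST detections, most candidates are dropped here, which is why the buffer of raw FAST corners is usually sized larger than N.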

The corners detected by ORB on each level of the input pyramid are gathered into a single output array. Corners found on lower-resolution levels are rescaled to the coordinates of the base (highest-resolution) level. Corners from the final, lowest-resolution levels may be discarded if the maximum capacity of the output array is reached.
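The gathering step can be pictured as follows. This is a sketch under stated assumptions: `gather_corners` is a hypothetical helper, each level is assumed to be `scale` times the size of the previous one, and coordinates are mapped back to the base level with a plain multiplication. How VPI handles rounding or sub-pixel offsets is not covered here.

```python
def gather_corners(per_level_corners, scale=0.5, capacity=None):
    """Collect corners from all levels in base-level coordinates.

    per_level_corners[i] holds the (x, y) corners found on level i,
    where level 0 is the base (highest-resolution) image.
    """
    gathered = []
    for level, corners in enumerate(per_level_corners):
        factor = (1.0 / scale) ** level  # level 0 -> 1, level 1 -> 2, ...
        for x, y in corners:
            if capacity is not None and len(gathered) >= capacity:
                # Output array is full: remaining coarser corners are dropped.
                return gathered
            gathered.append((x * factor, y * factor))
    return gathered


# Hypothetical detections on a 3-level pyramid.
levels = [[(100, 60)], [(50, 30), (10, 20)], [(25, 15)]]
all_corners = gather_corners(levels, scale=0.5, capacity=3)
print(all_corners)
```

Because levels are visited from finest to coarsest, a full output array costs the coarsest levels their corners first, matching the behavior described above.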

ORB calculates a descriptor called rBRIEF for each detected corner. This is done on the base (highest-resolution) level of the input pyramid. The first step in calculating the rBRIEF descriptor of a corner is to compute its local orientation, which is the angle of the vector from the corner to the intensity centroid of a patch surrounding the corner. The intensity centroid of a patch is the point \((m_{10}/m_{00},~m_{01}/m_{00})\), where the moment \(m_{pq}\) is defined as \(m_{pq} = \sum_{x,y} x^p y^q I(x, y)\), summed over every point \((x,~y)\) in the patch with coordinates measured relative to the corner. The orientation angle is then \(\operatorname{atan2}(m_{01},~m_{10})\).
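The orientation computation can be sketched in a few lines. The helper below is hypothetical and uses a small square patch for readability; the patch size and shape VPI actually uses are not stated here.

```python
import math


def patch_orientation(patch):
    """Angle (radians) from the patch center to its intensity centroid."""
    half = len(patch) // 2
    m10 = m01 = 0.0
    for row, line in enumerate(patch):
        for col, intensity in enumerate(line):
            x, y = col - half, row - half  # coordinates relative to the corner
            m10 += x * intensity
            m01 += y * intensity
    # m00 cancels out in the angle, so only m10 and m01 are needed.
    return math.atan2(m01, m10)


# Bright column on the right: the centroid lies along +x.
right = [[0, 0, 255],
         [0, 0, 255],
         [0, 0, 255]]
# Bright row at the bottom: the centroid lies along +y (image rows grow downward).
bottom = [[0, 0, 0],
          [0, 0, 0],
          [255, 255, 255]]

theta_right = patch_orientation(right)
theta_bottom = patch_orientation(bottom)
print(theta_right, theta_bottom)
```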

After the orientation of each corner is determined, the descriptor is generated by running 256 binary tests on a patch surrounding the corner and combining their results into a 256-bit string. Each binary test compares the intensity of two pixels in the patch: if the first has greater intensity than the second, the bit is set to 1; otherwise it is set to 0. The pixel locations for these tests are determined by a pattern that minimizes correlation and increases variance. The location pattern is rotated by the orientation angle before the tests are done, which makes the descriptors rotationally invariant.
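The sketch below shows the binary tests and the pattern rotation on a toy example. It rests on several assumptions: `brief_descriptor` is a hypothetical helper, the pattern has 4 hand-picked tests instead of ORB's 256, and rotated offsets are rounded to the nearest pixel and clamped to the patch.

```python
import math


def brief_descriptor(patch, pattern, theta):
    """Pack pairwise intensity tests on a rotated pattern into an integer.

    patch   -- square 2D list with odd side length, centered on the corner
    pattern -- list of ((x1, y1), (x2, y2)) offsets from the patch center
    theta   -- orientation angle in radians
    """
    half = len(patch) // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def sample(point):
        x, y = point
        # Rotate the offset by theta, then round and clamp to the patch.
        rx = max(-half, min(half, round(cos_t * x - sin_t * y)))
        ry = max(-half, min(half, round(sin_t * x + cos_t * y)))
        return patch[ry + half][rx + half]

    bits = 0
    for i, (first, second) in enumerate(pattern):
        if sample(first) > sample(second):
            bits |= 1 << i  # test i passed: set bit i
    return bits


# Toy 4-test pattern (illustrative only, not ORB's pattern).
pattern = [((1, 0), (-1, 0)), ((-1, 0), (1, 0)), ((0, 1), (0, -1)), ((2, 0), (0, 0))]

# Intensity grows along x; the same patch rotated by 90 degrees grows along y.
patch_x = [[col for col in range(5)] for _ in range(5)]
patch_y = [[row for _ in range(5)] for row in range(5)]

d0 = brief_descriptor(patch_x, pattern, theta=0.0)
d90 = brief_descriptor(patch_y, pattern, theta=math.pi / 2)
print(bin(d0), bin(d90))
```

Both calls produce the same bits because the pattern is rotated together with the patch content, which is the property the orientation step is meant to provide.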

C API functions

For the list of limitations, constraints and backends that implement the algorithm, consult the reference documentation of the following functions:

Function Description
vpiInitORBParams Initializes VPIORBParams with default values.
vpiCreateORBFeatureDetector Creates an ORB feature detector payload.
vpiSubmitORBFeatureDetector Submits an ORB feature detector operation to the stream.
vpiCreateORBDescriptorExtractor Creates an ORB descriptor extractor payload.
vpiSubmitORBDescriptorExtractor Submits an ORB descriptor extractor operation to the stream.

Usage

Language: Python
  1. Import VPI module
    import vpi
  2. Read an input image, convert it to grayscale and construct a Gaussian pyramid from it, using the CPU backend.
    input = vpi.asimage(np.asarray(Image.open(args.input))) \
        .convert(vpi.Format.U8, backend=vpi.Backend.CPU) \
        .gaussian_pyramid(4, backend=vpi.Backend.CPU)
  3. Run ORB corner detector algorithm on the input pyramid using the CPU backend. The FAST intensity threshold is set to 142 to be more selective in the corners found in this example. Also, the maximum number of features per level is set to 88 and maximum input pyramid levels to use is 3.
    with vpi.Backend.CPU:
        corners, descriptors = input.orb(intensity_threshold=142, max_features_per_level=88, max_pyr_levels=3)
  4. Optionally, retrieve all corner positions found and their descriptors as NumPy arrays in CPU memory.
    a_corners = corners.cpu()
    a_descriptors = descriptors.cpu()
Language: C/C++
  1. Initialization phase
    1. Include the header that defines the ORB algorithm functions and parameter structure.
      #include <vpi/algo/ORB.h>
    2. Create the ORB parameter object, initially setting it to the default parameters. The FAST intensity threshold is set to 142 to be more selective in the corners found in this example. Also, the maximum number of features per level is set to 88 and maximum input pyramid levels to use is 3.
      VPIORBParams params;
      vpiInitORBParams(&params);
      params.fastParams.intensityThreshold = 142;
      params.maxFeaturesPerLevel = 88;
      params.maxPyramidLevels = 3;
    3. Create the ORB payload. The capacity is the maximum number of FAST corners that can be detected at each level of the input pyramid. A higher capacity typically corresponds to slower run times but provides ORB with more corners to filter. This internal buffer capacity is set to 20 times the maximum number of features per level.
      VPIPayload payload;
      int bufCapacity = params.maxFeaturesPerLevel * 20;
      vpiCreateORBFeatureDetector(VPI_BACKEND_CPU, bufCapacity, &payload);
    4. Create the stream where the algorithm will be submitted for execution.
      VPIStream stream;
      vpiStreamCreate(0, &stream);
    5. Define the input pyramid object and fill it from the input image. See Gaussian pyramid for details.
      VPIImage input;
      LoadImage(sIn, VPI_IMAGE_FORMAT_U8, &input);
      int width, height;
      vpiImageGetSize(input, &width, &height);
      VPIPyramid inputPyr;
      vpiPyramidCreate(width, height, VPI_IMAGE_FORMAT_U8, 4, 0.5, VPI_BACKEND_CPU, &inputPyr);
      vpiSubmitGaussianPyramidGenerator(stream, VPI_BACKEND_CPU, input, inputPyr, VPI_BORDER_CLAMP);
    6. Create the output array that will store the ORB corners. The output array capacity controls the maximum number of corners to be detected by ORB in all levels. This output capacity is set to the maximum number of features per level times maximum pyramid levels to use.
      VPIArray corners;
      int outCapacity = params.maxFeaturesPerLevel * params.maxPyramidLevels;
      vpiArrayCreate(outCapacity, VPI_ARRAY_TYPE_KEYPOINT_F32, 0, &corners);
    7. Create the output array that will store the ORB descriptors. It is one descriptor for each ORB corner detected. Its capacity is the same as the output corner array capacity.
      VPIArray descriptors;
      vpiArrayCreate(outCapacity, VPI_ARRAY_TYPE_BRIEF_DESCRIPTOR, 0, &descriptors);
  2. Processing phase
    1. Submit the algorithm and its parameters to the stream. It will be executed by the CPU backend. The VPI_BORDER_LIMITED border extension is used to ignore pixels near the image boundary.
      VPI_CHECK_STATUS(vpiSubmitORBFeatureDetector(stream, VPI_BACKEND_CPU, payload, inputPyr, corners, descriptors,
                                                   &params, VPI_BORDER_LIMITED));
    2. Optionally, wait until the processing is done.
      vpiStreamSync(stream);
  3. Cleanup phase
    1. Free resources held by the stream, the input image, output arrays and payload.
      vpiStreamDestroy(stream);
      vpiImageDestroy(input);
      vpiArrayDestroy(corners);
      vpiArrayDestroy(descriptors);
      vpiPayloadDestroy(payload);

For more information, please see ORB features in the "C API Reference" section of VPI - Vision Programming Interface.
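The two capacities used in the C walkthrough above follow directly from the ORB parameters. The arithmetic is spelled out below as a small sketch; `orb_capacities` is a hypothetical helper, and the factor of 20 is the value chosen in this example rather than a requirement of the API.

```python
def orb_capacities(max_features_per_level, max_pyramid_levels, buffer_factor=20):
    """Return (internal FAST buffer capacity, output array capacity)."""
    # Internal buffer: room for many raw FAST corners per level before filtering.
    buf_capacity = max_features_per_level * buffer_factor
    # Output arrays: at most N filtered corners from each pyramid level used.
    out_capacity = max_features_per_level * max_pyramid_levels
    return buf_capacity, out_capacity


buf_capacity, out_capacity = orb_capacities(88, 3)
print(buf_capacity, out_capacity)
```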

Performance

Performance benchmarks will be added at a later time.

References

  1. Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011, November). ORB: An efficient alternative to SIFT or SURF. In 2011 International Conference on Computer Vision (pp. 2564-2571). IEEE.
  2. Lowe, D. G. (1999, September). Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision (Vol. 2, pp. 1150-1157). IEEE.
  3. Bay, H., Tuytelaars, T., & Van Gool, L. (2006). SURF: Speeded up robust features. In European Conference on Computer Vision (pp. 404-417). Springer, Berlin, Heidelberg.