Oriented FAST and Rotated BRIEF (ORB) [1] is a feature detection and description algorithm. It detects features across an input pyramid and computes a descriptor for each feature, returning the coordinates of each feature along with its associated bitstring descriptor. The advantage of ORB over other detection/description algorithms such as SIFT [2] is its relative simplicity and computational efficiency, which matters in real-time video processing and machine learning pipelines. The main disadvantage of ORB is its limited robustness when describing keypoints in images that change in angle and scale: ORB is designed to handle such changes, but it is not as effective as algorithms such as SIFT/SURF [3] in this regard. Note that the patent on SIFT has expired, but SURF is still a patented algorithm/method [4].
This example shows an input image on the left with its corresponding feature locations on the right.
Input
Output
Implementation
The ORB algorithm extracts features from each level of the input pyramid. A Gaussian input pyramid is used so that features are detected at multiple scales of the image.
At each level, the algorithm begins by using the VPI implementation of FAST Corner Detection to detect a parameterized number of FAST corners in the image. It then assigns a cornerness score to each detected FAST corner. If the score parameter is set to HARRIS_SCORE, the algorithm calculates the Harris cornerness strength of each detected corner as its score; otherwise, no score is assigned. The corners are then sorted by their cornerness scores, and the top VPIORBParams::maxFeatures divided by the number of pyramid levels corners are kept.
After the corners are extracted from each level of the pyramid, the algorithm calculates an rBRIEF descriptor for each keypoint. The first step in calculating this descriptor is finding the local orientation of the keypoint. This is done by finding the angle between the keypoint and the intensity centroid of a patch surrounding it. The intensity centroid of a patch is defined as \((m_{10}/m_{00},\ m_{01}/m_{00})\), where the moment \(m_{pq}\) is defined as \(m_{pq} = \sum_{x,y} x^p y^q I(x, y)\) over all points \((x, y)\) in the patch. From this, the orientation angle is \(\theta = \mathrm{atan2}(m_{01}, m_{10})\).
After the orientation of each keypoint is determined, the descriptor must be generated. The descriptor is generated by performing 256 binary tests on a patch surrounding the keypoint and combining their results into a 256-bit bitstring. A binary test compares the intensity of two pixels on the image patch: if the first pixel has a greater intensity than the second, the test returns 1; otherwise it returns 0. The locations of the pixel pairs used in the tests are predetermined by a pattern chosen to minimize correlation and maximize variance. The pattern is rotated by the orientation angle before the tests are performed, which makes the descriptors rotationally invariant. In the current VPI CUDA implementation, keypoints close to the border have no descriptor values and all their bits are set to zero. These points are not filtered out automatically and, if desired, must be removed by the caller.
C API functions
For a list of limitations, constraints, and backends that implement the algorithm, consult the reference documentation of the following functions:
Create the ORB payload. The input width/height/scale should match that of the input pyramid (the width and height corresponding to the first layer of the pyramid). The capacity is the maximum number of FAST corners initially detected at each level of the image pyramid. These are then filtered down to the top N best corners. A higher capacity typically results in longer runtimes but more accurate keypoints.
Create the output array that will store the keypoints of the ORB corners. The output array capacity controls the maximum number of corners returned. In this example, the output array capacity is set to \( 1000 \).
Submit the algorithm and its parameters to the stream; it will be executed by the CPU backend. In this example, the zero border condition is used to treat pixels outside the image as having an intensity of 0.
Destroy a stream instance and deallocate all HW resources.
For more information, see ORB feature in the "C API Reference" section of VPI - Vision Programming Interface.
Performance
Performance benchmarks will be added at a later time.
References
[1] Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011, November). ORB: An efficient alternative to SIFT or SURF. In 2011 International Conference on Computer Vision (pp. 2564-2571). IEEE.
[2] Lowe, D. G. (1999, September). Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision (Vol. 2, pp. 1150-1157). IEEE.
[3] Bay, H., Tuytelaars, T., & Van Gool, L. (2006). SURF: Speeded up robust features. In European Conference on Computer Vision (pp. 404-417). Springer, Berlin, Heidelberg.
[4] Funayama, R., Yanagihara, H., Van Gool, L., Tuytelaars, T., & Bay, H. (2012). U.S. Patent No. 8,165,401. Washington, DC: U.S. Patent and Trademark Office.
Generated by NVIDIA | Sat Jan 14 2023 08:49:24 | cff3b823cbc8daefd751d796475a83be5c5bd082