VPI - Vision Programming Interface

3.2 Release

AprilTag Detector and Pose Estimator

Overview

AprilTags are fiducial markers designed for visual detection and localization. VPI supports detection and pose estimation of a number of common tag families, and may be used as a drop-in replacement for the reference detection algorithm.

For more information, refer to the landing page for AprilTags at April Robotics.

Input Detection output

Implementation

Detector

Implementation of the detector is divided into the following stages:

Stage Description Backends
Decimation Input is decimated by a factor of 2 PVA/CPU
Adaptive thresholding Decide whether pixels are black or white in the decimated image PVA/CPU
Connected component labeling Assign unique IDs to contiguous regions of black/white pixels PVA/CPU
Gradient cluster generation Cluster all the coordinates which lie on the edge between a given black/white region pair PVA/CPU
Quad fitting Attempt to fit quadrilaterals to each gradient cluster CPU
Decoding Attempt to decode each quadrilateral as an AprilTag CPU

When choosing VPI_BACKEND_PVA, VPI will use PVA for the stages which support this backend and fall back to CPU for the remainder. The VPIStream and input/output buffers should be created with support for both PVA and CPU.

Pose estimator

In general, there are eight mathematical solutions to 3D pose estimation from homography decomposition [1]. Of these eight, only two of these are valid subject to the constraints that (1) the camera is in front of the tag and (2) the camera is facing the tag.

The pose estimator attempts to estimate both of these pose candidates via SVD followed by a fixed number of iterations which minimize corner reprojection error. If both pose candidates are valid, the estimator returns the pose which results in the least corner reprojection error.

C API functions

For list of limitations, constraints and backends that implements the algorithm, consult reference documentation of the following functions:

Function Description
vpiInitAprilTagDecodeParams Initializes VPIAprilTagDecodeParams with default values.
vpiCreateAprilTagDetector Creates a AprilTag detector payload.
vpiSubmitAprilTagDetector Submits a AprilTag detector operation to the stream.
vpiSubmitAprilTagPoseEstimation Submits a AprilTag pose estimation operation to the stream.

Usage

Language:
  1. Initialization phase:
    1. Include the header that defines AprilTags functions:
      Declares functions that implement AprilTag detection and pose estimation.
    2. Define the required input image:
      VPIImage input = /*...*/;
      struct VPIImageImpl * VPIImage
      A handle to an image.
      Definition: Types.h:256
    3. Define the payload to process input image with provided width and height:
      int32_t w, h;
      vpiImageGetSize(input, &w, &h);
      VPIPayload payload;
      const int maxHamming = 1;
      VPIAprilTagDecodeParams params = {NULL, 0, maxHamming, family};
      vpiCreateAprilTagDetector(VPI_BACKEND_CPU, w, h, &params, &payload);
      VPIStatus vpiCreateAprilTagDetector(uint64_t backends, int32_t inputWidth, int32_t inputHeight, VPIAprilTagDecodeParams const *params, VPIPayload *payload)
      Creates a AprilTag detector payload.
      VPIAprilTagFamily
      Specify the family of AprilTags to detect.
      Definition: AprilTags.h:84
      @ VPI_APRILTAG_36H11
      36h11 family
      Definition: AprilTags.h:94
      Decode parameters for vpiCreateAprilTagDetector.
      Definition: AprilTags.h:112
      VPIStatus vpiImageGetSize(VPIImage img, int32_t *width, int32_t *height)
      Get the image dimensions in pixels.
      struct VPIPayloadImpl * VPIPayload
      A handle to an algorithm payload.
      Definition: Types.h:268
      @ VPI_BACKEND_CPU
      CPU backend.
      Definition: Types.h:92
    4. Define the output arrays:
      const int maxDetections = 64;
      VPIArray detections;
      VPIArray poses;
      VPIStatus vpiArrayCreate(int32_t capacity, VPIArrayType type, uint64_t flags, VPIArray *array)
      Create an empty array instance.
      struct VPIArrayImpl * VPIArray
      A handle to an array.
      Definition: Types.h:232
      @ VPI_ARRAY_TYPE_POSE
      VPIPose element.
      Definition: ArrayType.h:89
      @ VPI_ARRAY_TYPE_APRILTAG_DETECTION
      VPIAprilTagDetection element.
      Definition: ArrayType.h:88
    5. Create the stream to which the algorithm is to be submitted for execution:
      VPIStream stream;
      vpiStreamCreate(0, &stream);
      struct VPIStreamImpl * VPIStream
      A handle to a stream.
      Definition: Types.h:250
      VPIStatus vpiStreamCreate(uint64_t flags, VPIStream *stream)
      Create a stream instance.
  2. Processing phase:
    1. Submit the detector:
      vpiSubmitAprilTagDetector(stream, VPI_BACKEND_CPU, payload, maxDetections, input, detections);
      VPIStatus vpiSubmitAprilTagDetector(VPIStream stream, uint64_t backend, VPIPayload payload, uint32_t maxDetections, VPIImage input, VPIArray outDetections)
      Submits a AprilTag detector operation to the stream.
    2. Submit the pose estimator:
      const VPICameraIntrinsic intrinsics = {{w / 3.5f, 0.0f, w / 2.f}, {0.0f, h / 3.6f, h / 2.f}};
      const float tagSize = 0.2f;
      vpiSubmitAprilTagPoseEstimation(stream, VPI_BACKEND_CPU, detections, intrinsics, tagSize, poses);
      VPIStatus vpiSubmitAprilTagPoseEstimation(VPIStream stream, uint64_t backend, VPIArray inDetections, const VPICameraIntrinsic intrinsics, float tagSize, VPIArray outPoses)
      Submits a AprilTag pose estimation operation to the stream.
      float VPICameraIntrinsic[2][3]
      Camera intrinsic matrix.
      Definition: Types.h:655
    3. Wait until the processing is done:
      vpiStreamSync(stream);
      VPIStatus vpiStreamSync(VPIStream stream)
      Blocks the calling thread until all submitted commands in this stream queue are done (queue is empty)...
  3. Cleanup phase:
    1. Free resources held by the stream, the payload, the input image and the output arrays:
      vpiArrayDestroy(detections);
      void vpiArrayDestroy(VPIArray array)
      Destroy an array instance.
      void vpiImageDestroy(VPIImage img)
      Destroy an image instance.
      void vpiPayloadDestroy(VPIPayload payload)
      Deallocates the payload object and all associated resources.
      void vpiStreamDestroy(VPIStream stream)
      Destroy a stream instance and deallocate all HW resources.

Performance

For information on how to use the performance tables below, see Algorithm Performance Tables.
Before comparing measurements, consult Comparing Algorithm Elapsed Times.
For further information on how performance was benchmarked, see Performance Benchmark.

 - 

References

  1. Malis, E. and Vargas, M. "Deeper understanding of the homography decomposition for vision-based control"
    Research Report 6303, INRIA 2007