Scene awareness is a fundamental skill for robotic manipulators operating in unconstrained environments. It includes locating objects and estimating their poses, also known as the 6-DoF pose estimation problem. Accurate, real-time pose information for nearby objects in the scene allows robots to engage in semantic interaction.

CenterPose is a single-stage, keypoint-based approach for category-level object pose estimation, which operates on unknown object instances within a known category using a single RGB image input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative 3D bounding cuboid dimensions.
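As a single-stage, keypoint-based detector, the network locates each object as a peak in a center-point heatmap and refines that peak with a predicted sub-pixel offset. A minimal sketch of this decoding step is below; the function name, the 3x3 local-maximum suppression, and the threshold are illustrative assumptions, not the actual TAO implementation.

```python
def decode_centers(heatmap, offsets, threshold=0.5):
    """Find heatmap peaks above threshold and refine them with the
    predicted sub-pixel offsets. heatmap is an H x W grid of scores
    in [0, 1]; offsets is an H x W grid of (dx, dy) pairs."""
    centers = []
    h, w = len(heatmap), len(heatmap[0])
    for y in range(h):
        for x in range(w):
            score = heatmap[y][x]
            if score < threshold:
                continue
            # Keep only local maxima (a stand-in for max-pool NMS).
            neighbors = [heatmap[ny][nx]
                         for ny in range(max(0, y - 1), min(h, y + 2))
                         for nx in range(max(0, x - 1), min(w, x + 2))
                         if (ny, nx) != (y, x)]
            if all(score >= n for n in neighbors):
                dx, dy = offsets[y][x]
                centers.append((x + dx, y + dy, score))
    return centers
```

From each decoded center, the network's remaining heads (2D keypoints, pose, relative cuboid dimensions) are read out at the same grid location.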

The following two category results show the 3D bounding box, object pose, and relative cuboid dimensions. The y-axis (green line) points up, aligned with gravity. The x-axis (red line) follows the right-hand rule, and the z-axis (blue line) points out of the object's front face. Because CenterPose is a category-level object pose estimation method, a separate model must be used for each category being tested.
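Given the axis convention above and the regressed relative dimensions, the 3D bounding cuboid's corners can be laid out in the object frame. This is a hedged sketch (the function name and dimension ordering are assumptions); note that only relative dimensions are recoverable from a single RGB image, not absolute scale.

```python
def cuboid_corners(rel_dims):
    """Return the 8 corners of an origin-centered 3D bounding cuboid
    in the object frame used above: y up (aligned with gravity),
    x following the right-hand rule, z pointing out of the front
    face. rel_dims = (width, height, depth) are relative sizes."""
    w, h, d = rel_dims
    return [(sx * w / 2, sy * h / 2, sz * d / 2)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
```

Projecting these corners (plus the center) through the estimated pose and camera intrinsics yields the drawn 3D box.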

Bottle Sample


Shoes Sample


The training algorithm employs a penalty-reduced, point-wise focal loss for the center-point and keypoint heatmaps. It also minimizes the center sub-pixel offset loss, the keypoint sub-pixel offset loss, and the relative-scale loss. This model was trained using the CenterPose training app in TAO Toolkit v5.2.
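The penalty-reduced focal loss (in the style introduced by CornerNet and used by CenterNet-family detectors) can be sketched as follows. The ground-truth heatmap is 1 exactly at annotated points and Gaussian-smoothed below 1 nearby, which reduces the penalty for near-miss predictions; the alpha/beta values are the common defaults and the exact TAO implementation may differ.

```python
import math

def penalty_reduced_focal_loss(pred, target, alpha=2.0, beta=4.0):
    """Point-wise penalty-reduced focal loss over an H x W heatmap.
    `target` is 1.0 at ground-truth points and < 1 (Gaussian-
    smoothed) around them, down-weighting the negative penalty
    near true points. Normalized by the number of positives."""
    loss, num_pos = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, 1e-7), 1 - 1e-7)  # numerical safety
            if t == 1.0:
                loss -= (1 - p) ** alpha * math.log(p)
                num_pos += 1
            else:
                loss -= (1 - t) ** beta * p ** alpha * math.log(1 - p)
    return loss / max(num_pos, 1)
```

Predictions that are confident near a ground-truth point incur little loss, while confident false positives far from any point are penalized at full strength.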

© Copyright 2024, NVIDIA. Last updated on Mar 22, 2024.