Scene awareness is a fundamental skill for robotic manipulators operating in unconstrained environments. It includes locating objects and estimating their poses, a task known as the 6-DoF pose estimation problem. Accurate, real-time pose information about nearby objects in the scene allows robots to engage in semantic interaction.
CenterPose is a single-stage, keypoint-based approach for category-level object pose estimation, which operates on unknown object instances within a known category using a single RGB image input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative 3D bounding cuboid dimensions.
The following results for two categories show the 3D bounding box, the object pose, and the relative cuboid dimensions. The y-axis points up and is aligned with gravity (green line). The x-axis follows the right-hand rule (red line). The z-axis points toward the front face (blue line). Because CenterPose is a category-level object pose estimation method, each category requires its own model at test time.
Bottle Sample
![bottle.png](https://docscontent.nvidia.com/dims4/default/fbfcaaf/2147483647/strip/true/crop/600x800+0+0/resize/600x800!/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F0000018f-ab8a-d3a1-a79f-ffef2ae80000%2Ftao%2Ftao-toolkit%2F_images%2Fbottle.png)
Shoes Sample
![shoes.png](https://docscontent.nvidia.com/dims4/default/c3e33f4/2147483647/strip/true/crop/600x800+0+0/resize/600x800!/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F0000018f-ab8a-d3a1-a79f-ffef2ae80000%2Ftao%2Ftao-toolkit%2F_images%2Fshoes.png)
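The axis convention and relative cuboid dimensions shown in the samples above can be illustrated with a minimal NumPy sketch. It is not taken from the CenterPose code: the function name `cuboid_corners` and the example dimensions are hypothetical, and the cuboid is assumed to be centered at the object origin. The sketch derives the x-axis from the y (up) and z (front) axes with the right-hand rule and builds the eight cuboid corners in the object frame.

```python
import itertools

import numpy as np

# Object-frame axes as described above: y points up (green), z points
# toward the front face (blue).
Y_UP = np.array([0.0, 1.0, 0.0])
Z_FRONT = np.array([0.0, 0.0, 1.0])

# In a right-handed frame, the x-axis (red) is the cross product y x z.
X_RIGHT = np.cross(Y_UP, Z_FRONT)


def cuboid_corners(relative_dims):
    """Return the 8 corners (shape 8x3) of a cuboid centered at the origin.

    relative_dims: hypothetical (x, y, z) extents. They are relative, so the
    cuboid is defined only up to a global scale factor.
    """
    half = np.asarray(relative_dims, dtype=float) / 2.0
    # Every combination of -1/+1 along the three axes gives one corner.
    signs = np.array(list(itertools.product((-1.0, 1.0), repeat=3)))
    return signs * half


# Example: a tall, bottle-like box whose height is normalized to 1.
corners = cuboid_corners((0.3, 1.0, 0.3))
```

Because the dimensions are relative, recovering metric size needs additional information, such as a known object height.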
The training algorithm applies a penalty-reduced focal loss point-wise to both the center-point heatmap and the keypoint heatmaps. It also minimizes the center sub-pixel offset loss, the keypoint sub-pixel offset loss, and the relative-scale loss. This model was trained using the CenterPose training app in TAO Toolkit v5.2.
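As a rough illustration of the heatmap term, the following NumPy sketch implements a penalty-reduced focal loss in the form popularized by CenterNet-style detectors. It is not the TAO Toolkit implementation: the function name is hypothetical, and the exponents `alpha=2` and `beta=4` are the commonly used CenterNet defaults, assumed here rather than confirmed for this model.

```python
import numpy as np


def penalty_reduced_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Point-wise focal loss over a heatmap (CenterNet-style sketch).

    pred:   predicted heatmap with values in (0, 1)
    target: ground-truth heatmap with Gaussian-splatted peaks equal to 1
    alpha, beta: assumed defaults, not confirmed for this model
    """
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    pos = target == 1.0                   # exact peak locations
    neg = ~pos

    # Positives: down-weight pixels that are already well predicted.
    pos_loss = ((1.0 - pred[pos]) ** alpha * np.log(pred[pos])).sum()

    # Negatives: the (1 - target)^beta factor reduces the penalty for
    # pixels close to a ground-truth peak.
    neg_loss = (
        (1.0 - target[neg]) ** beta
        * pred[neg] ** alpha
        * np.log(1.0 - pred[neg])
    ).sum()

    num_pos = max(int(pos.sum()), 1)      # normalize by number of peaks
    return -(pos_loss + neg_loss) / num_pos


# Toy heatmap: one peak with a softer neighbor from the Gaussian splat.
target = np.zeros((4, 4))
target[1, 2] = 1.0
target[1, 1] = 0.6

good_pred = np.clip(target, 0.01, 0.99)  # close to the ground truth
flat_pred = np.full((4, 4), 0.5)         # uninformative prediction

good_loss = penalty_reduced_focal_loss(good_pred, target)
flat_loss = penalty_reduced_focal_loss(flat_pred, target)
```

In this toy case the prediction that matches the ground truth yields a much smaller loss than the flat one. The sub-pixel offset and relative-scale terms are separate regression losses and are not covered by this sketch.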