• Optics and Precision Engineering
  • Vol. 32, Issue 6, 901 (2024)
Minjia CHEN1,2, Shaoyan GAI1,2,*, Feipeng DA1,2, and Jian YU1,2,3,*
Author Affiliations
  • 1School of Automation, Southeast University, Nanjing 210096, China
  • 2Key Laboratory of Measurement and Control of Complex Engineering Systems, Ministry of Education, Southeast University, Nanjing 210096, China
  • 3Key Laboratory of Space Photoelectric Detection and Perception, Ministry of Industry and Information Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
    DOI: 10.37188/OPE.20243206.0901
    Minjia CHEN, Shaoyan GAI, Feipeng DA, Jian YU. Object 6-DoF pose estimation using auxiliary learning[J]. Optics and Precision Engineering, 2024, 32(6): 901

    Abstract

    To accurately estimate an object's position and orientation in the camera coordinate system under challenging conditions such as severe occlusion and scarce texture, while also improving network efficiency and simplifying the network architecture, this paper proposed a 6-DoF pose estimation method using auxiliary learning based on RGB-D data. The network took the target object's image patch, the corresponding depth map, and the CAD model as inputs. First, a dual-branch point cloud registration network produced predicted point clouds in both the model space and the camera space. Then, in the auxiliary learning branch, the target object's image patch and the Depth-XYZ representation obtained from the depth map were fed into a multi-modal feature extraction and fusion module, followed by coarse-to-fine pose estimation; the estimated results served as priors for optimizing the loss calculation. Finally, during performance evaluation, the auxiliary learning branch was discarded, and only the outputs of the dual-branch point cloud registration network were used for 6-DoF pose estimation via point pair feature matching. Experimental results show that the proposed method achieves an AUC of 95.9% and ADD-S<2 cm of 99.0% on the YCB-Video dataset, an ADD(-S) of 99.4% on the LineMOD dataset, and an ADD(-S) of 71.3% on the LM-O dataset. Compared with existing 6-DoF pose estimation methods, the proposed auxiliary-learning method has advantages in model performance and significantly improves pose estimation accuracy.
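    The abstract describes a dual-branch network that outputs corresponding point clouds in the model space and the camera space, from which the 6-DoF pose is recovered (the paper uses point pair feature matching for this step). As a minimal illustrative sketch only, assuming point-to-point correspondence between the two predicted clouds, the rigid transform relating them can be recovered in closed form with the Kabsch/Umeyama algorithm; this is a standard stand-in, not the paper's matching procedure, and the function name below is hypothetical:

    ```python
    import numpy as np

    def kabsch_pose(model_pts: np.ndarray, camera_pts: np.ndarray):
        """Recover (R, t) such that camera_pts ≈ model_pts @ R.T + t.

        model_pts, camera_pts: (N, 3) arrays of corresponding points in the
        model space and the camera space, respectively.
        """
        # Center both point sets on their centroids.
        mu_m = model_pts.mean(axis=0)
        mu_c = camera_pts.mean(axis=0)
        # Cross-covariance between the centered sets.
        H = (model_pts - mu_m).T @ (camera_pts - mu_c)
        U, _, Vt = np.linalg.svd(H)
        # Correct for a possible reflection so that det(R) = +1.
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = mu_c - R @ mu_m
        return R, t
    ```

    In the described pipeline this closed-form step would run only at inference time, after the auxiliary learning branch has been discarded.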