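Visual localization system integrating active and passive perception for indoor scenes
Xie Ting1, Zhang Xiaojie2, Ye Zhichao1, Wang Zihao1, Wang Zheng3, Zhang Yong3, Zhou Xiaowei1, Ji Xiaopeng1 (1. State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou 310058, China; 2. System Engineering Research Institute, Beijing 100094, China; 3. China Information Technology Designing & Consulting Institute, Beijing 100048, China)
Abstract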
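Objective Visual localization aims to localize a moving object and estimate its pose from easily acquired RGB images. Object occlusion and weakly textured regions, both common in indoor scenes, readily cause erroneous keypoint estimates and severely degrade localization accuracy. To address this problem, this paper proposes an indoor localization system that fuses active and passive perception, combining the advantages of fixed-view and moving-view schemes to localize moving targets in indoor scenes accurately. Method An object pose estimation method based on a plane prior is proposed: building on a monocular localization framework driven by keypoint detection, it applies a plane constraint to optimize the 3-degree-of-freedom pose, which improves the localization stability of targets moving on an indoor plane under a fixed view. A data fusion localization system is designed on the basis of the unscented Kalman filter; it fuses the passive localization result obtained from the fixed view with the active localization result obtained from the moving view, improving the reliability of the pose estimate of the moving target. Result On the iGibson simulation dataset, the proposed system achieves an average localization error of 2~3 cm, and 99% of the estimates have an error within 10 cm; in real scenes, the average localization error is 3~4 cm and more than 90% of the estimates have an error within 10 cm, giving centimeter-level localization accuracy. Conclusion The proposed indoor visual localization system combines the strengths of passive and active localization, achieves high-precision target localization in indoor scenes at low equipment cost, and remains robust under interference from complex environmental factors such as occlusion and target loss.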
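The two figures reported in the results (average localization error, and the share of estimates within 10 cm) can be computed along the following lines. This is a minimal sketch, not the evaluation code of the paper: the function name `localization_metrics`, the use of planar Euclidean distance, and the inclusive `<=` threshold are all assumptions made for illustration.

```python
import math

def localization_metrics(estimated, ground_truth, threshold_m=0.10):
    """Return (mean error in metres, fraction of errors within threshold_m).

    Assumption: error is the planar Euclidean distance between each
    estimated (x, y) position and its ground-truth (x, y) position.
    """
    errors = [
        math.hypot(ex - gx, ey - gy)
        for (ex, ey), (gx, gy) in zip(estimated, ground_truth)
    ]
    mean_error = sum(errors) / len(errors)
    accuracy = sum(e <= threshold_m for e in errors) / len(errors)
    return mean_error, accuracy
```

Calling `localization_metrics(est, gt)` on two equal-length lists of `(x, y)` tuples in metres returns both numbers at once; the 10 cm threshold is exposed as a parameter so that other tolerances can be checked.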
Keywords
Visual localization system of integrated active and passive perception for indoor scenes
Xie Ting1, Zhang Xiaojie2, Ye Zhichao1, Wang Zihao1, Wang Zheng3, Zhang Yong3, Zhou Xiaowei1, Ji Xiaopeng1 (1. State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou 310058, China; 2. System Engineering Research Institute, Beijing 100094, China; 3. China Information Technology Designing & Consulting Institute, Beijing 100048, China)
Abstract
Objective Visual localization aims to locate a moving object and estimate its pose from easily acquired RGB images. Features extracted by traditional computer vision algorithms often carry too little information to meet the requirements of this task. The feature extraction and representation ability of deep learning has made pose estimation an active research topic in computer vision. The development and application of depth cameras and laser-based sensors also offer more diverse solutions to the problem. However, these sensors place constraints on the physical form and shape of the object and need to be used in a structured environment. Multi-view vision systems are often difficult to install and calibrate. In contrast, vision sensors are low in cost, subject to fewer restrictions, and easy to adopt and extend to a variety of unstructured scenarios. Interference that is common in indoor scenes, such as object occlusion and weakly textured regions, easily causes incorrect estimation of target keypoints and severely degrades the accuracy of visual localization. Visual object pose estimation methods can be divided into two categories according to how the camera is deployed. 1) The first category is monocular object pose estimation, which deploys a fixed camera in the scene and obtains the pose of the target by detecting the target and its related information in the captured images.
The positioning results of this category are stable, but they are easily affected by lighting and image blur, and the limited viewing angle means that object occlusion in the scene cannot be handled. 2) The second category is object pose estimation based on scene reconstruction, which mounts the camera on the target itself and obtains the pose of the target by detecting feature points in the scene and matching them against a 3D scene model built in advance. The performance of this scheme depends on the texture of the scene. In scenes with rich texture and clear features, accurate positioning results can be obtained. In scenes without texture or with weakly textured regions such as walls, the positioning results are unstable, and other sensors such as an inertial measurement unit (IMU) are needed to aid positioning. To position moving objects in indoor scenes more precisely, we propose a visual localization system based on active and passive perception that combines the advantages of fixed and moving viewpoints. Method First, we propose an object pose estimation method based on a plane prior. Building on a monocular localization framework driven by keypoint detection, it uses a plane constraint to optimize the 3-DoF (degree of freedom) pose of the object and improves localization stability under a fixed view. Second, we design a data fusion framework based on the unscented Kalman filter. To improve the reliability of the pose estimate of the moving target, the passive perception output from the fixed view is fused with the active perception output from the moving view. The integrated active and passive indoor visual positioning system consists of three modules: 1) a passive positioning module, 2) an active positioning module, and 3) an active and passive fusion module.
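The plane-constrained 3-DoF optimization can be illustrated with a small sketch. It is not the method of the paper: it assumes the target moves on the world plane z = 0, so its pose reduces to (x, y, yaw), and it refines that pose by Gauss-Newton on the keypoint reprojection error seen by a calibrated fixed camera. All names (`planar_pose_to_points`, `project`, `refine_planar_pose`) and the camera convention (world-to-camera rotation `R_cw`, translation `t_cw`) are hypothetical.

```python
import numpy as np

def planar_pose_to_points(pose, model_pts):
    """Place object-frame keypoints in the world for a planar pose (x, y, yaw)."""
    x, y, yaw = pose
    c, s = np.cos(yaw), np.sin(yaw)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    # Assumption: the object sits on the plane z = 0, so only x, y, yaw vary.
    return model_pts @ rot_z.T + np.array([x, y, 0.0])

def project(world_pts, K, R_cw, t_cw):
    """Pinhole projection of world points into pixel coordinates."""
    cam = world_pts @ R_cw.T + t_cw
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

def residual(pose, model_pts, keypoints_2d, K, R_cw, t_cw):
    """Stacked reprojection error between predicted and detected keypoints."""
    pred = project(planar_pose_to_points(pose, model_pts), K, R_cw, t_cw)
    return (pred - keypoints_2d).ravel()

def refine_planar_pose(init_pose, model_pts, keypoints_2d, K, R_cw, t_cw,
                       iters=20, eps=1e-6):
    """Gauss-Newton refinement of (x, y, yaw) with a central-difference Jacobian."""
    pose = np.asarray(init_pose, dtype=float)
    args = (model_pts, keypoints_2d, K, R_cw, t_cw)
    for _ in range(iters):
        r = residual(pose, *args)
        J = np.empty((r.size, 3))
        for j in range(3):
            step = np.zeros(3)
            step[j] = eps
            J[:, j] = (residual(pose + step, *args)
                       - residual(pose - step, *args)) / (2.0 * eps)
        # Least-squares solve of J * delta = -r (the Gauss-Newton step).
        delta = np.linalg.lstsq(J, -r, rcond=None)[0]
        pose = pose + delta
        if np.linalg.norm(delta) < 1e-10:
            break
    return pose
```

Restricting the unknowns to three parameters removes the height, roll, and pitch ambiguities that a full 6-DoF solve would have to resolve from the same keypoints, which is one plausible reason a plane prior stabilizes fixed-view localization.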
The input of the passive positioning module is an RGB image captured by a fixed indoor camera, and its output is the pose of the target contained in the image. The input of the active positioning module is an RGB image taken from the viewpoint of the target to be located, and its output is the position and orientation of the target in the 3D scene. The active and passive fusion module integrates the results of passive and active positioning and outputs a more accurate position of the target in the indoor scene. Result The average localization error of the proposed indoor visual localization system is 2~3 cm on the iGibson simulation dataset, and 99% of the estimates have a localization error within 10 cm. In real scenes, the average localization error is 3~4 cm, and more than 90% of the estimates have a localization error within 10 cm. The experimental results show that the proposed system achieves centimeter-level positioning accuracy. The experiments in real scenes show that the fused system effectively reduces the impact of the limited viewing angle, object occlusion, and other external disturbances on the passive positioning algorithm under a fixed view, and that it also compensates for the insufficient stability and large random error of single-frame positioning. Conclusion Our visual localization system combines the advantages of passive and active methods and achieves high-precision positioning in indoor scenes at low cost. It also remains robust under complex interference such as occlusion and target loss. We develop an unscented Kalman filter based framework that fuses active and passive indoor visual positioning for indoor mobile robot operation.
Compared with existing visual positioning algorithms, it achieves high-precision target positioning in indoor scenes at a lower equipment cost. Under complex environmental factors such as occlusion and target loss, it maintains robust positioning performance and achieves centimeter-level visual positioning accuracy in indoor scenes. Its performance is tested and validated in both simulation and a physical environment. The experimental results show that the positioning system offers high positioning accuracy and robustness across multiple scenarios.
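As an illustration of how an unscented Kalman filter can fuse the two positioning streams, the sketch below tracks a planar position with a constant-velocity model and applies the passive and active measurements as successive updates, skipping whichever is missing (for example during occlusion). It is a simplified sketch under stated assumptions, not the filter of the paper: the state `[x, y, vx, vy]`, the position-only measurement model, the noise levels `passive_std` and `active_std`, and all function names are hypothetical, and orientation is left out to avoid angle wrapping.

```python
import numpy as np

def sigma_points(mean, cov, kappa=1.0):
    """Return the 2n+1 sigma points and weights of the basic unscented transform."""
    n = mean.size
    root = np.linalg.cholesky((n + kappa) * cov)  # columns are the offsets
    pts = [mean] + [mean + root[:, i] for i in range(n)] \
                 + [mean - root[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    return np.array(pts), w

def unscented_transform(pts, w, noise):
    """Weighted mean and covariance of transformed sigma points plus additive noise."""
    mean = w @ pts
    diff = pts - mean
    cov = (w[:, None] * diff).T @ diff + noise
    return mean, cov, diff

def predict(mean, cov, dt, Q):
    """Propagate the state [x, y, vx, vy] with an assumed constant-velocity model."""
    pts, w = sigma_points(mean, cov)
    moved = np.array([[p[0] + dt * p[2], p[1] + dt * p[3], p[2], p[3]] for p in pts])
    m, P, _ = unscented_transform(moved, w, Q)
    return m, P

def update(mean, cov, z, R):
    """Correct the state with one position measurement z = (x, y)."""
    pts, w = sigma_points(mean, cov)
    z_mean, S, z_diff = unscented_transform(pts[:, :2], w, R)
    cross = (w[:, None] * (pts - mean)).T @ z_diff
    gain = cross @ np.linalg.inv(S)
    new_mean = mean + gain @ (z - z_mean)
    new_cov = cov - gain @ S @ gain.T
    return new_mean, 0.5 * (new_cov + new_cov.T)  # keep the covariance symmetric

def fuse_tracks(passive, active, dt=0.1, passive_std=0.05, active_std=0.03):
    """Fuse per-frame passive and active (x, y) fixes; None marks a missing fix."""
    mean, cov = np.zeros(4), np.eye(4)
    Q = 1e-4 * np.eye(4)  # assumed process noise
    track = []
    for z_p, z_a in zip(passive, active):
        mean, cov = predict(mean, cov, dt, Q)
        for z, std in ((z_p, passive_std), (z_a, active_std)):
            if z is not None:
                R = std ** 2 * np.eye(2)
                mean, cov = update(mean, cov, np.asarray(z, dtype=float), R)
        track.append(mean[:2].copy())
    return np.array(track)
```

Passing `None` for frames in which one source loses the target lets the filter coast on the motion model and the remaining source, which mirrors the robustness to occlusion and target loss described above.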
Keywords