Indoor scene reconstruction method combining semantic segmentation and model matching

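Ning Xiaojuan1,2, Gong Liang1, Han Yi1, Ma Ting1, Shi Zhenghao1,2, Jin Haiyan1,2, Wang Yinghui3 (1. School of Computer Science and Engineering, Xi'an University of Technology, Xi'an 710048; 2. Shaanxi Provincial Key Laboratory of Network Computing and Security Technology, Xi'an 710048; 3. School of Artificial Intelligence and Computer, Jiangnan University, Wuxi 214122)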

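Abstract
Objective Dense, complex, and heavily occluded objects in indoor point cloud scenes produce incomplete and noisy data, which severely limits indoor scene reconstruction and makes its accuracy hard to guarantee. To better recover a complete scene from unordered point clouds, an indoor scene reconstruction method based on semantic segmentation is proposed. Method The raw data are down-sampled by voxel filtering, the 3D scale-invariant feature transform (3D SIFT) feature points of the scene are computed, and the down-sampling result is fused with these feature points to obtain an optimized down-sampled scene. The random sample consensus (RANSAC) algorithm is used to extract planar features from the fused sampled scene, and these features are fed into the PointNet network for training so that coplanar points share the same local features, which yields, for each point, a confidence for every category in the dataset. On this basis, a projection-based region growing optimization method is proposed that aggregates the points belonging to the same object in the semantic segmentation result and gives a finer segmentation. The segmented scene objects are classified as interior-environment or exterior-environment elements, which are reconstructed by model matching and by plane fitting, respectively. Result Experiments on the S3DIS (Stanford large-scale 3D indoor space dataset) dataset show that the fusion sampling algorithm improves the efficiency and quality of the subsequent steps to varying degrees: the running time of plane extraction after sampling is only 15% of that before sampling, and the semantic segmentation method improves on the PointNet network by 2.3% in overall accuracy (OA) and 4.2% in mean intersection over union (mIoU). Conclusion The proposed method improves computational efficiency while preserving key points, clearly improves segmentation accuracy, and produces high-quality reconstruction results.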
Semantic segmentation and model matching-integrated indoor scenario-relevant reconstruction method

Ning Xiaojuan1,2, Gong Liang1, Han Yi1, Ma Ting1, Shi Zhenghao1,2, Jin Haiyan1,2, Wang Yinghui3 (1. School of Computer Science and Engineering, Xi'an University of Technology, Xi'an 710048, China; 2. Shaanxi Provincial Key Laboratory of Network Computing and Security Technology, Xi'an 710048, China; 3. School of Artificial Intelligence and Computer, Jiangnan University, Wuxi 214122, China)

Abstract
Objective Virtual reality has attracted attention in domains such as intelligent robotics, computer vision, and artificial intelligence, as has the 3D reconstruction of many kinds of scenes. Indoor scene reconstruction in particular has developed rapidly in computer vision and robotics. The key task of 3D reconstruction is to transform the point cloud data of an indoor scene into a lightweight 3D scene model based on the spatial, geometric, semantic, and other features of the point cloud. However, directly reconstructing a high-quality 3D indoor scene remains challenging because indoor scenes are structurally complex, heavily occluded, and highly variable. Current scene reconstruction methods fall mainly into three groups: model matching, machine learning, and deep learning. Model matching-based methods depend on feature point selection during matching. Machine learning-based methods focus on scene segmentation, after which target objects are detected and replaced through partial matching; however, they still struggle with narrow, cluttered indoor scenes when large parts of the objects are missing. Deep learning-based methods require reliable, high-quality scene data for training, and acquiring such domain-specific data for new scenes is often costly. To address these problems, we develop a semantic segmentation-based method for reconstructing indoor scenes from point clouds, which converts point cloud data into a high-quality 3D scene model efficiently and accurately. The method consists of three steps: fusion sampling, semantic segmentation and instance segmentation, and scene reconstruction. Method We present a semantic segmentation-based indoor scene reconstruction method. First, a down-sampling method is developed that combines 3D scale-invariant feature transform (3D SIFT) feature point extraction with voxel filtering. 
It takes the local features of the scene as guidance: voxel filtering is used to down-sample the point cloud and remove noisy outliers, and the local feature points of the scene data are then obtained with 3D SIFT and used to compensate for key points that voxel filtering may lose during sampling. Combining the local feature points with the voxel filtering output gives the optimized sampling result. This effectively reduces the loss of critical points caused by voxel filtering alone and provides an efficient data representation for the subsequent semantic segmentation of indoor scenes. Second, we present a multi-level semantic segmentation method based on PointNet with plane feature enhancement. Planar features are extracted from the sampled scene with the random sample consensus (RANSAC) algorithm, the data carrying these planar features are used to build the dataset for training and testing the network model, and PointNet is then used for end-to-end semantic segmentation of the scene. A projection-based region growing optimization method is further adopted to achieve fine segmentation of the objects in the indoor scene. This approach improves the local feature representation of PointNet and, to a certain extent, the accuracy of scene semantic segmentation. Finally, a 3D scene model reconstruction method based on model matching and plane fitting is presented for interior-environment and exterior-environment objects. A model library of scene objects is built from the semantic segmentation analysis of the scene. To handle the complex structure of each interior-environment object, the similarity between the object and the models in the library is calculated. 
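
The fusion sampling step described above can be illustrated with a minimal sketch. It assumes that each voxel is reduced to the centroid of its points and that the feature points have already been detected elsewhere (the 3D SIFT detector itself is not implemented here); the function names `voxel_downsample` and `fuse_sampling` are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same voxel by their centroid."""
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis (floor division also handles negatives).
        key = tuple(int(c // voxel_size) for c in p)
        buckets[key].append(p)
    return [
        tuple(sum(axis) / len(group) for axis in zip(*group))
        for group in buckets.values()
    ]


def fuse_sampling(points, keypoints, voxel_size):
    """Union of the voxel-filtered cloud and externally detected key points.

    `keypoints` stands in for the 3D SIFT feature points (assumed input).
    """
    fused = voxel_downsample(points, voxel_size)
    seen = set(fused)
    for kp in keypoints:
        kp = tuple(kp)
        if kp not in seen:  # keep key points that voxel averaging removed
            fused.append(kp)
            seen.add(kp)
    return fused


cloud = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.5, 1.5, 1.5)]
keys = [(0.3, 0.3, 0.3)]  # assumed to come from a feature detector
fused_cloud = fuse_sampling(cloud, keys, voxel_size=1.0)
print(len(fused_cloud), "points after fusion sampling")
```

In this toy input, the first two points share a voxel and are averaged, so the key point would be lost by voxel filtering alone; the fusion step puts it back.
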
Model matching is carried out by heuristic search: the semantic labels and local features of scene elements serve as indexes for a coarse retrieval from the model library, and the best-matching model is then aligned with the corresponding object in the scene and used to replace it. This completes the reconstruction of interior-environment objects. Exterior-environment objects are reconstructed by plane fitting: after the axis-aligned bounding box (AABB) of each scene object is calculated, a plane model is generated to complete the reconstruction. Result To evaluate the proposed method, experiments on down-sampling, semantic segmentation, and instance segmentation are carried out on the Stanford large-scale 3D indoor space dataset (S3DIS). The experiments show that the proposed fusion sampling gives better plane extraction results than the non-sampled data. The running time of the plane extraction algorithm drops by 85% after down-sampling, and an optimization of about 62% is also reported with down-sampling. The plane feature-enhanced semantic segmentation method is trained on the Area-1 to Area-5 scenes and tested on the Area-6 scene for comparison with PointNet. The overall accuracy (OA) reaches 84.02% and the mean intersection over union (mIoU) reaches 60.65%, improvements of 2.3% and 4.2% over the PointNet network, respectively. Conclusion The experimental results on the S3DIS dataset show that the proposed method can handle semantic segmentation of large-scale indoor scenes. Fusing voxel filtering with 3D SIFT allows planar features to be extracted more effectively. Furthermore, the experiments on Area-6 of S3DIS show that semantic segmentation performance is improved as well. 
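
For readers unfamiliar with the two metrics reported above, the following sketch shows how OA and mIoU are commonly computed from per-point labels. It follows the usual textbook definitions rather than the authors' evaluation script; skipping classes that appear in neither the ground truth nor the prediction is an assumption, and the toy labels are illustrative only.

```python
def confusion_matrix(truth, pred, num_classes):
    """cm[t][p] counts points with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """Fraction of all points whose predicted class equals the true class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """Average over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:  # assumption: ignore classes absent from truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious)


truth = [0, 0, 1, 1, 2, 2]  # toy ground-truth labels
pred = [0, 1, 1, 1, 2, 0]   # toy predicted labels
cm = confusion_matrix(truth, pred, num_classes=3)
oa = overall_accuracy(cm)
miou = mean_iou(cm)
print(f"OA = {oa:.3f}, mIoU = {miou:.3f}")
```

Because mIoU averages over classes rather than points, it penalizes errors on rare classes more strongly than OA does, which is why the two figures can differ widely on the same prediction.
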
The proposed scene reconstruction method obtains more refined and accurate scene reconstruction results to some extent. Future research will focus on completing small objects with complex structures, such as tables, chairs, and bookshelves, for which the accuracy of segmentation and reconstruction has improved little. To improve the accuracy of semantic segmentation, deep learning-based methods could be used to handle the features of small objects. Reconstruction methods for large and complex indoor scenes, especially for modeling the objects in such scenes, also remain to be developed.
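
The RANSAC plane extraction mentioned in the Method part can be sketched as follows. This is a generic single-plane RANSAC under assumed settings, not the authors' implementation; the distance threshold, iteration count, fixed seed, and the names `plane_from_points` and `ransac_plane` are all illustrative choices.

```python
import random


def plane_from_points(p0, p1, p2):
    """Plane (a, b, c, d) with unit normal through three points, or None if collinear."""
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    # Normal vector = cross product of the two edge vectors.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    a, b, c = nx / norm, ny / norm, nz / norm
    d = -(a * p0[0] + b * p0[1] + c * p0[2])
    return a, b, c, d


def ransac_plane(points, threshold=0.02, iterations=200, seed=0):
    """Return the plane with the most inliers and the indices of those inliers.

    `points` must contain at least three points.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c, d = plane
        # A point is an inlier if its distance to the plane is within the threshold.
        inliers = [
            i for i, (x, y, z) in enumerate(points)
            if abs(a * x + b * y + c * z + d) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


# Toy scene: a 5 x 5 patch of floor points on z = 0 plus three outliers.
floor = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
outliers = [(0.0, 0.0, 1.0), (1.0, 2.0, 3.0), (2.0, 1.0, -2.0)]
plane, inlier_idx = ransac_plane(floor + outliers)
print(len(inlier_idx), "inliers on the dominant plane")
```

To extract several planes from a scene, one common pattern is to remove the inliers of the best plane and repeat the search on the remaining points.
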
