LiDAR point cloud segmentation fusing sparse attention and instance enhancement
Abstract
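Objective LiDAR point cloud semantic segmentation is a key component of 3D environment perception, and accurately segmenting objects in LiDAR point clouds is important for applications such as autonomous vehicles and autonomous mobile robots. Because LiDAR point cloud data is unstructured, irregular point clouds are commonly projected onto structured 2D images to extract useful semantic information. This projection, however, discards geometric information in the point cloud and prevents high-accuracy segmentation. In addition, real-world datasets have uneven data distributions, so objects with few samples are segmented poorly. To address these problems, this paper proposes a LiDAR point cloud segmentation method based on sparse attention and instance enhancement, which effectively improves the accuracy of LiDAR point cloud semantic segmentation.
Method To address the imbalanced data distribution in the datasets, point cloud data is augmented by instance injection: point cloud instances are first extracted from the dataset and then injected into each point cloud frame during training. Because sparse convolutional networks cannot obtain a large receptive field, a Transformer module is proposed to enlarge the receptive field of the network. To extract key information from the feature maps, a spatial attention mechanism based on sparse convolution is used, which significantly improves network performance. In addition, a new TVloss is proposed for the edges of point cloud objects of different classes to strengthen the supervision of the network.
Result The proposed model is tested on the SemanticKITTI and nuScenes datasets. On SemanticKITTI, the method achieves a mean intersection over union (mIoU) of 64.6% in online single-frame accuracy, and on nuScenes it achieves 75.6%. Ablation experiments show that the method improves accuracy by 3.1% over the baseline.
Conclusion Experimental results show that the proposed LiDAR point cloud segmentation method based on sparse attention and instance enhancement performs well on both the SemanticKITTI and nuScenes datasets. It improves the network's ability to segment point cloud details and makes the segmentation results more accurate.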
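The instance-injection augmentation described in the Method can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The abstract does not say how an injection position is chosen, so the sketch assumes that an instance is rotated about the sensor's vertical axis (which preserves its range and height) and is accepted only if no non-ground scene point falls inside its padded footprint. The names `rotate_about_z`, `inject_instance`, `ground_labels`, `margin`, and `max_tries` are hypothetical.

```python
import numpy as np


def rotate_about_z(xyz, angle):
    """Rotate an (N, 3) point array about the z-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return xyz @ rot.T


def inject_instance(scene_xyz, scene_labels, inst_xyz, inst_label, rng,
                    ground_labels=(), margin=0.2, max_tries=10):
    """Try to paste one instance into a frame; return (xyz, labels, placed).

    Assumption: a position is acceptable when the instance's xy bounding
    box, padded by `margin` metres, contains no non-ground scene point.
    """
    # Ground points may lie under the instance, so they are not obstacles.
    obstacles = scene_xyz[~np.isin(scene_labels, ground_labels)]
    for _ in range(max_tries):
        candidate = rotate_about_z(inst_xyz, rng.uniform(0.0, 2.0 * np.pi))
        lo = candidate[:, :2].min(axis=0) - margin
        hi = candidate[:, :2].max(axis=0) + margin
        inside = np.all((obstacles[:, :2] >= lo) & (obstacles[:, :2] <= hi), axis=1)
        if not inside.any():
            new_labels = np.full(len(candidate), inst_label, dtype=scene_labels.dtype)
            return (np.concatenate([scene_xyz, candidate], axis=0),
                    np.concatenate([scene_labels, new_labels]),
                    True)
    # No free position found: leave the frame unchanged.
    return scene_xyz, scene_labels, False
```

In a training loop, a call such as `inject_instance(xyz, labels, inst, inst_label, np.random.default_rng())` would be repeated for a few instances sampled from the rare classes before the frame is voxelized.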
Keywords
LiDAR point cloud semantic segmentation combined with sparse attention and instance enhancement
Liu Sheng, Cao Yifeng, Huang Wenhao, Li Dingda (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
Abstract
Objective Outdoor perception is essential for mobile robots and autonomous vehicles, and LiDAR-based point cloud semantic segmentation has been developed to support it. Three-dimensional (3D) LiDAR captures range information quickly and accurately for outdoor perception and is unaffected by illumination. For autonomous driving, LiDAR point cloud semantic segmentation analyzes the point cloud to assign semantic labels to scene elements such as roads, vehicles, pedestrians, and vegetation. Deep learning (DL) has recently advanced two-dimensional (2D) image-based computer vision considerably. Unlike structured 2D image data, however, LiDAR point cloud data is unstructured, unordered, sparse, and non-uniform in density, so extracting semantic information from it effectively remains challenging. DL-based methods can be divided into three categories: 1) point-based, 2) projection-based, and 3) voxel-based. Because LiDAR point clouds are unstructured, existing methods often project the irregular point cloud onto structured 2D images to extract semantic information. However, the projection loses geometric information, which prevents high-precision segmentation. In addition, uneven data distribution degrades the segmentation of objects with few samples. To resolve these problems, we develop a LiDAR point cloud segmentation method based on sparse attention and instance enhancement, which effectively improves the accuracy of LiDAR point cloud semantic segmentation.
Method An end-to-end network based on sparse convolution is presented for LiDAR point cloud semantic segmentation. To mitigate the uneven data distribution in the training set, instance injection is used to augment the point cloud data. Point cloud instances such as pedestrians, vehicles, and bicycles are extracted from the dataset and injected at appropriate positions in each frame during training. Recent work on visual semantic segmentation has focused mainly on enlarging the receptive field and on attention mechanisms. However, an encoder-decoder network alone cannot achieve a sufficiently wide receptive field. A lightweight Transformer module is therefore introduced to widen the receptive field of the network. To capture global information, the Transformer module builds connections among all non-empty voxels. To limit memory usage, it is applied only in the bottleneck layer of the network. To extract the key positions of the feature map, a spatial attention module based on sparse convolution is also proposed. Additionally, to sharpen the edges between point cloud objects of different classes, a new TVloss is adopted to identify semantic boundaries and reduce noise within each predicted region.
Result The model is evaluated on the SemanticKITTI and nuScenes datasets. It achieves 64.6% mean intersection over union (mIoU) in the single-frame evaluation on SemanticKITTI and 75.6% mIoU on nuScenes. Ablation experiments show that instance injection improves mIoU by 1.2%, and that the sparse-convolution-based spatial attention module and the Transformer module improve it by 1.0% and 0.7%, respectively. Together, these two modules yield a 1.5% improvement, and TVloss contributes a further 0.2%. With all modules combined, mIoU increases by 3.1% over the baseline.
Conclusion A new end-to-end network based on sparse convolution is developed for LiDAR point cloud semantic segmentation. Instance injection addresses the unbalanced data distribution. The proposed Transformer module provides a wider receptive field. A spatial attention mechanism based on sparse convolution is incorporated to extract the key locations of the feature map. A new loss function, TVloss, is added to sharpen object edges in the point cloud. Comparative experiments are conducted against recent state-of-the-art (SOTA) methods, including projection-based and point-based methods. The results indicate that the proposed method improves the network's ability to segment point cloud details and makes point cloud segmentation more effective.
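For readers unfamiliar with the metric reported in the Result, the following is a minimal NumPy sketch of how mIoU is commonly computed from per-point predictions: per-class IoU is TP / (TP + FP + FN), averaged over the classes that occur. It is an illustration only, not the official SemanticKITTI or nuScenes evaluation code. The function name `mean_iou` and the `ignore_label` parameter are hypothetical, and labels are assumed to be integers in `[0, num_classes)`.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_label=None):
    """Return (mIoU, per-class IoU) for integer label arrays.

    Classes that appear in neither the prediction nor the ground truth
    get IoU = NaN and are excluded from the mean.
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    if ignore_label is not None:
        # Drop points whose ground truth is the ignored (unlabeled) class.
        keep = target != ignore_label
        pred, target = pred[keep], target[keep]
    # confusion[i, j] counts points with ground truth i predicted as j.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    tp = np.diag(confusion).astype(np.float64)
    # Union = predicted as class + labelled as class - intersection.
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - tp
    iou = np.full(num_classes, np.nan)
    present = union > 0
    iou[present] = tp[present] / union[present]
    return float(np.nanmean(iou)), iou
```

Accumulating one confusion matrix over all frames of a split and computing the IoU once at the end, rather than averaging per-frame scores, is the usual choice because rare classes are absent from many individual frames.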
Keywords
LiDAR; semantic segmentation; spatial attention mechanism; Transformer; deep learning (DL); instance enhancement