Current Issue Cover
深度学习实时语义分割研究进展和挑战

王卓1,2, 瞿绍军1,2(1.湖南师范大学湖南省智能计算与语言信息处理重点实验室, 长沙 410081;2.湖南师范大学信息科学与工程学院, 长沙 410081)

摘 要
语义分割作为计算机视觉领域的重要研究方向之一,应用十分广泛。其目的是根据预先定义好的类别对输入图像进行像素级别的分类。实时语义分割则在一般语义分割的基础上又增加了对速度的要求,广泛应用于如无人驾驶、医学图像分析、视频监控与航拍图像等领域。其要求分割方法不仅要取得较高的分割精度,且分割速度也要快。随着深度学习和神经网络的快速发展,实时语义分割也取得了一定的研究成果。本文在前人已有工作的基础上对基于深度学习的实时语义分割算法进行系统的归纳总结,包括基于Transformer和剪枝的方法等,全面介绍实时语义分割方法在各领域中的应用。首先介绍实时语义分割的概念,再根据标签的数量和质量,将现有的基于深度学习的实时语义分割方法分为强监督学习、弱监督学习和无监督学习3个类别。在分类的基础上,结合各个类别中最具有代表性的方法,对其优缺点展开分析,并从多个角度进行比较。随后介绍目前实时语义分割常用的数据集和评价指标,并对比分析各算法在各数据集上的实验效果,阐述现阶段实时语义分割的应用场景。最后,讨论了基于深度学习的实时语义分割存在的挑战,并对实时语义分割未来值得研究的方向进行展望,为研究者们解决存在的问题提供便利。
关键词
Research progress and challenges in real-time semantic segmentation for deep learning

Wang Zhuo1,2, Qu Shaojun1,2(1.Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410081, China;2.College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China)

Abstract
Semantic segmentation is widely used as an important research direction in the field of computer vision,and its purpose is to classify the input image at the pixel level according to predefined categories. Real-time semantic segmentation,as a subfield of semantic segmentation,adds speed requirements to segmentation methods on the basis of general semantic segmentation and is widely used in fields,such as unmanned driving,medical image analysis,video surveillance,and aerial images. The segmentation method should achieve not only high segmentation accuracy but also fast segmentation speed(specifically,the speed of processing images per unit time reaches 30 frames). With the rapid development of deep learning technology and neural networks,real-time semantic segmentation has also achieved certain research results. Majority of previous researchers have discussed semantic segmentation,but review papers on real-time semantic segmentation methods are few. In this paper,we systematically summarize the real-time semantic segmentation algorithms based on deep learning on the basis of the existing work of the previous researchers. We first introduce the concept of realtime semantic segmentation,and then,according to the number and quality of the participating training labels,the existing real-time semantic segmentation methods based on deep learning are categorized into three classes:strongly supervised learning,weakly supervised learning,and unsupervised learning. Strongly supervised learning methods are categorized from three perspectives:improving accuracy,improving speed,and other methods. Accuracy improvement methods are further divided into subcategories according to the network structure and feature fusion methods. According to the network structure,the real-time semantic segmentation methods can be categorized into encoder-decoder structure,two-branch structure,and multibranch structure;the representative networks in the encoder-decoder section are fully convolutional network(FCN)and UNet;the networks with two-branch structure are the BiSeNet series;and the multibranch structure has ICNet and DFANet. According to the different ways of feature fusion,real-time semantic segmentation methods can be categorized into multiscale feature fusion and attention mechanism. According to the different ways of feature sampling in the process of multiscale feature fusion,this study divides multiscale feature fusion into atrous spatial pyramid pooling and ordinary pyramid pooling;the attention mechanism can be further divided into self-attention mechanism,channel attention,and spatial attention according to the computation method of the attention vector. The methods to improve the speed are analyzed and discussed from the perspectives of improving convolutional blocks and lightweight networks;the methods to improve convolutional blocks can be divided into separable convolution(separable convolution can be divided into depth separable convolution and spatial separable convolution),grouped convolution,and atrous convolution. Among other methods of strongly supervised learning,we also specifically add methods of knowledge distillation,Transformer-based methods,and pruning,which are less mentioned in other literatures. Given the numerous methods for real-time semantic segmentation based on strongly supervised learning,we also perform a comparative analysis of the strengths and weaknesses of all the mentioned methods. Real-time semantic segmentation based on weakly supervised learning is classified into methods based on image-level labeling,methods based on point labeling,methods based on object box labeling,and methods based on object underlining labeling. The concept of unsupervised learning is introduced,and the commonly used unsupervised semantic segmentation methods at the present stage are described,including the method with the introduction of the generalized domain adaptation problem and the method with the introduction of unsupervised pre-adaptation task. Subsequently,the datasets and evaluation indexes commonly used in real-time semantic segmentation are introduced. In addition to the street scene dataset commonly used in unmanned counting,this study supplements the medical image dataset. In the evaluation indexes,this study provides a detailed introduction to the accuracy measure and speed measure and then compares the experimental effects of the algorithms on the datasets so far through the table to obtain the latest research progress in the field. The application scenarios of real-time semantic segmentation are further elaborated in detail. Real-time semantic segmentation can be applied to automatic driving,which can segment road scene images in a short time to help identify roads,traffic signs,pedestrians,vehicles,and other objects. By segmenting medical images at the pixel level,real-time semantic segmentation can also help doctors identify and localize lesion areas accurately. In the field of natural disaster monitoring and emergency rescue,real-time semantic segmentation can quickly identify airplanes and aircrafts and can help doctors identify and locate lesion areas accurately. Real-time semantic segmentation can quickly recognize disaster areas in aerial images;real-time segmentation of scenes and objects in surveillance videos can provide accurate and intelligent data for surveillance systems. Then,according to the specific application scenarios of real-time semantic segmentation and the problems encountered at this stage,this study considers that the challenges faced by real-time semantic segmentation include the following:1)mobile segmentation problem,which hardly develops large-scale computation on low-storage devices;2)how to get away from the dependence of efficient networks on hardware devices;3)experimental accuracy of the current real-time semantic segmentation model,which hardly reaches the standard of automatic driving;4)lack of scene data for medical image and 3D point cloud design. Finally,this study gives an outlook on the future directions of realtime semantic segmentation that are worth researching,e. g. ,occlusion segmentation,real-time semantic segmentation of small targets,adaptive learning model,cross-modal joint learning,data-centered real-time semantic segmentation,and small-sample real-time semantic segmentation.
Keywords

订阅号|日报