结合上下文编码与特征融合的SAR图像分割

范艺华1,2, 董张玉1,2,3, 杨学志2,3,4(1.合肥工业大学计算机与信息学院, 合肥 230031;2.工业安全与应急技术安徽省重点实验室, 合肥 230031;3.智能互联系统安徽省实验室, 合肥 230031;4.合肥工业大学软件学院, 合肥 230031)

摘 要
目的 图像分割的中心任务是寻找更强大的特征表示,而合成孔径雷达(synthetic aperture radar,SAR)图像中斑点噪声阻碍特征提取。为加强对SAR图像特征的提取以及对特征的充分利用,提出一种改进的全卷积分割网络。方法 该网络遵循编码器—解码器结构,主要包括上下文编码模块和特征融合模块两部分。上下文编码模块(contextual encoder module,CEM)通过捕获局部上下文和通道上下文信息增强对图像的特征提取;特征融合模块(feature fusion module,FFM)提取高层特征中的全局上下文信息,将其嵌入低层特征,然后将增强的低层特征并入解码网络,提升特征图分辨率恢复的准确性。结果 在两幅真实SAR图像上,采用5种基于全卷积神经网络的分割算法作为对比,并对CEM与CEM-FFM分别进行实验。结果显示,该网络分割结果的总体精度(overall accuracy,OA)、平均精度(average accuracy,AA)与Kappa系数比5种先进算法均有显著提升。其中,网络在OA上表现最好,CEM在两幅SAR图像上OA分别为91.082%和90.903%,较对比算法中性能最优者分别提高了0.948%和0.941%,证实了CEM的有效性。而CEM-FFM在CEM基础上又将结果分别提高了2.149%和2.390%,验证了FFM的有效性。结论 本文提出的分割网络较其他方法对图像具有更强大的特征提取能力,且能更好地将低层特征中的空间信息与高层特征中的语义信息融合为一体,使得网络对特征的表征能力更强、图像分割结果更准确。
关键词
SAR image segmentation combining contextual encoding and feature fusion

Fan Yihua1,2, Dong Zhangyu1,2,3, Yang Xuezhi2,3,4(1.College of Computer and Information, Hefei University of Technology, Hefei 230031, China;2.Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei 230031, China;3.Anhui Province Laboratory of Intelligent Interconnection System, Hefei 230031, China;4.College of Software, Hefei University of Technology, Hefei 230031, China)

Abstract
Objective Pixel-wise segmentation of synthetic aperture radar (SAR) images is challenging: labeled SAR data are limited, and coherent speckle noise obscures contextual information. Existing semantic segmentation algorithms suffer from three problems. First, their ability to capture contextual information is insufficient: some ignore context altogether or exploit only the local spatial context of a few neighboring pixels, lacking global spatial context. Second, efforts to improve network performance have concentrated on the spatial dimension while ignoring the relationships between channels. Third, the high-level features extracted from the late layers of a neural network are rich in semantic information but blur spatial details, whereas the low-level features from the early layers retain fine pixel-level information but contain more noise; because the two kinds of features are isolated from each other, they are hard to exploit fully, and the common remedies of concatenation or per-pixel addition are inefficient.

Method To address these problems, a segmentation algorithm based on a fully convolutional neural network (CNN) is proposed. The network follows an encoder-decoder structure and comprises two parts: a contextual encoding module (CEM) for feature extraction and a feature fusion module (FFM) for feature fusion. The CEM consists of a residual connection, a standard convolution, two dilated convolutions with different dilation rates, and a channel attention mechanism. The residual connection alleviates network degradation; the standard 3×3 convolution extracts local features and is followed by batch normalization and the ReLU nonlinearity to resist over-fitting; the dilated convolutions, with dilation rates of 2 and 3, enlarge the receptive field and further capture multi-scale and local contextual features. The channel attention mechanism learns the importance of each feature channel, enhances useful features and suppresses less useful ones according to that importance, and thus models the dependencies between channels to obtain channel context information. The FFM extracts global context from the high-level features: global average pooling compresses each feature map to a single real number that has, to some extent, a global receptive field, and these numbers are then embedded into the low-level features. The enhanced low-level features are passed to the decoding network, which improves the accuracy of upsampling. The FFM thus greatly strengthens the semantic representation of the low-level features without losing their spatial information and makes their fusion with the high-level features more effective. Four contextual encoding modules and two feature fusion modules are stacked in the whole network.

Result Seven experimental schemes were compared. In the first, the CEM alone serves as the encoder block; in the second, the CEM is combined with the FFM (CEM-FFM); the remaining five are related methods: SegNet, U-Net, pyramid scene parsing network (PSPNet), FCN-DK3, and context-aware encoder network (CAEN). The experiments use two real SAR images with rich scene information, Radarsat-2 Flevoland (RS2-Flevoland) and Radarsat-2 San-Francisco-Bay (RS2-SF-Bay), and adopt overall accuracy (OA), average accuracy (AA), and the Kappa coefficient as evaluation criteria. Against the five advanced algorithms above, the CEM reaches OA of 91.082% and 90.903% on the two images, improvements of 0.948% and 0.941% over the best competitor. The CEM-FFM further raises the OA by 2.149% and 2.390% over the CEM.

Conclusion A CNN-based semantic segmentation algorithm composed of a contextual encoding module and a feature fusion module is designed. Experiments confirm its advantages over the related algorithms: the proposed network has stronger feature extraction ability and fuses low-level and high-level features more effectively, which improves the feature representation ability of the network and yields more accurate segmentation results.
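The building blocks described in the Method section can be sketched roughly as follows. This is not the authors' code: the function names, the weight shapes `w1`, `w2`, `w`, and the exact way the FFM embeds the pooled numbers into the low-level features (channel-wise reweighting here) are illustrative assumptions; a minimal single-image NumPy sketch, without batch normalization or training.

```python
import numpy as np

def global_avg_pool(x):
    """Compress each (H, W) feature map of x (C, H, W) to one real number."""
    return x.mean(axis=(1, 2))

def dilated_conv2d(x, k, rate):
    """Naive 'same'-padded dilated convolution of one map x (H, W) with an
    odd-sized kernel k; a larger rate widens the receptive field without
    adding parameters."""
    kh, kw = k.shape
    ph, pw = rate * (kh - 1) // 2, rate * (kw - 1) // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    H, W = x.shape
    out = np.zeros((H, W))
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i * rate:i * rate + H, j * rate:j * rate + W]
    return out

def channel_attention(x, w1, w2):
    """Channel context: squeeze x (C, H, W) to per-channel statistics, learn
    per-channel importance (hypothetical weights w1, w2), then reweight the
    channels so useful features are enhanced and the rest suppressed."""
    z = global_avg_pool(x)                                     # (C,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))  # sigmoid gate
    return x * s[:, None, None]

def feature_fusion(low, high, w):
    """FFM sketch: pool each high-level map to a number with a (loosely)
    global receptive field, project to the low-level channel count via a
    hypothetical weight w, and embed it by reweighting low-level channels."""
    s = 1.0 / (1.0 + np.exp(-(w @ global_avg_pool(high))))
    return low * s[:, None, None]
```

With all-zero weights every sigmoid gate evaluates to 0.5, which makes the behavior easy to check by hand; in the actual network these weights would be learned, and a CEM block would combine the standard convolution, the two dilated convolutions (rates 2 and 3), the attention gate, and a residual connection.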
Keywords
