Deformable atrous convolution incorporating mixed attention for nearshore SAR small ship detection

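Gong Shengrong1,2, Xu Shaojie1,2, Zhou Lifan2, Zhu Jie1, Zhong Shan2 (1. School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318; 2. School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500)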

Abstract
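Objective In ship detection in nearshore synthetic aperture radar (SAR) images, complex backgrounds such as land buildings and islands make small ships easy to confuse with similar nearby buildings and islands. Existing methods usually extract image features with fixed-size square convolution kernels. However, small ships occupy only a small part of the image and appear as elongated, tilted strips, so a fixed-size square kernel takes in too much background information, which interferes with classification. To address this, we propose a backbone network based on deformable atrous convolution for ship targets in SAR images. Method First, deformable atrous convolution kernels replace conventional convolution kernels so that the sampled feature locations fit the target shape more closely, which strengthens feature extraction from the ship region and its edges and reduces the amount of background information extracted. Then, a three-channel mixed attention mechanism is proposed to strengthen the extraction of local detail, highlight the differences between small ships and reefs, islands, and similar objects, and improve the fine-grained classification performance of the model. Result Experiments on the SAR ship dataset HRSID (high-resolution SAR images dataset) show that, when applied to three detection models, Cascade-RCNN (cascade region convolutional neural network), YOLOv4 (you only look once v4), and BorderDet (border detection), the proposed method improves the detection accuracy for small ships by 3.5%, 2.6%, and 2.9%, respectively, over the original models, and the overall accuracy reaches 89.9%. On the SSDD (SAR ship detection dataset) dataset, the overall accuracy reaches 95.9%, which is better than existing methods. Conclusion By improving the backbone network, the model can change the shape and size of its convolution kernels, concentrate on target information, and suppress background interference, which effectively reduces false detections and missed detections of small ships against complex nearshore backgrounds in SAR images.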
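To make the idea of a deformable atrous kernel concrete, the sketch below lists the positions that one 3 × 3 kernel application samples. It follows the standard deformable convolution formulation, in which each kernel tap reads from the regular atrous grid position plus a learned offset. The function names `sampling_positions` and `extent` and the example offsets are hypothetical and used only for illustration; this is not the authors' implementation.

```python
from itertools import product


def sampling_positions(center, rate, offsets=None, kernel_size=3):
    """Return the (row, col) positions sampled by one kernel application.

    center: (row, col) of the output location p0.
    rate: atrous (dilation) rate r; the regular grid taps sit at p0 + r * pn.
    offsets: optional list of (d_row, d_col) offsets, one per kernel tap.
        None gives a plain atrous convolution (assumption: offsets are
        supplied by hand here; in a network they are predicted by a layer).
    """
    half = kernel_size // 2
    grid = list(product(range(-half, half + 1), repeat=2))
    if offsets is None:
        offsets = [(0, 0)] * len(grid)
    if len(offsets) != len(grid):
        raise ValueError("need exactly one offset per kernel tap")
    row0, col0 = center
    return [
        (row0 + rate * d_row + o_row, col0 + rate * d_col + o_col)
        for (d_row, d_col), (o_row, o_col) in zip(grid, offsets)
    ]


def extent(positions):
    """Height and width of the bounding box covered by the sampled positions."""
    rows = [p[0] for p in positions]
    cols = [p[1] for p in positions]
    return (max(rows) - min(rows) + 1, max(cols) - min(cols) + 1)
```

With no offsets, a 3 × 3 kernel at atrous rates 1, 3, and 5 covers 3 × 3, 7 × 7, and 11 × 11 pixels while still reading only nine values. Nonzero offsets move individual taps off the square grid, which is how the sampled positions can follow an elongated, tilted ship. Fractional positions are normally read with bilinear interpolation.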
Keywords
Deformable atrous convolution nearshore SAR small ship detection incorporating mixed attention

Gong Shengrong1,2, Xu Shaojie1,2, Zhou Lifan2, Zhu Jie1, Zhong Shan2(1.School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China;2.School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China)

Abstract
Objective Ship detection in synthetic aperture radar (SAR) images is essential for marine monitoring and administration. Traditional constant false alarm rate (CFAR) algorithms have drawbacks, such as reliance on handcrafted features, slow speed, and susceptibility to interference from ship-like objects such as roofs and containers. Detectors based on convolutional neural networks (CNNs) have substantially improved detection accuracy. However, ships in high-resolution SAR images are docked in varied directions and come in many sizes, so the recognition rate remains low for some ships, especially small ships in complex nearshore scenes. When a convolution kernel extracts features, its weights are multiplied with the values at the corresponding locations of the feature map. Therefore, how well the kernel shape matches the target shape determines, to a certain extent, the efficiency and quality of feature extraction. If the kernel shape is similar to the target shape, the extracted feature map captures the target information more completely. Otherwise, the feature map contains many background features that interfere with classification and localization. A square convolution kernel does not fit a ship well, because a ship appears as a long strip with an arbitrary docking direction. We therefore develop a backbone network based on deformable atrous convolution. Method Weighted fusion deformable atrous convolution (WFDAC) can, to some extent, adaptively change the shape and size of the convolution kernels and weight the features extracted by different kernels according to learned weights. 
In this way, the network actively learns which kernels extract features that better match the target shape, which strengthens the extraction of information from the target region and suppresses the background. The WFDAC module consists of two deformable convolution kernels with different atrous rates and, in parallel, a 1 × 1 convolution kernel that computes the fusion weights of the two deformable kernels. Because the two parallel deformable kernels have different atrous rates, they have different receptive fields. As a result, during deep feature extraction, the deformable kernel with the smaller atrous rate may re-extract features that lie within the receptive field of the larger-rate deformable kernel used in shallow feature extraction. That is, features within the same receptive field are extracted and fused by at least two deformable kernels in different layers. This improves the feature extraction efficiency of the network. In addition, to capture the differences between small targets and nearshore reefs and coastal buildings, we propose a three-channel mixed attention (TMA) mechanism. It uses three parallel branches that obtain cross-dimension interactions of the model parameters through rotation and residual connections, and it uses these interactions to compute the weight relationships among the parameters. Multiplying the weights with the original parameter values sharpens the differences between small ships and ship-like buildings and islands and reduces the weight of their shared features in classification, which improves the fine-grained classification performance of the model. Result Ablation and comparative experiments are conducted on two SAR ship datasets: the high-resolution SAR images dataset (HRSID) and the SAR ship detection dataset (SSDD). 
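The weighted fusion step of the WFDAC module described in the Method part above can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two branch outputs are passed in as precomputed channel vectors (stand-ins for the two deformable atrous convolutions), the 1 × 1 convolution is assumed to produce one logit per branch from the input pixel, and the logits are assumed to be normalized with a softmax. All function names are hypothetical.

```python
import math


def pointwise_conv(x, weight, bias):
    """1 x 1 convolution at one pixel: a linear map over the channel vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def fuse_pixel(feat_small_rate, feat_large_rate, logits):
    """Blend two branch outputs with softmax weights (assumed normalization)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]  # numerically stable softmax
    total = sum(exps)
    w_small, w_large = (e / total for e in exps)
    fused = [
        w_small * a + w_large * b
        for a, b in zip(feat_small_rate, feat_large_rate)
    ]
    return fused, (w_small, w_large)


def wfdac_pixel(x, feat_small_rate, feat_large_rate, weight, bias):
    """Compute fusion weights from the input pixel, then fuse the two branches.

    x: input channel vector at this pixel.
    feat_small_rate, feat_large_rate: outputs of the two deformable atrous
        branches at this pixel (assumed to be computed elsewhere).
    weight, bias: parameters of the 1 x 1 convolution, with two output
        channels, one logit per branch (assumption).
    """
    logits = pointwise_conv(x, weight, bias)
    return fuse_pixel(feat_small_rate, feat_large_rate, logits)
```

With zero weights and biases the two branches are averaged equally. During training, the 1 × 1 convolution would learn to favor whichever branch better matches the target shape at each location.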
The model is first trained on the training set, and its accuracy is then measured on the test set. We judge model performance with several evaluation metrics defined in terms of intersection over union (IoU) and target pixel size. The experimental results show that our method effectively improves the detection accuracy for SAR ship targets, especially small ones. When our backbone, the feature extraction network (FEN), replaces ResNet-50, the results on the HRSID dataset show that the detection accuracy for small ships increases by 3.5%, 2.6%, and 2.9% on three detection models, cascade region convolutional neural network (Cascade-RCNN), you only look once v4 (YOLOv4), and border detection (BorderDet), respectively, and the overall accuracy reaches 89.9%. To verify whether the model improves the detection accuracy for small ships against complex nearshore backgrounds, we split the HRSID test set into two scenarios: nearshore and offshore. The tests show that accuracy improves by 3.5% and 1.2% in the nearshore and offshore scenarios, respectively. Additionally, we designed a set of experiments to validate the effect of the atrous rate on the WFDAC module, in which the atrous rate of one of the two parallel deformable convolution branches is fixed to 1 and the atrous rate of the other branch is set to 1, 3, and 5 in turn. The results show that the WFDAC module performs well when the atrous rate of one branch is 1 and that of the other branch is 3. The overall accuracy on the SSDD dataset reaches 95.9%. Conclusion With the improved backbone network, the model can change the shape and size of the convolution kernel to focus on target information and suppress background interference. It effectively reduces false detections and missed detections of small ships against complex nearshore backgrounds in SAR images.
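The evaluation metrics mentioned above are defined in terms of intersection over union (IoU). As a reminder of what that quantity measures, here is a minimal sketch for two axis-aligned boxes. The box format `(x_min, y_min, x_max, y_max)`, the function names, and the default threshold of 0.5 are assumptions made for this illustration, not details given in the abstract.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_match(predicted_box, ground_truth_box, threshold=0.5):
    """Count a detection as correct when its IoU reaches the threshold.

    The threshold value is a hypothetical default for this sketch.
    """
    return box_iou(predicted_box, ground_truth_box) >= threshold
```

For small ships, a shift of only a few pixels lowers the IoU sharply, which is one reason accuracy is usually reported separately by target size.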
Keywords
