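Fine-Grained Recognition of Remote Sensing Aircraft by Fusing Bounding Box Gaussian Modeling with Feature Gather-and-Distribute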

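王晓燕1, 梁文辉2, 李杰3, 牟建宏2, 王禧钰2 (1. School of Statistics and Data Science, Beijing Wuzi University; 2. School of Information, Beijing Wuzi University; 3. School of Mechanical-Electrical and Vehicle Engineering, Beijing University of Civil Engineering and Architecture)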

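Abstract
Abstract: Purpose In remote sensing aircraft imagery, target sizes vary widely and acquisition is affected by factors such as illumination and occlusion. As a result, different aircraft models show similar features, small targets are detected poorly, and fine-grained distinction within a class cannot be achieved. To solve these problems, this paper proposes a YOLOv5s-based fine-grained recognition algorithm for remote sensing aircraft that fuses bounding box Gaussian modeling with feature gather-and-distribute. Method First, the Normalized Gaussian Wasserstein Distance (NWD) is combined with IoU (Intersection over Union) and its derived metrics, with the ratio parameter between the two set appropriately, to improve the computation of the YOLOv5s position loss and thereby increase the method's sensitivity to small targets. Second, a Gather-and-Distribute (GD) feature aggregation and distribution module is introduced into the Neck of YOLOv5s. On top of the original network's "top-down, laterally connected" feature fusion, it fuses information across layers, strengthens the network's ability to extract fine-grained and global features, and improves the overall detection accuracy. To verify the algorithm's advantages in fine-grained and small target recognition of military aircraft, experiments are conducted on the remote sensing aircraft fine-grained dataset MAR20 and the remote sensing aircraft small target dataset CORS-ADD. Results On MAR20 and CORS-ADD, the model precision reaches 99.10% and 95.36% respectively, the best detection accuracy among the compared methods (the original YOLOv5s, YOLOv8s, Gold-YOLO and Faster-RCNN). The experiments verify that the model performs better in fine-grained and small target detection, that its detections are closer to the ground truth, and that the improved algorithm attains the best fine-grained and small target detection accuracy. Conclusion The experimental results show that the proposed algorithm outperforms the four object detection algorithms above in detection performance and model accuracy, and that the model has good practical value.
Keywords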
Fusion of Bounding Box Gaussian Modeling and Feature Aggregation Distribution for Fine-Grained Recognition of Remote Sensing Aircraft Images

(Beijing University of Civil Engineering and Architecture)

Abstract
Abstract: Purpose Object detection, a basic branch of computer vision, underpins downstream tasks such as image segmentation and target tracking. It aims to find all objects in an image and determine their locations and categories. It is used in industrial inspection and has wide application in aerospace, autonomous driving and other fields. Aircraft detection in remote sensing images matters to both military and civilian applications such as air traffic control and battlefield dynamic monitoring. In remote sensing aircraft images, target sizes vary widely and acquisition is affected by factors such as lighting and occlusion. As a result, different aircraft types show similar features, small targets are detected poorly, and fine-grained distinction within categories is not achieved. In object detection, the loss function measures the difference between the model prediction and the ground truth, and it directly affects the performance and convergence speed of the model. Adjusting the model parameters to minimize the loss function can improve the accuracy of the model on the test set. The loss function of YOLOv5 consists of position loss, category loss and confidence loss. By default, YOLOv5 uses CIoU (Complete Intersection over Union), a derivative of IoU (Intersection over Union), and provides IoU, GIoU (Generalized Intersection over Union) and DIoU (Distance Intersection over Union) as alternatives. However, for small target detection, especially with anchor-based algorithms such as YOLOv5, the IoU family of metrics does not meet application needs well. Different types of remote sensing aircraft have fine-grained characteristics, which are reflected in subtle differences between classes, large differences within classes, and detail accuracy within classes.
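The sensitivity of IoU to small boxes can be illustrated with a minimal Python sketch. The `iou` helper, the (x1, y1, x2, y2) box format and the example boxes are illustrative assumptions and are not taken from the paper or from the YOLOv5 code base. The sketch shifts a 4×4 box and a 40×40 box by the same one-pixel offset and compares the resulting IoU.

```python
def iou(box_a, box_b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The same one-pixel horizontal shift applied to a small and a large box.
small_iou = iou((0, 0, 4, 4), (1, 0, 5, 4))
large_iou = iou((0, 0, 40, 40), (1, 0, 41, 40))
print(f"4x4 box:   IoU = {small_iou:.3f}")
print(f"40x40 box: IoU = {large_iou:.3f}")
```

The one-pixel error leaves the 40×40 box with an IoU of about 0.95 but drops the 4×4 box to 0.6, which is the kind of behavior that makes IoU-based losses unstable for very small targets.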
For fine-grained recognition tasks, extracting local information is crucial. PANet, the feature fusion module used by YOLOv5s, cannot achieve global feature fusion and is not conducive to extracting fine-grained features. To solve these problems, this article proposes an improved algorithm based on YOLOv5s. Method To address the shortcomings of IoU in small target detection with YOLOv5, this article introduces the Gaussian Wasserstein distance into the calculation of bounding box overlap to improve the detection performance of the network. The IoU family of algorithms computes the similarity between predicted boxes and ground-truth boxes from the sets of pixels the boxes contain. The Gaussian Wasserstein distance abandons the set view: it models each bounding box as a two-dimensional Gaussian distribution and uses a new metric, the Normalized Gaussian Wasserstein Distance (NWD), to calculate the similarity between boxes, which addresses the weakness of IoU in small target detection with YOLOv5. To address PANet's shortcomings in fine-grained detection, this article introduces the Gather-and-Distribute (GD) feature aggregation module of Gold-YOLO into YOLOv5s, enhancing the network's ability to extract fine-grained features through convolution and self-attention mechanisms. In summary: (1) the loss function of YOLOv5s is improved by combining the Gaussian Wasserstein distance with the traditional IoU; (2) the Gather-and-Distribute feature aggregation module is introduced in the Neck of YOLOv5s to enhance the network's local feature extraction capability. Together, these two changes improve the overall detection accuracy.
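The Gaussian modeling of bounding boxes described above can be sketched in a few lines of Python. This is a minimal sketch that follows the commonly cited NWD formulation, in which a box (cx, cy, w, h) is modeled as a Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4). The abstract does not state the formula, so the exact form used by the authors, the function name `nwd`, and the normalizing constant `c` (12.8 here is only a placeholder) are all assumptions.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein Distance between two boxes.

    Boxes are (cx, cy, w, h). Each box is modeled as a 2-D Gaussian with
    mean (cx, cy) and covariance diag(w^2 / 4, h^2 / 4). `c` is a
    dataset-dependent normalizing constant; 12.8 is a placeholder value.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # Map the distance to a similarity in (0, 1]; identical boxes give 1.
    return math.exp(-math.sqrt(w2_squared) / c)


# Two 4x4 boxes whose centers are 6 pixels apart: they do not overlap,
# so IoU is 0, yet NWD still gives a graded similarity.
print(nwd((2, 2, 4, 4), (8, 2, 4, 4)))
print(nwd((2, 2, 4, 4), (2, 2, 4, 4)))
```

Unlike IoU, the similarity decays smoothly with the distance between boxes and stays informative when the boxes do not overlap at all, which is why it suits small targets.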
To test the advantages of this algorithm in fine-grained and small target recognition of military aircraft, this paper conducts experiments on the remote sensing aircraft fine-grained classification dataset MAR20 and the remote sensing aircraft small target dataset CORS-ADD. In remote sensing military aircraft identification, different types of aircraft often have similar characteristics, which makes intra-class identification difficult. This article uses the open-source remote sensing object detection dataset Military Aircraft Recognition 20 (MAR20) for fine-grained recognition of remote sensing military aircraft. The dataset contains 3842 images covering 20 military aircraft models (SU-35, C-130, C-17, C-5, F-16, TU-160, E-3, B-52, P-3C, B-1B, E-8, TU-22, F-15, KC-135, F-22, FA-18, TU-95, KC-10, SU-34, SU-24). The CORS-ADD dataset is a complex optical remote sensing aircraft small target dataset, manually annotated and constructed by the Space Optical Engineering Research Center of Harbin Institute of Technology. It contains 7337 images with 32285 aircraft instances, and target sizes range from 4×4 pixels to 240×240 pixels. Unlike earlier remote sensing datasets drawn from a single data source, CORS-ADD draws on satellite platforms such as Google Map, WorldView-2, WorldView-3, Pleiades, Jilin-1 and IKONOS. It covers scenes such as airports, aircraft carriers, oceans and land, and aircraft targets such as bombers, fighter jets and early warning aircraft at typical airports in China and the United States.
Results To test the effect of the two improved modules on remote sensing aircraft recognition with YOLOv5s, this article compares the original YOLOv5s with versions that introduce NWD (where r is the weight parameter that adjusts the ratio of IoU to NWD) and GD. Table 2 shows that introducing NWD and GD improves the recognition accuracy to varying degrees, so both improvements are effective. When the ratio of IoU to NWD is 1:1, recognition on MAR20 is best; when the ratio is 1:9, recognition on CORS-ADD is best. The experimental results show that, on MAR20, the mAP of the improved YOLOv5s is 1.1%, 0.7% and 1.8% higher than that of YOLOv5s, YOLOv8s and Gold-YOLO respectively; on CORS-ADD, the mAP is higher by 0.6%, 1.7% and 3.9%. Conclusion To address the large differences in target size and the high intra-class similarity encountered in remote sensing aircraft image recognition, an improved YOLOv5s network is proposed. The loss function of YOLOv5s is improved by combining the Gaussian Wasserstein distance with the traditional IoU metric, which improves the detection of targets of different sizes and thereby the detection accuracy of the model. In addition, because different aircraft types have similar features and sub-categories are hard to distinguish, this article uses the Gather-and-Distribute feature aggregation module of Gold-YOLO to enhance the ability of the YOLOv5s network to extract fine-grained features. Compared with YOLOv5s, YOLOv8s, Gold-YOLO and Faster-RCNN, the improved YOLOv5s achieves the best detection accuracy.
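The IoU-to-NWD ratio reported in the results can be read as a weighted sum of two loss terms. The following is a minimal sketch under that assumption. The function name `blended_box_loss`, the parameter `iou_ratio`, the choice to weight the IoU term by `iou_ratio` and the NWD term by `1 - iou_ratio`, and the example similarity values are all hypothetical; the abstract does not say how r maps onto the two terms.

```python
def blended_box_loss(iou_value, nwd_value, iou_ratio):
    """Weighted sum of an IoU-based loss and an NWD-based loss.

    `iou_ratio` is the share given to the IoU term; the NWD term receives
    1 - iou_ratio. Both similarities are assumed to lie in [0, 1].
    """
    if not 0.0 <= iou_ratio <= 1.0:
        raise ValueError("iou_ratio must be in [0, 1]")
    iou_loss = 1.0 - iou_value
    nwd_loss = 1.0 - nwd_value
    return iou_ratio * iou_loss + (1.0 - iou_ratio) * nwd_loss


# IoU:NWD = 1:1 and 1:9, the two ratios reported as best for MAR20 and
# CORS-ADD. The similarity values 0.6 and 0.8 are made up for illustration.
loss_1_to_1 = blended_box_loss(0.6, 0.8, iou_ratio=0.5)
loss_1_to_9 = blended_box_loss(0.6, 0.8, iou_ratio=0.1)
print(loss_1_to_1, loss_1_to_9)
```

Giving NWD the larger share, as in the 1:9 setting, lets the smoother NWD term dominate the position loss, which is consistent with that ratio performing best on the small target dataset.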
To increase the image processing speed of the model without reducing its accuracy, and to reduce the consumption of computing resources as far as possible for future lightweight deployment, future work will consider replacing the C3 network in the detection part of YOLOv5s with the C3_DSConv network to improve detection speed and make the model lightweight.
Keywords
