An improved YOLOv7-tiny method based on a long short range dependency feature pyramid

Sun Zhongbin; Hu Shuai; Zhang Fan; Zhou Yong

Published: 2025-01-16

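Sun Zhongbin¹, Hu Shuai¹, Zhang Fan², Zhou Yong¹ (1. China University of Mining and Technology; 2. Inspur Zhuoshu Big Data Industry Development Co., Ltd.)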

Abstract

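Objective: In recent years, YOLOv7-tiny has become a common method for real-time object detection. Thanks to its lightweight architecture and small parameter count, it trains end to end in a single network, detects quickly without sliding windows or region proposals, and performs well in resource-constrained tasks with strict real-time requirements. However, YOLOv7-tiny has two problems in the feature fusion stage: information loss when adjacent-layer features are fused, and discrepancies between the features of non-adjacent layers. Specifically, YOLOv7-tiny uses conventional nearest-neighbor upsampling when fusing adjacent layers, which produces jagged edges in the generated feature maps and lowers their quality and expressive power. The non-adjacent-layer discrepancy arises during the bidirectional fusion of the feature pyramid: the information unique to the high and low layers is gradually "diluted", so the feature maps in the feature extraction and detection stages carry differing scale information, which can seriously impair the model's ability to detect large or small objects.

Method: To address both problems, this paper proposes the Long Short Range Dependency Feature Pyramid Network (LSRD-FPN) and uses it to improve YOLOv7-tiny. LSRD-FPN has two key components: a local Short Range Dependency (SRD) mechanism and a global Long Range Dependency (LRD) mechanism. SRD reduces information loss during feature fusion by improving the upsampling method and introducing an attention mechanism. LRD introduces a cross-layer connection module that scales and fuses the multi-scale features of the backbone network and distributes them to the different feature levels of the detection stage. LSRD-FPN both strengthens the model's feature representation and improves its performance on multi-scale object detection.

Result: Experiments were run on two datasets that differ in scenario and size. Compared with YOLOv7-tiny, the proposed method improves by 1.3 mAP and 0.5 mAP, respectively. In addition, compared with YOLOv5-s and YOLOv8-n, which have comparable parameter counts, it improves by 2.6 mAP and 0.2 mAP on the TDD dataset and by 2.1 mAP and 4.4 mAP on the Cmudsodd dataset.

Conclusion: The proposed LSRD-FPN addresses the adjacent-layer information loss and the non-adjacent-layer feature discrepancy in the feature fusion stage of YOLOv7-tiny, improves the detection performance of YOLOv7-tiny, and outperforms YOLOv5-s and YOLOv8-n, two methods with comparable parameter counts.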
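
The scale, fuse, and distribute step of the long-range mechanism can be illustrated with a small sketch. This is not the authors' implementation: it loosely follows the gather, average, and redistribute pattern of Libra R-CNN, which the English abstract names as an inspiration. The function names, the use of max pooling and nearest-neighbor resizing, the plain averaging, and the residual addition are all assumptions made for illustration, and each map is a single-channel square grid of plain Python lists so that the example runs without dependencies.

```python
def resize_nearest(feature, size):
    """Upsample a square map to size x size by copying each value (size must be a multiple)."""
    scale = size // len(feature)
    return [[feature[i // scale][j // scale] for j in range(size)] for i in range(size)]


def resize_maxpool(feature, size):
    """Downsample a square map to size x size by max pooling over non-overlapping windows."""
    stride = len(feature) // size
    return [
        [
            max(feature[i * stride + di][j * stride + dj]
                for di in range(stride) for dj in range(stride))
            for j in range(size)
        ]
        for i in range(size)
    ]


def resize(feature, size):
    """Bring a map to the requested resolution, pooling down or copying up as needed."""
    if len(feature) == size:
        return [row[:] for row in feature]
    if len(feature) > size:
        return resize_maxpool(feature, size)
    return resize_nearest(feature, size)


def long_range_fuse(backbone_levels, neck_levels):
    """Hypothetical cross-layer connection: gather, average, redistribute, add."""
    # 1. Scale every backbone map to the resolution of the middle level.
    mid = len(backbone_levels[len(backbone_levels) // 2])
    gathered = [resize(f, mid) for f in backbone_levels]
    # 2. Fuse by element-wise averaging (an assumption; the abstract only says "fused").
    fused = [
        [sum(g[i][j] for g in gathered) / len(gathered) for j in range(mid)]
        for i in range(mid)
    ]
    # 3. Distribute: rescale the fused map to each detection level and add it residually.
    outputs = []
    for neck in neck_levels:
        shared = resize(fused, len(neck))
        outputs.append([[n + s for n, s in zip(n_row, s_row)]
                        for n_row, s_row in zip(neck, shared)])
    return outputs


# Three backbone levels (4x4, 2x2, 1x1) and matching detection-stage maps.
backbone = [[[1.0] * 4 for _ in range(4)], [[2.0] * 2 for _ in range(2)], [[3.0]]]
neck = [[[0.0] * 4 for _ in range(4)], [[0.0] * 2 for _ in range(2)], [[0.0]]]
for level in long_range_fuse(backbone, neck):
    print(len(level), level[0])
```

Every detection level receives the same fused summary of all backbone scales, which is the sense in which information from the smallest and largest scales reaches the detection stage directly instead of being diluted through successive adjacent-layer fusions.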

Keywords

object detection, feature fusion, feature pyramid, YOLOv7-tiny, multi-scale features

An improved YOLOv7-tiny method based on long short range dependency feature pyramid network

Sun Zhongbin¹, Hu Shuai¹, Zhang Fan², Zhou Yong¹ (1. China University of Mining and Technology; 2. Inspur Zhuoshu Big Data Industry Development Co., Ltd.)

Abstract

Objective: In recent years, YOLOv7-tiny has become a commonly used method for real-time object detection. Thanks to its lightweight architecture and small number of parameters, training runs end to end in a single network and detection is fast, with no need for sliding windows or region proposals, so it performs well in tasks with limited resources and strict real-time requirements. However, YOLOv7-tiny has two problems in the feature fusion stage: information loss when adjacent-layer features are fused, and discrepancies between the features of non-adjacent layers. Specifically, YOLOv7-tiny uses conventional nearest-neighbor upsampling when fusing adjacent layers, which can produce jagged edges in the generated feature map and reduce its quality and expressive power. The non-adjacent-layer discrepancy arises during the bidirectional fusion of the feature pyramid in YOLOv7-tiny: the information unique to the upper and lower layers is gradually "diluted", so the feature maps in the feature extraction and detection stages carry differing scale information, which may seriously impair the model's ability to detect large or small objects.

Method: To solve these two problems, this paper proposes a Long Short Range Dependency Feature Pyramid Network (LSRD-FPN) and uses it to improve YOLOv7-tiny. LSRD-FPN consists of two key components: the local Short Range Dependency (SRD) mechanism and the global Long Range Dependency (LRD) mechanism. SRD improves the upsampling method and introduces an attention mechanism. It replaces conventional nearest-neighbor upsampling with the lightweight feature upsampling method CARAFE, at a cost of only about 20,000 additional parameters.
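
To make the contrast between the two upsampling schemes concrete, here is a minimal sketch in plain Python, not the authors' implementation. Nearest-neighbor upsampling copies each source value into a block, whereas CARAFE-style reassembly computes each output value as a softmax-weighted sum over a k×k neighborhood of its source location. In CARAFE the per-location kernel scores are predicted from the feature map by a small convolutional branch; in this sketch they are simply passed in as the hypothetical argument `kernel_logits`, and the function names are illustrative.

```python
import math


def upsample_nearest(feature, scale=2):
    """Nearest-neighbor upsampling: each source value fills a scale x scale block."""
    return [
        [feature[i // scale][j // scale] for j in range(len(feature[0]) * scale)]
        for i in range(len(feature) * scale)
    ]


def softmax(values):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def upsample_reassemble(feature, kernel_logits, scale=2, k=3):
    """CARAFE-style reassembly with externally supplied kernel scores.

    kernel_logits[i][j] holds k*k raw scores for output location (i, j).
    Neighbors outside the map are treated as zeros.
    """
    h, w = len(feature), len(feature[0])
    r = k // 2
    out = []
    for i in range(h * scale):
        row = []
        for j in range(w * scale):
            weights = softmax(kernel_logits[i][j])
            ci, cj = i // scale, j // scale  # source location of this output pixel
            acc = 0.0
            for n, weight in enumerate(weights):
                y, x = ci + n // k - r, cj + n % k - r
                if 0 <= y < h and 0 <= x < w:
                    acc += weight * feature[y][x]
            row.append(acc)
        out.append(row)
    return out


feature = [[0.0, 1.0], [1.0, 0.0]]
# Uniform scores: every output value is an average over its source neighborhood.
uniform = [[[0.0] * 9 for _ in range(4)] for _ in range(4)]
for row in upsample_nearest(feature):
    print(row)
for row in upsample_reassemble(feature, uniform):
    print([round(v, 3) for v in row])
```

When the scores put almost all weight on the center of the neighborhood, the reassembly reduces to nearest-neighbor copying; content-dependent scores let each output location blend its neighbors differently, which is what softens the blocky edges.
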
In addition, the parameter-free attention mechanism SimAM is added after local feature fusion to strengthen the feature representation and enlarge the perceptual range, which effectively reduces information loss during feature fusion. LRD is inspired by ResNet and Libra R-CNN and introduces cross-layer connection modules. Multi-scale feature maps of different resolutions from the backbone network are rescaled to a common scale, fused, and then assigned to the different levels of the detection stage, so that the backbone's feature information for objects at extreme scales is fed directly into the detection stage. This improvement both strengthens the model's feature representation and improves its performance on multi-scale object detection.

Result: Training was conducted under Ubuntu 20.04.4 LTS on an NVIDIA RTX 3090 GPU with 24 GB of memory. The input image size is fixed at 640×640, the batch size is 16, and training runs for 100 epochs; all other parameters use the default YOLOv7-tiny settings. The proposed method is evaluated on two datasets that differ in scenario and size: the Traffic Detection Dataset (TDD) and the coal mine underground drilling site object detection dataset (Cmudsodd). YOLOv7-tiny serves as the baseline, and LSRD-FPN is embedded into it. After 100 epochs of training, the method improves on the baseline YOLOv7-tiny by 1.3 mAP on TDD and 0.5 mAP on Cmudsodd, while the number of parameters remains relatively low. Ablation experiments were conducted on the two components of LSRD-FPN, SRD and LRD.
SRD yields gains of 0.6 mAP and 0.2 mAP on the TDD and Cmudsodd datasets, respectively, and LRD yields gains of 0.7 mAP and 0.3 mAP. Compared with other real-time object detection algorithms with a comparable number of parameters, the proposed algorithm improves on YOLOv5-s by 2.6 mAP and on YOLOv8-n by 0.2 mAP on the TDD dataset, and by 2.1 mAP and 4.4 mAP, respectively, on the Cmudsodd dataset. In addition, the proposed model runs at more than 160 FPS, which meets the requirements of real-time detection. This indicates that the method not only improves accuracy but also lends itself to rapid deployment in practical scenarios.

Conclusion: The proposed LSRD-FPN effectively improves the detection performance of the object detection model while keeping the number of parameters and floating-point operations low enough to meet real-time detection speed requirements. Moreover, LSRD-FPN is not limited to YOLOv7-tiny: because it is plug-and-play, it can be easily integrated into other object detection models to improve their performance.
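
For readers unfamiliar with SimAM, the attention mechanism named in the Method part, the sketch below shows why it adds no parameters. It follows the published SimAM formulation for a single channel: each value is gated by a sigmoid of its inverse energy, which grows with the squared deviation from the channel mean. Only the regularization constant `e_lambda` is configurable; the function name and the plain-list representation of a channel are illustrative choices, and this is not the authors' code.

```python
import math


def simam(channel, e_lambda=1e-4):
    """Apply SimAM gating to one channel given as a 2D list (needs more than one value)."""
    values = [v for row in channel for v in row]
    n = len(values) - 1
    mean = sum(values) / len(values)
    # Squared deviation of every position from the channel mean.
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n
    out = []
    for row, dev_row in zip(channel, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Inverse energy: larger for positions that stand out from the channel.
            inv_energy = d / (4 * (variance + e_lambda)) + 0.5
            gate = 1.0 / (1.0 + math.exp(-inv_energy))
            out_row.append(v * gate)
        out.append(out_row)
    return out


channel = [[1.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 1.0]]
gated = simam(channel)
# The center value deviates most from the mean, so it keeps the largest share of itself.
print(gated[1][1] / channel[1][1] > gated[0][0] / channel[0][0])  # True
```

The gate is computed entirely from channel statistics, so inserting the module after feature fusion leaves the parameter count unchanged.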

Keywords

object detection, feature fusion, feature pyramid, YOLOv7-tiny, multi-scale features
