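Small-target vehicle detection fusing non-adjacent hopping and multi-scale residual structures
Abstract
Objective Target detection models based on deep convolutional neural networks are easily affected by complex conditions (occlusion, illumination, long distance, small targets), which leads to missed detections, false detections, and blurred target contour features, and existing models are difficult to generalize directly to small-target detection in aerial scenes. To address these problems, a small-target vehicle detection algorithm that fuses non-adjacent hopping and multi-scale residual structures is proposed (non-adjacent hop network you only look once version 5s multi-scale residual edge contour feature extraction strategy, NHN-YOLOv5s-MREFE). Method First, detection layers at four different scales are designed, each responsible for vehicles of a particular size range according to its receptive field. Second, drawing on the dense skip connections of DenseNet, a non-adjacent hopping feature pyramid structure (non-adjacent hop network, NHN) is constructed; its hop-and-add strategy strengthens information exchange between non-adjacent levels while fusing more unaffected original information, which counters the gradual dilution of location information during propagation and effectively reduces the false detection rate of the model. Third, with reduced feature loss as the premise, deconvolution and a parallel strategy are introduced, enriching small-target detail by filling pixels through learned parameters and by going beyond the amount of information carried in each dimension. Next, a multi-scale residual edge contour feature extraction strategy (MREFE) is designed: following the principle of gradual feature refinement, a multi-scale residual structure is built, a two-branch parallel scheme captures multi-scale information at different levels, and image edge features are extracted by pixel-wise subtraction between the high-level semantic information at multiple scales and the initial shallow information, which in turn helps the network classify targets. Finally, the K-Means++ algorithm is used to spread out the cluster centers, driving the result toward the global optimum and speeding up model convergence. Result Experimental results show that the multimodal fusion strategy combining the non-adjacent hopping feature pyramid with the multi-scale residual structure effectively improves the accuracy and robustness of small-target detection while raising the running efficiency of the model and reducing its consumption of computational resources. Augmenting the sample data across multiple scenes, time periods, and viewing angles strengthens the generalization ability of the model in different scenarios. On an aerial image dataset that contains multiple vehicle types in two scenarios, intersections and along-road lanes, the proposed algorithm achieves the best overall performance in a comparison with four mainstream target detection methods. Compared with the baseline model (YOLOv5s), precision, recall, and mean average precision are improved by 13.7%, 1.6%, and 8.1%, respectively. Conclusion The proposed algorithm balances detection speed and accuracy well, significantly improves detection accuracy at the cost of a very small increase in the number of parameters, adapts to complex traffic environments, and meets the real-time requirements of small-target vehicle detection in aerial scenes. It has high application value in scenarios such as the measurement and statistics of traffic flow and density and vehicle localization and tracking.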
Keywords
Small-target vehicle detection by fusing non-adjacent hopping and multi-scale residual structures
Zhang Hao1,2, Dong Kailong2, Gao Shangbing2, Liu Bin2, Hua Qifan2, Zhang Ge2
(1. School of Transportation Engineering, Huaiyin Institute of Technology, Huaian 223003, China; 2. School of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian 223003, China)
Abstract
Objective Target detection models based on deep convolutional neural networks are susceptible to complex environments (e.g., occlusion, illumination, long distance, and small targets), which leads to missed detections, false detections, and blurred target contour features. Moreover, existing models cannot be easily generalized to small-target detection tasks in aerial photography scenarios. To address these problems, this paper proposes a small-target vehicle detection algorithm that fuses non-adjacent hopping and multi-scale residual structures, called non-adjacent hop network you only look once version 5s multi-scale residual edge contour feature extraction strategy (NHN-YOLOv5s-MREFE). Method First, detection layers at four different scales are designed, each responsible for detecting vehicles of a particular size range according to its receptive field. Second, drawing on the dense skip connections of DenseNet, a non-adjacent hopping feature pyramid structure (non-adjacent hop network, NHN) is constructed. Its hop-and-add strategy strengthens the information interaction between non-adjacent layers while fusing additional unaffected original information, which addresses the gradual dilution of location information during transmission and effectively reduces the false detection rate of the model. Third, with reduced feature loss as the premise, deconvolution and a parallel strategy are introduced to expand small-target detail information, filling pixels through learned parameters and going beyond the amount of information carried in each dimension. Fourth, a multi-scale residual edge contour feature extraction strategy (MREFE) is designed. Following the principle of gradual feature refinement, a multi-scale residual structure is constructed that captures multi-scale information at different levels with a two-branch parallel approach, extracts image edge features from the pixel-by-pixel difference between the high-level semantic information at multiple scales and the initial shallow information, and thereby assists the network model in completing target classification. Finally, the K-Means++ algorithm is used to spread out the clustering centers, which drives the result toward the global optimum and accelerates the convergence of the model. Result Experimental results show that the multimodal fusion strategy combining the non-adjacent hopping feature pyramid with the multi-scale residual structure effectively improves the accuracy and robustness of small-target detection while enhancing the running efficiency of the model and reducing its consumption of computational resources. Augmenting the sample data across multiple scenes, time periods, and viewing angles strengthens the generalization ability of the model in different scenarios. On an aerial image dataset that contains multiple vehicle types in two scenarios, intersections and along-road lanes, NHN-YOLOv5s-MREFE achieves the best overall performance in a comparison with four mainstream target detection methods. Compared with the baseline model (YOLOv5s), the precision, recall, and mean average precision of NHN-YOLOv5s-MREFE are improved by 13.7%, 1.6%, and 8.1%, respectively. Conclusion The proposed NHN-YOLOv5s-MREFE balances detection speed and accuracy and significantly improves detection accuracy at the cost of a very small increase in the number of parameters. The algorithm also adapts to complex traffic environments and meets the real-time requirements of small-target vehicle detection in aerial photography scenarios, with high application value in scenarios such as the measurement and statistics of traffic flow and density and vehicle localization and tracking.
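The edge-extraction idea in the Method part, a pixel-by-pixel difference between multi-scale information and the initial shallow information, can be illustrated with a toy sketch. This is not the MREFE module itself: the box blur below is an assumed stand-in for the learned multi-scale branches, and `box_blur`, `edge_response`, and the radii are hypothetical names and values chosen only for illustration.

```python
def box_blur(img, radius):
    """Mean filter over a (2*radius+1)^2 window; borders use the part of the window inside the image."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def edge_response(img, radii=(1, 2)):
    """Average over scales of |smoothed - original|, computed per pixel."""
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    for r in radii:
        # Assumption: a blur with a larger radius plays the role of a branch
        # with a larger receptive field; the real branches are learned.
        blurred = box_blur(img, r)
        for y in range(h):
            for x in range(w):
                acc[y][x] += abs(blurred[y][x] - img[y][x]) / len(radii)
    return acc


# A 6x6 map with a vertical step between columns 2 and 3.
step = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]
response = edge_response(step)
```

On this step map the response is zero in the flat regions far from the boundary (columns 0 and 5) and largest in the two columns next to the step, which is the sense in which the difference isolates contour information.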
Keywords
intelligent transportation; target detection; deep learning; non-adjacent hopping; multi-scale residual structure
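The K-Means++ step named in the abstract can be sketched as follows. The abstract does not say what is clustered or which distance is used, so this sketch assumes the common YOLO practice of clustering ground-truth box sizes (width, height) with a 1 - IoU distance to obtain anchor boxes. The function names are hypothetical, and the input is assumed to contain at least `k` distinct box sizes with positive width and height.

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given as (width, height) and aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeanspp_seeds(boxes, k, rng):
    """K-Means++ seeding: first center uniform, later ones with probability proportional to d^2."""
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        # Squared distance (1 - IoU) from each box to its nearest chosen center.
        d2 = [min(1.0 - iou_wh(b, c) for c in centers) ** 2 for b in boxes]
        centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centers


def kmeans_anchors(boxes, k, iters=20, seed=0):
    """Lloyd iterations started from K-Means++ seeds; returns anchors sorted by area."""
    rng = random.Random(seed)
    centers = kmeanspp_seeds(boxes, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            nearest = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            clusters[nearest].append(b)
        # Move each center to the mean size of its cluster; keep it if the cluster is empty.
        centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers, key=lambda c: c[0] * c[1])


# Hypothetical box sizes in pixels: small, medium, and large vehicles.
boxes = [(10, 12), (11, 10), (12, 11), (40, 42), (41, 40), (42, 41),
         (100, 102), (101, 100), (102, 101)]
anchors = kmeans_anchors(boxes, k=3)
```

Because each later seed is drawn with probability proportional to its squared distance from the seeds already chosen, the initial centers tend to be spread across the size range. Spread-out seeds make a poor local optimum less likely, although they do not guarantee the global one.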