  • Published: 2025-01-16
Adaptive Slicing-Assisted Enhancement for Small Object Detection

Hao Chuanyan, Jin Yi, He Guiqin, Zhang Hao, Zhu Xiangqi, Song Wanru (Nanjing University of Posts and Telecommunications)

Abstract (Chinese version)
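Objective Deep learning models have achieved great success in object detection. However, existing deep-learning-based detection algorithms still struggle with small objects. The reason is that aerial images are mostly complex, high-resolution scenes, and common problems such as high object density, variable shooting angles, small object size, and high variability pose major challenges for existing detection methods. Slicing strategies are among the better-performing approaches used in recent years for small object detection in high-resolution images, but existing slicing methods suffer from redundant computation. Method This paper therefore proposes a new adaptive slicing method called adaptive slicing-assisted hyper inference (ASAHI). The method focuses on the number of slices rather than the conventional slice size; it adaptively adjusts the number of slices according to the image resolution to eliminate the performance loss caused by redundant computation. In the inference stage, the input image is first divided into 6 or 12 overlapping patches according to the ASAHI algorithm; each patch is then interpolated while preserving its aspect ratio; next, because small slices are clearly deficient at detecting large objects, the detector's forward pass is run separately on the sliced patches and on the complete input image; finally, to improve inference accuracy and detection speed in high-density scenes, the post-processing stage combines the faster and more efficient Cluster-NMS (non-maximum suppression, NMS) method with a DIoU (distance-intersection over union) penalty term, yielding Cluster-DIoU-NMS, which merges the ASAHI and full-image inference results before they are mapped back to the original image size. To support inference on sliced patches, the dataset built for the training stage also includes sliced patches. Results In extensive experiments, ASAHI shows competitive performance on the VisDrone (vision meets drones) and xView datasets. Compared with existing slicing methods, the proposed method improves mAP50, the mean average precision (mAP) at an IoU of 0.5, by 1.7% and reduces computation time by 20-25%; on the VisDrone2019-det-val (VisDrone 2019 detection validation) dataset, mAP50 rises to 56.8. Conclusion The proposed algorithm can effectively handle the complex factors of high-resolution scenes, such as high small-object density, varying shooting angles, and high variability, and achieves high-quality small object detection.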
Keywords
An Adaptive Slicing-Assisted Enhanced Method for Small Object Detection

HAO Chuanyan, JIN Yi, HE Guiqin, ZHANG Hao, ZHU Xiangqi, SONG Wanru (Department of Digital Media, College of Educational Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu)

Abstract
Objective Object detection has attracted much attention because of its wide application in various fields. In recent years, with the progress of deep learning, object detection algorithms built on deep convolutional neural networks have developed rapidly, and traditional object detectors achieve excellent results in natural scenes. However, current object detection algorithms still struggle with small objects. The reason is that most aerial images depict complex high-resolution scenes, and common problems such as high density, variable shooting angles, small target size, and high target variability pose great challenges to existing object detection methods. As a result, small object detection has become one of the active research topics in object detection. Its applications include identifying small early-stage lesions and masses in medical imaging, military remote sensing, locating small defects in industrial production, and so on. Some researchers obtain high-resolution image features by applying up-sampling operations several times; another class of approaches deals with problems such as high density by adding a penalty term in the post-processing stage. Among these, one effective approach is the slicing strategy, which cuts the image into smaller blocks in order to enlarge the receptive field. However, existing slicing-based methods suffer from redundant computation, which increases the computational cost and reduces the detection speed. Method Therefore, this paper proposes a new adaptive slicing method called adaptive slicing-aided hyper inference (ASAHI). The approach focuses on the number of slices rather than the conventional slice size; it adaptively adjusts the number of slices according to the image resolution to reduce the performance loss caused by redundant computation.
Specifically, in the inference stage, the method first divides the input image into 6 or 12 overlapping patches using the ASAHI algorithm; it then interpolates each image patch while maintaining the aspect ratio; next, considering the obvious weakness of the slicing strategy in detecting large objects, it performs forward computation separately on the sliced image patches and on the complete input image; finally, to improve accuracy and detection speed in high-density scenes, the post-processing stage combines the faster and more efficient Cluster-NMS method with a DIoU penalty term, yielding Cluster-DIoU-NMS, to merge the ASAHI inference and full-image inference results and then map them back to the original image size. To support ASAHI inference, the dataset constructed in the training stage also includes sliced image patches. The sliced-image dataset and the pre-training dataset of whole images together constitute the fine-tuning dataset used for training. Note that the slicing method used to build the fine-tuning dataset can be either the ASAHI algorithm or the conventional sliding-window method. In the ASAHI slicing process, a threshold T controls the number of slices p. If the length or width of the image exceeds this threshold, the image is cut into 4×3 = 12 slices; otherwise it is cut into 3×2 = 6 slices. The width and height of each slice are then calculated from the value of p, and the coordinates of each slice are determined. In this way, the ASAHI algorithm adaptively adjusts the slice size within a limited range by controlling the number of slices p. Result Extensive experiments demonstrate that ASAHI achieves competitive performance on the VisDrone and xView datasets.
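The slice-count rule described above (a threshold T selecting 12 or 6 slices, with slice size derived from the count) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `asahi_style_slices`, the default threshold of 1024, the overlap ratio of 0.15, and the choice to give the longer image side the larger slice count are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def asahi_style_slices(width, height, threshold=1024, overlap=0.15):
    """Return (x0, y0, x1, y1) boxes for 6 or 12 overlapping slices.

    threshold and overlap are hypothetical defaults, not values from the paper.
    """
    # 12 slices (4 x 3) if either side exceeds the threshold, else 6 (3 x 2).
    long_n, short_n = (4, 3) if max(width, height) > threshold else (3, 2)
    # Assumption: the longer image side receives the larger slice count.
    cols, rows = (long_n, short_n) if width >= height else (short_n, long_n)

    def spans(length, n):
        # The slice size s solves n*s - (n-1)*overlap*s = length,
        # i.e. n slices with a fixed overlap ratio exactly cover the side.
        size = min(math.ceil(length / (n - (n - 1) * overlap)), length)
        # Spread the n start offsets evenly so the last slice ends at the border.
        step = (length - size) / (n - 1)
        return [(round(i * step), round(i * step) + size) for i in range(n)]

    # Row-major order: all slices of the first row, then the second, and so on.
    return [(x0, y0, x1, y1)
            for (y0, y1) in spans(height, rows)
            for (x0, x1) in spans(width, cols)]
```

Under these assumptions, a 1920×1080 image yields 12 slices and an 800×600 image yields 6, so the slice size changes with the image resolution while the slice count stays within two fixed values.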
The results show that the proposed method achieves the highest mAP50 scores (45.6 and 22.7) and fast inference speeds (4.88 and 3.58 images per second) on the two datasets. In addition, mAP and mAP75 increase by 1.7% and 1.1%, respectively, on the VisDrone2019-DET-test dataset, and by 1.43% and 0.9%, respectively, on the xView test set. On the VisDrone2019-DET-val dataset, the mAP50 reaches 56.8. Compared with state-of-the-art methods, the proposed method achieves the highest mAP (36.0), mAP75 (28.2), and mAP50 (56.8) values, with a peak processing speed of 5.26 images per second, showing a better balance between accuracy and speed. Conclusion The proposed algorithm can effectively handle complex factors in high-resolution scenes, such as high density, varying shooting angles, and high variability, and achieves high-quality detection of small objects.
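The Cluster-DIoU-NMS post-processing step named in the Method part can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the authors' code: the function names `diou` and `cluster_diou_nms` and the default threshold of 0.5 are hypothetical, boxes are assumed to be (x0, y0, x1, y1) tuples already mapped into full-image coordinates, and class labels are ignored. The suppression loop follows the general Cluster-NMS idea of repeatedly re-evaluating which boxes are suppressed by still-kept, higher-scoring boxes until the result stops changing.

```python
def diou(a, b):
    """DIoU of two (x0, y0, x1, y1) boxes: IoU minus a center-distance penalty."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centers.
    d2 = ((a[0] + a[2] - b[0] - b[2]) / 2) ** 2 + ((a[1] + a[3] - b[1] - b[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - (d2 / c2 if c2 > 0 else 0.0)


def cluster_diou_nms(boxes, scores, threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    # Upper-triangular matrix: x[i][j] is the DIoU between the i-th and j-th
    # ranked boxes for i < j, and 0 elsewhere.
    x = [[diou(boxes[order[i]], boxes[order[j]]) if i < j else 0.0
          for j in range(n)] for i in range(n)]
    keep = [True] * n
    for _ in range(n):
        # A box survives unless a higher-ranked, currently kept box overlaps it
        # with DIoU at or above the threshold.
        new_keep = [all(not keep[i] or x[i][j] < threshold for i in range(j))
                    for j in range(n)]
        if new_keep == keep:
            break
        keep = new_keep
    return [order[j] for j in range(n) if keep[j]]
```

To merge slice-level and full-image detections in this sketch, one would concatenate both sets of boxes and scores and pass them to `cluster_diou_nms` in a single call.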
Keywords
