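A survey of deep learning-based pedestrian detection methods

Luo Yan1, Zhang Chongyang1, Tian Yonghong2, Guo Jie3, Sun Jun1 (1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240; 2. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871; 3. School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240)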

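Abstract
Pedestrian detection has shown great application value in intelligent transportation systems, intelligent security surveillance, and intelligent robotics, and has become one of the important research directions in computer vision. Benefiting from the rapid development of deep learning, generic object detection models based on deep convolutional neural networks have been continually extended to pedestrian detection and have achieved good performance. However, because of the inherent particularity and complexity of pedestrian targets, and especially because of pedestrian occlusion and scale variation in complex scenes, deep learning-based pedestrian detection methods still face severe challenges in accuracy and efficiency. To address these problems, this paper takes deep learning-based pedestrian detection as its subject. On the basis of a thorough literature review, it categorizes pedestrian detection algorithms from three perspectives: anchor-based methods, anchor-free methods, and improvements to general techniques (e.g., loss function improvements and non-maximum suppression methods), and it selects representative methods for detailed discussion and comparative analysis. The paper summarizes the common datasets in pedestrian detection and analyzes the application scenarios of each dataset from the perspective of data composition. It also discusses the performance of the various algorithms on different datasets and compares their strengths and weaknesses across datasets. Finally, it offers an outlook on the open problems and future research directions in pedestrian detection. How to alleviate the feature loss caused by occlusion, how to handle scale variation under a single viewpoint, how to improve detector efficiency, and how to use multimodal information effectively to improve pedestrian detection accuracy are all directions worth further study.
Keywords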
An overview of deep learning based pedestrian detection algorithms

Luo Yan1, Zhang Chongyang1, Tian Yonghong2, Guo Jie3, Sun Jun1(1.School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;2.School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China;3.School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China)

Abstract
Computer vision has developed rapidly and underpins tasks such as image classification and face recognition. Traditional machine learning methods have served as the basic technology for computer vision tasks. Their core is to determine the location and category of a target using image features designed by hand for the task in question. However, manual feature design is costly. Deep learning can automatically learn effective features from labeled or unlabeled data in a supervised or unsupervised manner, which benefits image recognition and object detection. Deep learning-based pedestrian detection is one branch of this development. Pedestrian detection aims to identify pedestrian targets in an input single-frame image or image sequence and to localize them in the image. Because of complicated scenes and the particular nature of pedestrian targets, deep learning-based pedestrian detection faces two key challenges. 1) Occlusion. Under severe occlusion, the body structure information of a pedestrian is badly degraded. As a result, the visual features of occluded pedestrians differ from those of unoccluded ones, which leads to false negatives during inference. Because occlusion patterns are diverse, it is difficult for a detection algorithm to determine which part is occluded and to localize the pedestrian accurately. 2) Scale variation. Detection is affected by whether the scene is crowded or sparse. For a tiny target, the detector is likely to misjudge it as background noise because it lacks sufficient semantic information.
At the same time, for a large-scale target it is difficult to find a set of anchors that matches it well during training. Moreover, large-scale pedestrian instances often have clear internal texture and skeleton features, while small-scale ones often have only blurred edge information. Therefore, a unified framework that handles both large and small targets is required. This paper reviews a range of deep learning-based pedestrian detection algorithms. The analysis covers recent improvements to mainstream pedestrian detection frameworks from three aspects: anchor-based algorithms, anchor-free algorithms, and general technical improvements (e.g., loss functions and non-maximum suppression). Among anchor-based methods, the review focuses on pedestrian detectors built on the Faster region-based convolutional neural network (Faster R-CNN) or single shot multi-box detector (SSD) baselines, in which region proposals are first generated and then refined to obtain the final detections. Whether built on single-stage or two-stage anchor-based detectors, these algorithms add modules customized for pedestrians. We group them into the following categories. 1) Part-based methods: local part features carry more information about pedestrian occlusion and deformation, so methods such as occlusion-aware R-CNN (OR-CNN) extract part-level features to improve the detection of occluded pedestrians. Besides using extra part detectors or manually delineating part regions, several methods such as the mask-guided attention network (MGAN) use an attention mechanism to enhance the features of visible pedestrian regions while suppressing those of occluded regions.
2) Hybrid methods: methods such as Bi-box and PedHunter build two-branch networks that predict both body parts and the full body, and introduce a fusion mechanism so that the detector is robust with respect to both local and global pedestrian features. 3) Cascaded methods: cascade structures have also been applied to pedestrian detection to improve localization quality. Cascade R-CNN, the autoregressive network (AR-Ped), and the asymptotic localization fitting network (ALFNet) stack multiple prediction heads for multi-stage regression of the proposals, so the pedestrian boxes are refined step by step toward better localization. 4) Multi-scale methods: these methods build robust feature representations by fusing high-level and low-level features, as in the feature pyramid network (FPN), to cope with scale variation in pedestrian detection. Among anchor-free methods, the review presents two detectors: the point-based center scale predictor (CSP) and the line-based topological line localization (TLL). Neither method uses pre-defined anchor boxes, so both belong to the anchor-free paradigm. Anchor-free methods avoid the redundant background information introduced by pre-defined boxes and therefore perform relatively better on small-scale and occluded pedestrians. In addition, the review summarizes improvements to general techniques that can be used in both anchor-based and anchor-free detectors. Loss function modifications, represented by repulsion loss (RepLoss), pull a proposal closer to its matched ground-truth box while pushing it away from other ground-truth boxes. Another key technique is non-maximum suppression (NMS), which is commonly used to remove duplicate detections.
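As an illustration of the NMS step mentioned above, the following is a minimal sketch of standard greedy NMS in plain Python. It is not the implementation of any surveyed detector; the function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box format, and the sample boxes and scores are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept,
        # higher-scoring box by more than the threshold.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: box 1 duplicates box 0, while box 2 stands for
# a second pedestrian partly hidden behind the first.
boxes = [(10, 10, 50, 110), (12, 12, 52, 112), (25, 10, 65, 110)]
scores = [0.9, 0.8, 0.7]

print(greedy_nms(boxes, scores, iou_threshold=0.5))  # [0, 2]
print(greedy_nms(boxes, scores, iou_threshold=0.4))  # [0]
print(greedy_nms(boxes, scores, iou_threshold=0.9))  # [0, 1, 2]
```

In this toy case a low threshold removes the occluded neighbor together with the duplicate, while a high threshold lets the duplicate survive. This is the trade-off that a single fixed threshold cannot resolve in crowded scenes.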
Representative NMS variants include adaptive NMS and R2NMS, which generally seek a more suitable post-processing threshold so that the pedestrian detector can cope with occlusion. Common datasets such as Caltech and CityPersons, together with their challenging subsets (e.g., reasonable and heavy), are introduced in detail. Using the log-average miss rate as the evaluation metric, the review compares the performance of the methods on different subsets that target different challenges and provides an experimental analysis.
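The log-average miss rate mentioned above can be sketched as follows. This is a minimal illustration rather than an official evaluation script. It assumes the convention commonly used with the Caltech benchmark, which the abstract itself does not spell out: the miss rate is sampled at nine false-positives-per-image (FPPI) reference points spaced evenly in log space over [10^-2, 10^0], and the samples are averaged geometrically. The function name `log_average_miss_rate` and its parameters are hypothetical.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9, lo=1e-2, hi=1e0):
    """Geometric mean of miss rates sampled at log-spaced FPPI points.

    fppi and miss_rate describe one curve, with fppi sorted ascending.
    The default reference range and point count are an assumed convention.
    """
    refs = [lo * (hi / lo) ** (i / (num_points - 1)) for i in range(num_points)]
    sampled = []
    for ref in refs:
        # Use the miss rate of the last curve point whose FPPI does not
        # exceed the reference; if the curve starts to the right of the
        # reference, treat the detector as missing everything there.
        mr = 1.0
        for x, y in zip(fppi, miss_rate):
            if x <= ref:
                mr = y
            else:
                break
        sampled.append(max(mr, 1e-10))  # guard against log(0)
    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))


# Hypothetical two-point curve: miss rate 0.5 from FPPI 0.001, 0.1 from FPPI 0.05.
print(log_average_miss_rate([0.001, 0.05], [0.5, 0.1]))
```

Averaging in log space keeps a single very low or very high sample from dominating the score. Lower values indicate a better detector.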
Keywords
