Occluded person re-identification fusing pose guidance and multi-scale features
Abstract
Objective In person re-identification, occlusion changes pedestrian appearance features and reduces their discriminability, and conventional methods that rely only on the visible parts still misidentify pedestrians. To address this problem, an occluded person re-identification method fusing pose guidance and multi-scale features is proposed. Method First, a feature restoration module is constructed, which recovers the semantic information of occluded regions in the feature space from the information of neighboring unoccluded parts, thereby completing the features of missing body parts. Then, to extract effective pose information from the restored image, a pose guidance module is designed in which pose estimation guides feature extraction, enabling more accurate pedestrian matching. Finally, a feature enhancement module is built and combined with a salient region detection method to enhance effective body-part features while eliminating the interference caused by background information. Result Comparative and ablation experiments were conducted on three public datasets. The mean average precision (mAP) and rank-1 accuracy (Rank-1) reach 88.8% and 95.5% on Market1501, 79.2% and 89.3% on DukeMTMC-reID (Duke multi-tracking multi-camera re-identification), and 51.7% and 60.3% on Occluded-DukeMTMC (occluded Duke multi-tracking multi-camera re-identification), respectively. The comparative results show that the proposed fusion algorithm improves person matching accuracy and is competitive with existing methods. Conclusion The proposed pose-guided multi-scale fusion method restores body-part features missing due to occlusion, fuses image features of different granularities with the help of pose information, improves recognition accuracy, and effectively alleviates the misidentification caused by occlusion, verifying the effectiveness of the method.
Keywords
Pose guidance and multi-scale feature fusion for occluded person re-identification
Zhang Hongying1, Liu Tengfei1, Luo Qian2, Zhang Tao2
(1. College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China; 2. Civil Aviation Electronic Technology Co., Ltd., Chengdu 610041, China)
Abstract
Objective Person re-identification (ReID) is an important task in computer vision that aims to accurately identify and associate the same person across multiple surveillance cameras by extracting and matching pedestrian features under different scenarios. Occluded person ReID is a challenging, specialized instance of the ReID problem. In real-world settings, occlusion is a common issue that limits the practical application of person ReID techniques. Recently, occluded person ReID has attracted growing attention, and several methods addressing occlusion have been proposed and achieve impressive results. These methods primarily focus on the visible regions in images: they first locate the visible regions and then design a model to extract discriminative features from those regions for accurate person matching. They typically discard features from the occluded areas and exploit discriminative features from the non-occluded regions for matching. Although these methods achieve impressive results, they ignore the influence of the occluded regions and of background interference, so they fail to effectively address the misclassification caused by similar appearances in non-occluded regions. Consequently, relying merely on visible regions for recognition leads to a sharp performance drop, and interference from image backgrounds also hinders further improvements in recognition accuracy. Some methods attempt to recover the occluded regions at the image level by exploiting the unobstructed image information; however, such restoration may cause image distortion and introduces an excessive number of parameters.
Method We propose a person ReID method based on pose guidance and multi-scale feature fusion to alleviate the aforementioned issues. The method enhances the feature representation capability of the model and obtains more discriminative features. First, a feature restoration module is constructed to restore occluded image features at the feature level while effectively reducing the number of model parameters. The module uses spatial contextual information from the non-occluded regions to predict the features of adjacent occluded regions, thereby restoring the semantic information of the occluded regions in the feature space. The feature restoration module consists of two subparts: an adaptive region division unit and a feature restoration unit. The adaptive region division unit divides the image into six regions according to predicted localization points, which facilitates the clustering of similar feature information within each region; this adaptive division effectively alleviates the misalignment caused by fixed division schemes and achieves more accurate position alignment. The feature restoration unit comprises an encoder and a decoder. The encoder encodes the features of divided regions with similar appearances or close positions into clusters, and the decoder assigns the cluster information to the occluded body parts in the image, completing the feature restoration of missing body parts.
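To make the encoder-decoder restoration idea above concrete, the following minimal PyTorch sketch shows one plausible way such a module could be wired: visible-region features are encoded into cluster context, and decoder queries for the occluded regions draw on that context. The transformer layers, feature dimension, and all names are illustrative assumptions, not the authors' released implementation; only the six-region division and the encoder-decoder roles follow the description above.

# Minimal sketch of the feature restoration idea (assumed design, not the paper's code).
import torch
import torch.nn as nn

class FeatureRestoration(nn.Module):
    def __init__(self, dim=256, num_regions=6, heads=4):
        super().__init__()
        # Encoder: aggregates visible-region tokens into cluster-like context.
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=1)
        # Decoder: occluded-region queries attend to the encoded context.
        dec_layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=1)

    def forward(self, region_feats, occluded_mask):
        # region_feats: (B, 6, dim), one pooled feature per adaptively divided region
        # occluded_mask: (B, 6) bool, True where a region is judged occluded
        visible = region_feats.masked_fill(occluded_mask.unsqueeze(-1), 0.0)
        memory = self.encoder(visible)                  # context from visible regions
        restored = self.decoder(region_feats, memory)   # predict occluded-region semantics
        # keep original features for visible regions, restored ones elsewhere
        return torch.where(occluded_mask.unsqueeze(-1), restored, region_feats)

# usage: FeatureRestoration()(torch.randn(2, 6, 256), torch.zeros(2, 6, dtype=torch.bool))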
Second, a pose estimation network is employed to extract pedestrian pose information. The network generates keypoint heatmaps from the restored, complete image features and predicts the body keypoints from these heatmaps to obtain the pose. The pretrained pose estimation guidance model performs fusion learning on the global non-occluded regions and the restored regions, yielding more distinctive pedestrian features for more accurate matching. Finally, a feature enhancement module is proposed to extract salient features from the image, eliminating interference from background information while strengthening the learning of effective information. This module makes the network attend to the valid semantic information in the feature maps and reduces interference from background noise, which effectively alleviates the failure of feature learning caused by occlusion (a minimal illustrative sketch of this pose-guided fusion follows the Conclusion below).
Result We conducted comparative and ablation experiments on three publicly available datasets to validate the effectiveness of our method, using mean average precision (mAP) and Rank-1 accuracy as evaluation metrics. Our method achieves an mAP of 88.8% and a Rank-1 of 95.5% on the Market1501 dataset, 79.2% and 89.3% on the Duke multi-tracking multi-camera ReID (DukeMTMC-reID) dataset, and 51.7% and 60.3% on the occluded Duke multi-tracking multi-camera re-identification (Occluded-DukeMTMC) dataset. Moreover, our method outperforms PGMA-Net by 0.4% in mAP on Market1501, by 0.8% in mAP and 0.7% in Rank-1 on DukeMTMC-reID, and by 1.2% in mAP on Occluded-DukeMTMC. The ablation experiments further confirm the effectiveness of the three proposed modules.
Conclusion Our proposed method, pose-guided and multi-scale feature fusion (PGMF), effectively recovers the features of missing body parts, alleviates background interference, and achieves accurate pedestrian matching. The proposed model therefore effectively alleviates the misidentification caused by occlusion, improves the accuracy of person ReID, and exhibits robustness.
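As a concrete illustration of the pose-guided fusion and saliency-based feature enhancement described in the Method section, the sketch below shows one plausible formulation: keypoint heatmaps from a pose estimator weight the restored feature map into per-part descriptors, while a learned single-channel saliency map damps background responses. The class name, the keypoint count of 17, and all layer choices are assumptions for illustration, not the paper's released code.

# Illustrative sketch (assumed design) of pose-guided weighting and feature enhancement.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseGuidedFusion(nn.Module):
    def __init__(self, dim=256, num_keypoints=17):
        super().__init__()
        # 1x1 conv producing a single-channel saliency map for background suppression
        self.saliency = nn.Conv2d(dim, 1, kernel_size=1)

    def forward(self, feat_map, heatmaps):
        # feat_map: (B, dim, H, W) restored feature map
        # heatmaps: (B, K, H, W) keypoint heatmaps from a pretrained pose estimator
        sal = torch.sigmoid(self.saliency(feat_map))                  # (B, 1, H, W)
        enhanced = feat_map * sal                                     # damp background
        # one pooled descriptor per keypoint: heatmap-weighted average pooling
        w = heatmaps / (heatmaps.sum(dim=(2, 3), keepdim=True) + 1e-6)
        part_feats = torch.einsum('bchw,bkhw->bkc', enhanced, w)      # (B, K, dim)
        global_feat = F.adaptive_avg_pool2d(enhanced, 1).flatten(1)   # (B, dim)
        return global_feat, part_feats

Tying each part descriptor to a body keypoint in this way lets matching compare only the parts that are visible (or restored) in both images, which is consistent with the occlusion-aware matching the abstract describes.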
Keywords