Global and spatial multi-scale contexts fusion for vehicle re-identification
Abstract
Objective Vehicle re-identification is a retrieval problem that determines whether vehicle images captured by different cameras belong to the same vehicle. Existing vehicle re-identification algorithms rely on global vehicle features or additional annotation information and neglect the effective extraction of multi-scale contextual information. To address this, this paper proposes a vehicle re-identification model that fuses global and spatial multi-scale contextual information.
Method First, a global contextual feature selection module is designed to extract fine-grained discriminative information of vehicles. A multi-scale spatial contextual feature selection module is further designed, which applies multi-scale down-sampling to the discriminative features output by the global contextual feature selection module to obtain their corresponding multi-scale features. Then, spatial contextual information from multi-level features is selectively integrated to generate a foreground feature response map of the vehicle image, which improves the model's perception of the vehicle's spatial location. Finally, the model combines a label-smoothed cross-entropy loss and a triplet loss to improve its overall ability to learn strongly discriminative vehicle features.
Result On the VeRi-776 (vehicle re-identification-776) dataset, the proposed model improves the mAP (mean average precision) and rank-1 (cumulative matching curve at rank 1) metrics by 2.3% and 2.0%, respectively, compared with PNVR (part-regularized near-duplicate vehicle re-identification). Ablation experiments on this dataset verify the effectiveness of each module. On the large test subset of the VehicleID dataset, the proposed model outperforms PNVR by 0.8% in rank-1 and 4.5% in rank-5 (cumulative matching curve at rank 5).
Conclusion The proposed algorithm exploits global contextual features and multi-scale spatial features to improve vehicle re-identification accuracy under viewpoint changes, occlusion, and similar conditions. The experimental results fully demonstrate the effectiveness and feasibility of the proposed model.
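To make the combined training objective concrete, the following is a minimal PyTorch sketch of a label-smoothed cross-entropy loss plus a triplet loss, as described above. The class name, smoothing factor, margin, and equal loss weighting are illustrative assumptions, not the paper's reported settings.

```python
import torch
import torch.nn as nn

class CombinedLoss(nn.Module):
    """Label-smoothed cross-entropy plus a triplet loss, as in the model
    described above; smoothing=0.1 and margin=0.3 are illustrative values."""

    def __init__(self, smoothing: float = 0.1, margin: float = 0.3):
        super().__init__()
        self.ce = nn.CrossEntropyLoss(label_smoothing=smoothing)
        self.triplet = nn.TripletMarginLoss(margin=margin)

    def forward(self, logits, labels, anchor, positive, negative):
        # identity classification term + metric learning term
        return self.ce(logits, labels) + self.triplet(anchor, positive, negative)
```

In practice, the anchor, positive, and negative embeddings for the triplet term would be chosen by a batch-sampling strategy such as batch-hard mining; that detail is omitted from this sketch.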
Keywords
vehicle re-identification; deep learning; local discriminative features; feature selection; multi-scale spatial features
Global and spatial multi-scale contexts fusion for vehicle re-identification
Wang Zhenxue, Xu Zheming, Xue Yangyang, Lang Congyan, Li Zun, Wei Lili (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)
Abstract
Objective Vehicle re-identification aims to identify images of the same vehicle captured by multiple cameras with non-overlapping views. It has broad applications in computer vision, such as intelligent transportation systems and public traffic security. Early sensor-based methods rely on hardware detectors as the source of input information, but they struggle to capture effective vehicle features such as color, length, and shape. To obtain such feature information, many methods rely on hand-crafted features built from edges, colors, and corners; however, these features degrade under camera viewpoint variation, low resolution, and object occlusion in the captured images. With the emergence of deep learning, vehicle re-identification methods have advanced rapidly. Recent methods fall into two categories: 1) feature learning and 2) metric learning. Nevertheless, existing methods are limited by the loss of multi-scale contextual information and a weak ability to select discriminative features, because most feature learning and metric learning approaches depend on visual features from the originally captured views or on additional information such as vehicle attributes, spatio-temporal cues, and vehicle orientation. We therefore develop a novel global and spatial multi-scale contexts fusion method for vehicle re-identification (GSMC).
Method Our method exploits global contextual information and multi-scale spatial information for the vehicle re-identification task. Specifically, GSMC consists of two main modules: 1) a global contextual selection module and 2) a multi-scale spatial contextual selection module. A residual network serves as the backbone to extract the global feature map as the original feature. The global contextual selection module divides the original feature map into several parts along the spatial dimension, applies a convolution layer with 1×1 kernels for dimension reduction, and uses a softmax layer to obtain the weight of each part, which represents that part's contribution to the re-identification task. The re-weighted features are then fused back into the original feature to extract more discriminative vehicle information. To obtain an even more discriminative representation, the module further divides its output into multiple horizontal local features and uses these local features, instead of the global feature, for classification learning. To alleviate feature loss in the boundary areas, adjacent local features share an intersection of length 1. Furthermore, the multi-scale spatial contextual selection module obtains multi-scale spatial features through different down-sampling rates and optimizes these features to generate a foreground feature response map of the vehicle image, which strengthens GSMC's perception of the vehicle's spatial location. To enhance the foreground, an adaptively larger weight is assigned to the vehicle, while a smaller weight is assigned to the background to suppress its interference and select more robust spatial contextual information.
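As a reading aid, a minimal PyTorch sketch of the two modules follows. The class names, channel sizes, part counts, pooling choices, and the residual-style fusion are illustrative assumptions; the paper's exact design may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalContextualSelection(nn.Module):
    """Splits the backbone feature map into parts along the spatial (height)
    dimension, scores each part with a 1x1 convolution plus softmax, and
    fuses the re-weighted parts back into the original feature."""

    def __init__(self, in_channels: int = 2048, num_parts: int = 4):
        super().__init__()
        self.num_parts = num_parts
        # 1x1 convolution used for dimension reduction before scoring
        self.reduce = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); assumes H is divisible by num_parts
        parts = x.chunk(self.num_parts, dim=2)
        scores = torch.cat(
            [self.reduce(p).mean(dim=(2, 3), keepdim=True) for p in parts],
            dim=1,
        )                                    # (B, num_parts, 1, 1)
        weights = F.softmax(scores, dim=1)   # each part's contribution
        weighted = torch.cat(
            [parts[i] * weights[:, i : i + 1] for i in range(self.num_parts)],
            dim=2,
        )
        return x + weighted                  # fuse the selection back into the original feature


def overlapping_stripes(feat: torch.Tensor, num_stripes: int = 4):
    """Pools the selected feature into horizontal local features; adjacent
    stripes share one extra row so their intersection has length 1."""
    _, _, H, _ = feat.shape
    step = H // num_stripes
    stripes = []
    for i in range(num_stripes):
        hi = min((i + 1) * step + 1, H)      # +1 row of overlap with the next stripe
        stripes.append(F.adaptive_avg_pool2d(feat[:, :, i * step : hi], 1).flatten(1))
    return stripes                           # per-stripe features for classification learning


class MultiScaleSpatialSelection(nn.Module):
    """Builds multi-scale features via average-pool down-sampling, fuses them,
    and predicts a foreground response map that up-weights vehicle regions
    and suppresses background (the exact weighting scheme is an assumption)."""

    def __init__(self, in_channels: int = 2048, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.fuse = nn.Conv2d(in_channels * len(scales), in_channels, kernel_size=1)
        self.response = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, H, W = x.shape
        feats = []
        for s in self.scales:
            f = F.avg_pool2d(x, kernel_size=s) if s > 1 else x   # multi-scale down-sampling
            feats.append(
                F.interpolate(f, size=(H, W), mode="bilinear", align_corners=False)
            )
        fused = self.fuse(torch.cat(feats, dim=1))
        fg = torch.sigmoid(self.response(fused))  # foreground feature response map in (0, 1)
        return x * (1.0 + fg)                     # larger weight on foreground than background
```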
Finally, our approach fuses the features from the global contextual selection module and the multi-scale spatial contextual selection module as the final vehicle representation. To obtain a fine-grained feature space, GSMC is trained with a label-smoothed cross-entropy loss and a triplet loss, which jointly improve its overall learning ability. During training, a warm-up learning strategy is applied in the first 5 epochs to keep the model stable and accelerate convergence.
Result To validate the effectiveness of the proposed approach on the vehicle re-identification task, we compare our model with state-of-the-art methods on two public benchmarks: the VehicleID and VeRi-776 (vehicle re-identification-776) datasets. The quantitative evaluation metrics are mean average precision (mAP) and the cumulative matching curve (CMC), which represents the probability that the probe identity appears in the retrieved list. We conduct comparative analyses against methods that use additional non-visual information and multi-view learning methods, and the results show that GSMC surpasses PNVR (part-regularized near-duplicate vehicle re-identification) by a large margin. On the VehicleID dataset, we improve rank-1 by 5.1%, 4.1%, and 0.8% and rank-5 by 4.4%, 5.7%, and 4.5% on the three test subsets of different sizes. On the VeRi-776 dataset, GSMC gains 2.3% and 2.0% over PNVR in mAP and rank-1, respectively. The accuracy at lower CMC ranks shows that our method improves the ranking of vehicle images captured under rough, multi-view conditions. Furthermore, applying a re-ranking strategy as a post-processing step on the VeRi-776 dataset yields significant further improvements in mAP, rank-1, and rank-5. To verify the necessity of each module, we also design ablation experiments that clarify whether a single branch can extract discriminative features and confirm the effectiveness of fusing the two modules. When the modules are added sequentially, each combination brings a clear performance improvement in mAP, rank-1, and rank-5. From the experimental results, the attention heat map visualizations, and the foreground feature response maps, we conclude that the proposed modules are effective and capable of pulling images of the same vehicle identity closer while pushing different vehicles apart.
Conclusion To address the vehicle re-identification problem, we develop a model built on a global contextual selection module and a multi-scale spatial contextual selection module. Extensive experiments on the two popular public datasets demonstrate the effectiveness of the proposed model.
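Below is a minimal sketch of the warm-up strategy mentioned above, assuming a linear scale-up of the learning rate over the first 5 epochs. The optimizer, base learning rate, and the placeholder model are illustrative assumptions, not the paper's reported configuration.

```python
import torch

# hypothetical stand-in for the GSMC model; sizes are illustrative
model = torch.nn.Linear(2048, 576)
optimizer = torch.optim.Adam(model.parameters(), lr=3.5e-4)

# linearly scale the learning rate up to its base value over the first
# 5 epochs, then hold it constant (any later decay schedule is omitted)
warmup_epochs = 5
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda epoch: min(1.0, (epoch + 1) / warmup_epochs),
)

for epoch in range(10):
    # ... one epoch of training would run here ...
    scheduler.step()  # advance the warm-up schedule after each epoch
```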
Keywords
vehicle re-identification; deep learning; local discriminative features; feature selection; multi-scale spatial features