FairMOT tracking algorithm optimization combined with spatio-temporal consistency

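Peng Jiaqi, Wang Tao, Chen Kean, Lin Weiyao (School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 201100)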

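Abstract
Objective Video multiple object tracking (MOT) is an important task in computer vision. Existing studies improve the object detection stage and the object association stage separately, and all of them overlook the inconsistency problems in multiple object tracking. These problems fall into three categories: inconsistency between the center of the object detection box and the center of the identity feature, inconsistency of the object response between frames, and inconsistency of the similarity measure between training and testing. To resolve them, this paper proposes a multiple object tracking method based on spatio-temporal consistency to improve tracking accuracy. Method The inconsistencies are corrected in the spatial, temporal, and feature dimensions. For the inconsistency between the detection box center and the identity feature center, the spatial difference between each object's detection box center and its feature center is taken into account, and the object's re-identification (ReID) feature is extracted at the offset position. For the inter-frame response inconsistency, spatial correlation is used to compute the motion offset information between adjacent frames; the object response of the previous frame is transformed according to this offset information to obtain inter-frame consistent response information, which is then used to enhance the object response. For the inconsistency of the similarity measure between training and testing, a feature orthogonal loss function is proposed that considers the pairwise similarity relations between objects during training. Result The method is compared with existing methods on three datasets. On the MOT17, MOT20, and Hieve datasets, the multiple object tracking accuracy (MOTA) values are 71.2%, 60.2%, and 36.1%, respectively, which are 1.6%, 3.2%, and 1.1% higher than those of the original FairMOT algorithm. Compared with most other existing methods, the proposed method has a higher mostly tracked (MT) ratio, a lower mostly lost (ML) ratio, and better overall tracking performance. In addition, comparative experiments on the MOT17 dataset verify the effectiveness of the combined algorithm, and the results show that the proposed method significantly alleviates the inconsistency problems in multiple object tracking. Conclusion The proposed consistency tracking method achieves better consistency of features in time, in space, and between training and testing, which makes the multiple object tracking results more accurate.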
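The feature orthogonal loss is described here only at a high level, so the following is a minimal sketch of one plausible form, not the paper's formula. It assumes the loss penalizes the squared cosine similarity between ReID features of objects with different identities, which pushes those features toward orthogonality. The names `cosine` and `orthogonal_loss` and the squared-similarity penalty are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon guards against division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-12)


def orthogonal_loss(features, ids):
    """Hypothetical pairwise loss: mean squared cosine similarity over
    all pairs of features that belong to different identities."""
    total, count = 0.0, 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if ids[i] != ids[j]:
                total += cosine(features[i], features[j]) ** 2
                count += 1
    # With no cross-identity pairs there is nothing to penalize.
    return total / count if count else 0.0
```

Under this assumed form, the loss is zero when features of different identities are mutually orthogonal and approaches one when they point in the same direction, so training looks at pairs of objects in the same way that pairwise matching does at test time.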
Keywords
Spatio-temporal consistency based FairMOT tracking algorithm optimization

Peng Jiaqi, Wang Tao, Chen Kean, Lin Weiyao(School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 201100, China)

Abstract
Objective Video-based multiple object tracking is an essential computer vision task with applications such as autonomous driving and intelligent video surveillance. Most multiple object tracking methods first obtain object detection results and then apply an association strategy to link the detection bounding boxes into object trajectories. Object detection has advanced rapidly in recent years, but several inconsistency problems in multiple object tracking remain unresolved and limit tracking accuracy. These inconsistencies fall into three types. 1) Inconsistency between the centers of the object bounding boxes and the centers of the object identity features. Many multiple object tracking methods extract object re-identification (ReID) features at the bounding box centers and use these features to associate objects. However, because of occlusion, features extracted at the box centers may not reflect the appearance of the objects accurately: the best ReID feature extraction positions are offset from the bounding box centers. The current feature extraction strategy therefore leads to a spatial consistency problem. 2) Inconsistency of the object center response between consecutive frames. Because of occlusion in videos, an object may be detected in one frame but missed in a neighboring frame, so the object center response heatmaps of two consecutive frames are inconsistent. 3) Inconsistency of the similarity measure between training and testing. In training, identity feature learning is usually treated as a classification problem and the model is trained with a cross-entropy loss, which ignores inter-object relations; in testing, the cosine similarity between the features of each pair of objects is used to associate them. To improve tracking accuracy, we propose a multiple object tracking method based on consistency optimization. Method We address these inconsistencies in the spatial, temporal, and feature dimensions. For the inconsistency between the centers of the detection bounding boxes and the identity features, we predict for each object the offset from the detection bounding box center to the feature center. The object center plus the predicted offset gives the best ReID feature extraction position; we extract the ReID feature of the object at that position and use it to represent the object. For the inconsistency of the response between frames, a spatial correlation module computes the offset information between adjacent frames. Based on this offset information, the object center response of the previous frame is transformed by deformable convolution to obtain inter-frame consistent response information, which is used to enhance the response of the current frame. To resolve the inconsistency of the similarity measures in training and testing, we develop a feature orthogonal loss function that considers the pairwise similarity relations between objects during training. We integrate these three consistency improvements into the FairMOT method to detect and track objects. Result We compare our method with existing methods on three datasets: 1) on the MOT17 dataset, our multiple object tracking accuracy (MOTA) is 71.2%, which is 1.6% higher than that of the FairMOT method without the consistency improvements; 2) on the MOT20 dataset, the MOTA is 60.2%, which is 3.2% higher; 3) on the Hieve dataset, the MOTA is 36.1%, which is 1.1% higher. We also conduct ablation studies on the MOT17 dataset to verify the effectiveness of each component of our method; the results show that the proposed method significantly improves consistency in multiple object tracking. In the ablation studies, we find that adding the ReID feature extraction position offsets and the feature orthogonal loss function reduces the number of identity switches. The predicted extraction position offsets let the model obtain object appearance features at the right positions, and the feature orthogonal loss function helps the model learn the object appearance features properly. We also visualize the predicted ReID feature extraction positions and the bounding box centers; the visualization shows that the predicted positions are closer to the object appearance features than the physical centers are, which supports the use of extraction position offsets. Conclusion Our multiple object tracking method achieves better spatio-temporal consistency of object features, as well as better consistency between training and testing, which makes the model track objects more accurately.
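As a rough illustration of extracting a ReID feature at an offset position, the sketch below samples a feature map at the bounding box center plus a predicted offset. Bilinear interpolation, the nested-list `H x W x C` feature map layout, and the names `bilinear_sample` and `reid_feature_at_offset` are assumptions made for this example; the abstract does not specify how the sampling is implemented.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Sample a C-dimensional feature at continuous (x, y) from an
    H x W x C nested list, using bilinear interpolation."""
    h, w = len(feature_map), len(feature_map[0])
    # Clamp the query point to the valid coordinate range.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    channels = len(feature_map[0][0])
    return [
        (1 - wy) * ((1 - wx) * feature_map[y0][x0][c] + wx * feature_map[y0][x1][c])
        + wy * ((1 - wx) * feature_map[y1][x0][c] + wx * feature_map[y1][x1][c])
        for c in range(channels)
    ]


def reid_feature_at_offset(feature_map, center, offset):
    """Hypothetical helper: extract the ReID feature at the bounding box
    center (cx, cy) shifted by the predicted offset (dx, dy)."""
    cx, cy = center
    dx, dy = offset
    return bilinear_sample(feature_map, cx + dx, cy + dy)
```

With a zero offset this reduces to the usual extraction at the bounding box center, so the offset can be read as a learned correction to that baseline position.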
Keywords
