A multi-modal tracking dataset based on a high-altitude UAV platform

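Xiao Yun1, Cao Dan2, Li Chenglong1, Jiang Bo2, Tang Jin2 (1. School of Artificial Intelligence, Anhui University; 2. School of Computer Science and Technology, Anhui University)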

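Abstract
Objective Because unmanned aerial vehicles (UAVs) are flexible and easy to operate, they have been widely used in recent years in military, civilian, and many other fields. Compared with low-altitude UAVs, high-altitude UAVs have a wider field of view and stronger concealment, and are of higher application value in intelligence reconnaissance, disaster relief, and similar tasks. However, existing research on UAV multi-modal object tracking mainly targets low-altitude UAVs, and the lack of a multi-modal object tracking dataset for high-altitude UAVs limits research and development in this field. Method To this end, this paper constructs HiAl (high altitude UAV multi-modal tracking dataset), a dataset for evaluating multi-modal object tracking methods on high-altitude UAVs. The dataset consists mainly of visible-infrared multi-modal videos captured at an altitude of 500 meters by a UAV equipped with a hybrid sensor. The two modalities are accurately registered and annotated at the frame level, so the dataset can evaluate well how different multi-modal object tracking methods perform on a high-altitude UAV platform. Result The performance of 12 mainstream multi-modal tracking methods on the proposed dataset was compared with their performance on datasets from non-high-altitude UAV scenes. The TBSI (template-bridged search region interaction) method reaches a PR (precision rate) of 0.871 on RGBT234 (RGB-Thermal dataset) but only 0.527 on the proposed dataset, a decrease of 39.5%; its SR (success rate) falls from 0.637 on RGBT234 to 0.468 on the proposed dataset, a decrease of 26.5%. For the HMFT (hierarchical multi-modal fusion tracker) method, PR on the proposed dataset is 23.6% lower than on RGBT234 and SR is 14% lower. In addition, 6 methods were retrained on HiAl, and the performance of all retrained methods improved. Conclusion This paper proposes a multi-modal object tracking dataset based on a high-altitude UAV platform, aiming to promote applied research on multi-modal object tracking for high-altitude UAVs. The HiAl dataset is available online at: https://github.com/mmic-lcl/Datasets-and-benchmark-code/tree/main.
Keywords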
A benchmark dataset for high altitude UAV multi-modal tracking

(School of Artificial Intelligence, Anhui University)

Abstract
Objective Unmanned aerial vehicles (UAVs) have become crucial tools in both military and civilian contexts, owing to their flexibility and ease of operation. Compared with low-altitude UAVs, high-altitude UAVs offer wider fields of view and stronger concealment, making them highly valuable in intelligence reconnaissance, emergency rescue, and disaster relief. However, tracking objects from high-altitude UAVs introduces significant challenges, including UAV rotation, tiny objects, complex background changes, and low object resolution. Current research on multi-modal UAV object tracking focuses primarily on low-altitude UAVs. For example, the visible-thermal UAV (VTUAV) dataset is shot in low-altitude airspace at 5-20 meters and fully shows the unique perspective of UAVs. The scenes captured by high-altitude UAVs, however, differ significantly from those captured by low-altitude UAVs, so such a dataset can hardly support the development of high-altitude UAV multi-modal object tracking. The lack of a dataset for evaluating multi-modal object tracking methods on high-altitude UAVs hinders research and development in this field. Method Therefore, this paper proposes an evaluation dataset named HiAl (high altitude UAV multi-modal tracking dataset), built specifically for multi-modal object tracking methods on high-altitude UAVs and captured at approximately 500 meters. The UAV used to shoot this dataset is equipped with a hybrid sensor that captures video in both visible and infrared modalities. To provide higher-quality ground truth annotation and evaluate different multi-modal object tracking methods more fairly, we carefully registered the collected multi-modal videos.
Specifically, we first manually align the two video modalities so that the same tracked object occupies the same position within the frame in each pair of videos. Accurate registration of the area containing the tracked object is the top priority; once that is ensured, the other areas of the image are also roughly aligned. Then, based on this close alignment of the two modalities, we provide accurate ground truth annotation for every frame of the video. We label the target position with horizontal bounding boxes that fit the contour of the tracked object as tightly as possible. With the modalities aligned in this way, the two modalities of a video can share the same ground truth, which allows a better evaluation of different multi-modal object tracking methods on the high-altitude UAV platform. To ensure the diversity and authenticity of the dataset, we comprehensively considered tracking attributes, scenes, and object categories during data collection. The dataset covers different lighting and weather conditions, including night and fog, and 9 object categories common in high-altitude UAV scenes. It has 12 tracking attributes, two of which are unique to UAVs; these attributes are practically significant and highly challenging. Unlike existing multi-modal tracking datasets, this dataset mostly tracks small targets, which is a realistic challenge of high-altitude UAV shooting. Result The performance of 10 mainstream multi-modal tracking methods on this dataset is compared with their performance on datasets from non-high-altitude UAV scenes. This study employs common quantitative evaluation metrics, namely the precision rate (PR) and success rate (SR), to assess the performance of each method.
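As a rough illustration of the two metrics, the following minimal sketch computes PR and SR in the way they are commonly defined in tracking benchmarks: PR as the fraction of frames whose center location error stays within a pixel threshold, and SR as the area under the success curve over overlap thresholds. The `(x, y, w, h)` box format, the 20-pixel threshold, the 21 overlap thresholds, and all function names are assumptions made for this example; the abstract does not state the exact settings used for HiAl.

```python
from math import hypot


def center_error(a, b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx, by = b[0] + b[2] / 2, b[1] + b[3] / 2
    return hypot(ax - bx, ay - by)


def iou(a, b):
    """Intersection over union of two horizontal (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def precision_rate(preds, gts, threshold=20.0):
    """Fraction of frames with center error <= threshold (assumed: 20 px)."""
    hits = sum(center_error(p, g) <= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


def success_rate(preds, gts, steps=21):
    """Mean of the success curve over `steps` evenly spaced overlap thresholds
    in [0, 1], an approximation of the area under that curve."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    curve = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(curve) / len(curve)


# Hypothetical two-frame sequence: the first prediction is exact, the second is lost.
gts = [(10, 10, 20, 20), (12, 10, 20, 20)]
preds = [(10, 10, 20, 20), (100, 100, 20, 20)]
print(precision_rate(preds, gts), success_rate(preds, gts))
```

Because the registered modalities share one ground truth, a single set of boxes per frame is enough to score a tracker in this sketch. For the small targets that dominate high-altitude footage, a tighter center-error threshold may be more informative than 20 pixels.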
Taking two strong methods as examples, the PR of the template-bridged search region interaction (TBSI) method reaches 0.871 on the RGB-Thermal dataset (RGBT234) but only 0.527 on the proposed dataset, a decrease of 39.5%; its SR falls from 0.637 on RGBT234 to 0.468 on the proposed dataset, a decrease of 26.5%. Compared with RGBT234, the PR and SR of the hierarchical multi-modal fusion tracker (HMFT) on HiAl decrease by 23.6% and 14%, respectively. In addition, we retrain 6 of these methods on HiAl, and the performance of all retrained methods improves. For example, the PR of the duality-gated mutual condition network (DMCNet) increases from 0.485 before retraining to 0.524, and its SR increases from 0.512 to 0.526. These experimental results reflect how challenging and how necessary the dataset is. Conclusion In summary, this paper introduces an evaluation dataset designed to assess the performance of multi-modal object tracking methods for high-altitude UAVs. To provide a dedicated, high-quality dataset for multi-modal tracking on high-altitude UAVs, we carefully register multi-modal data collected in real scenes and provide frame-level ground truth annotation. The proposed dataset HiAl can serve as a standard evaluation tool for future research, offering researchers authentic and varied data for evaluating their algorithms' performance. By comparing the results of 10 mainstream tracking algorithms on HiAl with their results on other datasets, and by retraining 6 tracking algorithms, we analyze the limitations of existing algorithms in multi-modal object tracking for high-altitude UAVs and identify potential research directions for researchers' reference.
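The percentage decreases reported in the Result paragraph are relative decreases with respect to the RGBT234 score. The short check below reproduces the two TBSI figures from the scores quoted in the abstract; the function name `relative_drop` is a hypothetical helper introduced only for this example.

```python
def relative_drop(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# TBSI scores quoted in the abstract: RGBT234 first, HiAl second.
tbsi_pr_drop = relative_drop(0.871, 0.527)
tbsi_sr_drop = relative_drop(0.637, 0.468)

print(f"TBSI PR drop: {tbsi_pr_drop:.1f}%")  # 39.5%
print(f"TBSI SR drop: {tbsi_sr_drop:.1f}%")  # 26.5%
```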
The HiAl dataset is available at https://github.com/mmic-lcl/Datasets-and-benchmark-code/tree/main.
Keywords
