Research progress in cross-domain remote sensing scene interpretation
Abstract
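Multi-source data from multiple platforms, sensors, and viewing angles are ubiquitous in remote sensing Earth observation and provide synergistic, complementary information for remote sensing scene interpretation. However, existing scene interpretation methods must either train a model for each kind of remote sensing scene data or standardize the test data to fit an existing model. The resulting high training cost and long response cycle can no longer meet the needs of the new stage of collaborative multi-source data interpretation. Cross-domain remote sensing scene interpretation transfers an already trained model to a new application scenario, adapting to scene changes through model reuse and using knowledge from known domains to solve problems in unknown domains. Taking cross-domain remote sensing scene interpretation as its main thread, this paper comprehensively analyzes the domestic and international literature and, through two typical tasks, scene recognition and target recognition, discusses the current state of research, frontier topics, and future trends. It also summarizes the commonly used datasets and unified experimental settings for cross-domain remote sensing scene interpretation. The experimental datasets and detection results of this paper are publicly available at https://github.com/XiangtaoZheng/CDRSSI.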
Keywords
Advancements in cross-domain remote sensing scene interpretation
Zheng Xiangtao1, Xiao Xinlin1, Chen Xiumei1, Lu Wanxuan2, Liu Xiaoyu2, Lu Xiaoqiang1
(1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China; 2. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China)
Abstract
In remote sensing Earth observation, multi-source data are captured by multiple platforms, multiple sensors, and from multiple perspectives. These data provide complementary information for interpreting remote sensing scenes. Although these data offer richer information, they also increase the demand for model depth and complexity. Deep learning plays a pivotal role in unlocking the potential of remote sensing data by mining the semantic content of scenes and extracting intricate features from images, and recent advances in artificial intelligence have greatly enhanced this process. However, deep learning networks have limitations when applied to remote sensing images. 1) Such networks have a huge number of parameters, are difficult to train, and rely heavily on labeled training data. Remote sensing data are heterogeneous and difficult to annotate, so manual labeling cannot meet the training needs of deep learning. 2) Variations in remote sensing platforms, sensors, shooting angles, resolution, time, location, and weather all affect remote sensing images. Thus, the images to be interpreted and the training samples cannot be assumed to follow the same distribution. This inconsistency results in weak generalization ability in existing models, especially when dealing with data from different distributions. To address this issue, cross-domain remote sensing scene interpretation aims to train a model on labeled remote sensing scene data (source domain) and apply it to new, unlabeled scene data (target domain) in an appropriate way. This approach reduces the dependence on target domain data and relaxes the same-distribution assumption of existing deep learning tasks. The shallow layers of convolutional neural networks can be used as general-purpose feature extractors, but deeper layers are more task-specific and may introduce bias when applied to other tasks.
Therefore, the transferred model must be adapted to accomplish the task of interpreting the target domain. Cross-domain interpretation tasks aim to establish a model that can adapt to various scene changes by using transfer learning, domain adaptation, and other techniques that reduce the prediction errors caused by changes in the data domain. This approach improves the robustness and generalization ability of the model. Interpreting cross-domain remote sensing scenes typically requires data from multiple remote sensing sources, including radar, aerial, and satellite imagery. These images may have varying views, resolutions, wavelength bands, lighting conditions, and noise levels. They may also originate from different locations or sensors. As global Earth observation systems continue to advance, remote sensing images now span different platforms, sensors, resolutions, and regions, which results in large distribution discrepancies. Therefore, the study of cross-domain remote sensing scene interpretation is essential for the commercial use of remote sensing data and has both theoretical and practical importance. This paper categorizes scene interpretation methods into four main types according to the relation between the label sets of the source and target domains: closed-set domain adaptation, partial domain adaptation, open-set domain adaptation, and generalized domain adaptation. Closed-set domain adaptation addresses tasks where the label set of the target domain is the same as that of the source domain. Partial domain adaptation addresses tasks where the label set of the target domain is a subset of that of the source domain. Open-set domain adaptation addresses tasks where the label set of the source domain is a subset of that of the target domain, whereas generalized domain adaptation places no restriction on the relation between the two label sets.
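The four label-set settings described above can be illustrated with a minimal Python sketch. The names `Setting` and `adaptation_setting` and the class labels in the usage note are hypothetical and chosen only for illustration. Because generalized domain adaptation places no restriction on the label sets, the sketch assumes it is reported only when none of the three more specific settings applies.

```python
from enum import Enum


class Setting(Enum):
    """Hypothetical names for the four label-set settings."""
    CLOSED_SET = "closed-set domain adaptation"
    PARTIAL = "partial domain adaptation"
    OPEN_SET = "open-set domain adaptation"
    GENERALIZED = "generalized domain adaptation"


def adaptation_setting(source_labels, target_labels):
    """Return the most specific setting implied by the two label sets."""
    source, target = set(source_labels), set(target_labels)
    if source == target:
        # Target label set equals the source label set.
        return Setting.CLOSED_SET
    if target < source:
        # Target label set is a proper subset of the source label set.
        return Setting.PARTIAL
    if source < target:
        # Source label set is a proper subset of the target label set.
        return Setting.OPEN_SET
    # Assumption: fall back to the unrestricted setting when the label
    # sets only partly overlap or do not overlap at all.
    return Setting.GENERALIZED
```

For example, with the hypothetical source classes `{"airport", "beach", "forest"}` and target classes `{"airport", "beach"}`, the function returns `Setting.PARTIAL`, since the target label set is contained in the source label set.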
This study provides an in-depth investigation of two typical tasks in cross-domain remote sensing interpretation: scene recognition and target recognition. The first part of the study draws on domestic and international literature to provide a comprehensive assessment of the current research status of the four types of methods. Within the target recognition task, cross-domain tasks are further subdivided into cross-domain tasks within visible-light data and cross-domain tasks from visible-light to synthetic aperture radar (SAR) images. After a quantitative analysis of the sample distribution characteristics of different datasets, a unified experimental setup for cross-domain tasks is proposed. In the scene classification task, the datasets are organized according to the label-set categorization, and specific examples with the corresponding experimental setups are given for readers' reference. The fourth part of the study discusses research trends in cross-domain remote sensing interpretation, highlighting four challenging research directions: few-shot learning, source domain data selection, multi-source domain interpretation, and cross-modal interpretation. These areas will be important directions for the future development of remote sensing scene interpretation and offer potential topics for readers' subsequent research.
Keywords
cross-domain remote sensing scene interpretation; out-of-distribution generalization; model generalization; diverse dataset; migration learning; adaptive algorithm