Road extraction model integrating attention mechanism and dilated convolution
Abstract
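Objective To address the low degree of automation, limited extraction accuracy, and unstable model training caused by imbalanced sample numbers that are common in current road extraction methods for remote sensing images, this paper proposes a road extraction model that integrates an attention mechanism and dilated convolution (attention and dilated convolutional U-Net, A&D-UNet). Method The A&D-UNet aggregation network model is built on the classical U-Net architecture. A residual learning unit (RLU) is introduced in the encoder to reduce the complexity of training a deep convolutional neural network; a convolutional block attention module (CBAM) optimizes the weight assignment along the channel and spatial dimensions to highlight road feature information; and a dilated convolutional unit (DCU) perceives a larger feature region and integrates the contextual information of roads. The model is trained with a compound loss function that combines binary cross entropy (BCE) and Dice loss, which mitigates the model instability caused by imbalanced sample numbers in remote sensing images. Result Validation experiments were carried out on the public Massachusetts (USA) and DeepGlobe road datasets, and the model was compared with the conventional U-Net, LinkNet, and D-LinkNet image segmentation models. On the Massachusetts road test set, the overall accuracy, F1 score, and intersection over union of A&D-UNet were 95.27%, 77.96%, and 79.89%, respectively, all better than those of the compared algorithms; the model also recognized road regions better where linear features are obvious, where labels omit roads, and where trees occlude the road. On the DeepGlobe road test set, the overall accuracy, F1 score, and intersection over union of A&D-UNet were 94.01%, 77.06%, and 78.44%, respectively, and the model extracted main roads with obvious linear features, narrow roads missing from the labels, and shadow-occluded urban roads well. Conclusion The proposed A&D-UNet road extraction model combines the advantages of residual learning, attention mechanisms, and dilated convolution, effectively improves target segmentation performance, and is an aggregation network model with good extraction results that is worth promoting.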
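The overall accuracy, F1 score, and intersection over union reported above are standard pixel-wise measures. The following is a minimal sketch of how they can be computed for one binary road mask. The function name `road_metrics` and the flat 0/1 list representation are illustrative assumptions, and the abstract does not specify how the metrics are averaged across classes or images, so the per-mask, road-class definitions used here are also an assumption.

```python
def road_metrics(pred, truth):
    """Pixel-wise metrics for two binary masks given as flat sequences of 0/1.

    Hypothetical helper for illustration; road pixels are labelled 1.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    # Confusion-matrix counts for the road (positive) class.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)

    total = tp + tn + fp + fn
    oa = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"OA": oa, "P": precision, "R": recall, "F1": f1, "IoU": iou}


# Toy example: 8 pixels, 2 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
print(road_metrics(pred, truth))
```

Because background pixels dominate road imagery, overall accuracy can stay high even when many road pixels are missed, which is why F1 and IoU are reported alongside it.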
Keywords
Road extraction model derived from integrated attention mechanism and dilated convolution
Wang Yong1, Zeng Xiangqiang1,2
(1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China; 2. College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China)
Abstract
Objective Existing methods for extracting roads from remote sensing images suffer from low automation, limited extraction accuracy, and unstable model training caused by sample imbalance. To address these issues, we propose a deep convolutional aggregation network model that integrates an attention mechanism and dilated convolution (A&D-UNet). Method A&D-UNet builds on the classical U-Net network structure. To reduce the complexity of training a deep network, it uses residual learning units (RLU) in the encoder. To highlight road feature information, a convolutional block attention module (CBAM) assigns weights along both the channel and spatial dimensions. To cover a larger receptive field, a dilated convolutional unit (DCU) integrates the contextual information of the road features. The A&D-UNet model thus uses residual learning, dilated convolution, and attention mechanisms to simplify training, obtain more global information, and improve the utilization of shallow features, respectively. First, the RLU, a component of the backbone feature extraction network, uses identity mapping to avoid the training difficulty and model degradation caused by deep stacks of convolutional layers. Second, the DCU takes the road feature map produced by the fourth down-sampling step of the model and integrates the contextual information of the road features through consecutive dilated convolutions with different dilation rates. Finally, the CBAM weights the road features sequentially along the channel and spatial dimensions, which strengthens the attention paid to shallow features and reduces interference from background noise. The binary cross-entropy (BCE) loss function is commonly used to train models in image segmentation tasks. 
However, it often causes the model to fall into local minima when the number of road samples in remote sensing images is imbalanced. To improve the road segmentation performance of the model, the BCE and Dice loss functions are combined to train the A&D-UNet model. To validate the effectiveness of the model, experiments are conducted on the publicly available Massachusetts road dataset (MRDS) and the DeepGlobe road dataset. Because the MRDS contains many blank areas and computing resources are limited, the remote sensing images are cropped to 256×256 pixels and the crops containing blank areas are removed. These processing steps yield 2 230 training images and 161 test images. To compare the performance of the model on the road extraction task, we run the same road extraction experiments with three other network models, the classical U-Net, LinkNet, and D-LinkNet, and visually analyze the results. In addition, five evaluation metrics, namely overall accuracy (OA), precision (P), recall (R), F1-score (F1), and intersection over union (IoU), are used to assess the extraction effectiveness of the four models quantitatively. Result Comparing the road extraction maps and quantitatively analyzing the evaluation metrics yields the following results: 1) The proposed model recognizes roads better in three cases: obvious road-line characteristics (ORLC), incomplete road label data (IRLD), and roads blocked by trees (RBBT). Where roads have a clear linear structure, the A&D-UNet model extracts results that are close to the ground-truth road labels. It learns the relevant road features from the large training set of remote sensing images, which avoids incorrect road extraction in the IRLD case. 
In the RBBT case, the DCU and CBAM help it extract road information better, which improves the accuracy of the model's predictions. 2) The A&D-UNet model outperforms the compared algorithms on OA, F1, and IoU, reaching 95.27%, 77.96%, and 79.89%, respectively, on the Massachusetts road test set. Compared with the classical U-Net, A&D-UNet uses RLUs in the encoder, which alleviates to some extent the degradation caused by additional convolutional layers, and its OA, F1, and IoU improve by 0.99%, 6.40%, and 4.08%, respectively. Meanwhile, through the DCU and CBAM, A&D-UNet improves OA, F1, and IoU on the test set by 1.21%, 5.12%, and 3.93%, respectively, over LinkNet. 3) Training with the compound loss function improves the F1 score and IoU of A&D-UNet by 0.26% and 0.18%, respectively. This indicates that the loss function combining BCE and Dice can handle the imbalance between positive and negative samples and thus improve prediction accuracy. The comparisons across models and loss functions show that the A&D-UNet road extraction model has better extraction capability. 4) On the DeepGlobe road dataset, the OA, F1 score, and IoU of the A&D-UNet model are 94.01%, 77.06%, and 78.44%, respectively, which shows that the model extracts main roads with obvious road-line characteristics, narrow roads unmarked in the label data, and shadowed roads well. Conclusion The A&D-UNet aggregation network model is built on RLUs together with the DCU and CBAM. Trained on the MRDS with a combination of the BCE and Dice loss functions, it shows better extraction results. The road extraction model integrates residual learning, an attention mechanism, and dilated convolution. 
This novel aggregation network model features high automation, high extraction accuracy, and good extraction results. Compared with current classical algorithms, it uses the RLU to alleviate the training difficulties caused by deep convolutional networks, the DCU to integrate detailed information of road features, and the CBAM to make better use of shallow information. In addition, the combined BCE and Dice loss function mitigates the imbalance between samples of road regions and background regions.
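The combination of BCE and Dice loss described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the weighting between the two terms or any smoothing constant, so the function name `bce_dice_loss`, the equal default weights, and the `eps` value are all hypothetical.

```python
import math


def bce_dice_loss(probs, targets, bce_weight=1.0, dice_weight=1.0, eps=1e-7):
    """Compound loss: bce_weight * BCE + dice_weight * (1 - Dice).

    probs   -- predicted road probabilities in [0, 1], flat sequence
    targets -- ground-truth labels (1 = road, 0 = background), same length
    The weights and eps are illustrative assumptions.
    """
    if len(probs) != len(targets) or not probs:
        raise ValueError("probs and targets must be non-empty and equal length")

    bce = 0.0
    intersection = 0.0
    prob_sum = 0.0
    target_sum = 0.0
    for p, t in zip(probs, targets):
        # Clip probabilities so log() never receives 0.
        pc = min(max(p, eps), 1.0 - eps)
        bce -= t * math.log(pc) + (1 - t) * math.log(1.0 - pc)
        intersection += p * t
        prob_sum += p
        target_sum += t

    bce /= len(probs)  # mean BCE over all pixels
    # Soft Dice coefficient; eps keeps the ratio defined for empty masks.
    dice = (2.0 * intersection + eps) / (prob_sum + target_sum + eps)
    return bce_weight * bce + dice_weight * (1.0 - dice)


# Toy example: one road pixel among background pixels.
targets = [1, 0, 0, 0]
print(bce_dice_loss([0.9, 0.1, 0.1, 0.1], targets))  # confident, mostly right
print(bce_dice_loss([0.1, 0.1, 0.1, 0.1], targets))  # misses the road pixel
```

When background dominates a mask, predicting background everywhere keeps the BCE term small, while the Dice term stays close to 1 because the overlap with the road pixels is near zero. Adding the Dice term therefore penalizes this failure mode, which is consistent with the motivation given in the abstract.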
Keywords
road information; residual learning unit (RLU); convolutional block attention module (CBAM); dilated convolution unit (DCU); loss function