Multi-scale local feature-enhanced Transformer model for road crack detection
Abstract
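Objective Road cracks are an early sign of pavement distress. Regularly monitoring pavement condition and detecting cracks promptly and accurately helps road maintenance agencies reduce costs, ensures the reliability and durability of the pavement structure, and improves driving safety and comfort. Current deep learning models based on convolutional neural networks (CNNs) are weak at modeling long-range dependencies, and their accuracy falls short of what crack detection in real pavement environments requires. Some models introduce spatial or channel attention mechanisms to model long-range dependencies, but this increases the computational load and complexity and rules out real-time detection. We therefore propose CTNet (crack transformer network), a deep neural network for road crack detection built on a Transformer encoder-decoder structure. Method The model consists of four main parts: Transformer attention modules, multi-scale local feature enhancement modules, upsampling modules, and skip connections. The Transformer attention mechanism extracts global and long-range dependencies more effectively, overcoming the limitation of conventional CNNs, which capture only short-range correlations in the input. To accommodate the wide variation in crack size, the Transformer is combined with the multi-scale local feature enhancement module, which integrates local information at different scales and compensates for the Transformer's weak local feature modeling. Result Comparisons with the DeepCrack model on different crack detection datasets show that the proposed multi-scale local feature-enhanced Transformer network segments pavement cracks quickly and accurately, and with better efficiency. Quantitatively, on the more challenging CrackLS315 dataset, CTNet reaches a precision of 91.38%, a recall of 80.38%, and an F1 score of 85.53%, clearly outperforming the compared methods. On the CrackWH100 dataset, precision, recall, and F1 score rise further to 92.70%, 90.52%, and 91.60%, respectively. In addition, CTNet trains 6.78 times as fast as the DeepCrack model. Conclusion CTNet can detect road cracks against strongly noisy backgrounds, outperforms the current state-of-the-art methods, and has few parameters, which makes it easy to train and deploy.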
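To illustrate why self-attention gives every position a global receptive field, the following is a minimal single-head scaled dot-product attention sketch in plain Python. It is not CTNet's implementation: the function name `self_attention`, the toy token values, and the use of the tokens themselves as queries, keys, and values (no learned projections, a single head) are assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), so the sketch needs no learned weights.
    Returns (outputs, weights); weights[i][j] is how strongly token i
    attends to token j.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Compare this query with every key in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average over all value vectors.
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, tokens)) for d in range(dim)]
        )
    return outputs, weights


# Toy sequence of four 2-D tokens (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
outputs, weights = self_attention(tokens)
```

Every entry of `weights` is strictly positive, so each output token mixes information from all positions in the sequence, however far apart they are. A convolution with a small kernel, by contrast, only sees its immediate neighborhood.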
Keywords
Multi-scale local feature enhanced transformer network for pavement crack detection
Xu Zhengsen, Lei Xiangda, Guan Haiyan (School of Remote Sensing and Geomatics Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China)
Abstract
Objective Pavement inspection aims to detect cracks early and to preserve the pavement structure. However, conventional image processing techniques such as edge detection, threshold segmentation, template matching, and morphological operations are labor-intensive and time-consuming. They are also challenged by the geometric and spectral complexity of pavement cracks and their surroundings (e.g., illumination variation, oil or water stains, and shadows cast by trees and vehicles). Deep learning techniques based on convolutional neural networks (CNNs) have been developed intensively. However, CNN-based methods are less effective at modeling long-range dependencies, which can lead to unsatisfactory detection results in complicated road surface scenes. Some works add attention mechanisms such as spatial or channel attention modules and self-attention modules, but these operations remain complex and computationally costly. Method To detect pavement cracks efficiently and effectively, we develop a novel Transformer-based encoder-decoder neural network, called CTNet, which consists of Transformer blocks, multi-scale local feature enhancement blocks, upsampling blocks, and skip connections. Through its multi-head self-attention-based Transformer mechanism, CTNet captures long-range dependencies and has a global receptive field. Although the Transformer runs efficiently with low computational overhead, it models local contextual information poorly, because token generation breaks the connections between neighboring regions. To capture more multi-scale local information, we therefore design a multi-scale local feature enhancement block built from dilated convolutions with multiple dilation rates.
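The idea of combining dilated convolutions with several dilation rates can be sketched in one dimension. The code below is an illustrative assumption, not the block used in CTNet: the dilation rates (1, 2, 3), the kernel of size 3, the zero padding, and fusion by element-wise summation are all hypothetical choices, and `dilated_conv1d`, `effective_kernel_size`, and `multi_scale_block` are names introduced here.

```python
def dilated_conv1d(signal, kernel, dilation):
    """1-D dilated convolution (cross-correlation) with zero padding.

    The output has the same length as the input. Kernel taps are spaced
    `dilation` samples apart, so a kernel of size k spans
    k + (k - 1) * (dilation - 1) input samples.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for t, weight in enumerate(kernel):
            j = i + (t - half) * dilation
            if 0 <= j < len(signal):  # samples outside the signal count as zero
                acc += weight * signal[j]
        out.append(acc)
    return out


def effective_kernel_size(kernel_size, dilation):
    """Number of input samples spanned by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def multi_scale_block(signal, kernel, dilations=(1, 2, 3)):
    """Apply the same kernel at several dilation rates and sum the branches.

    Assumption: branches are fused by element-wise summation.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # unit impulse at index 3
kernel = [1.0, 1.0, 1.0]
spans = [effective_kernel_size(len(kernel), d) for d in (1, 2, 3)]
fused = multi_scale_block(signal, kernel)
# spans == [3, 5, 7]
# fused == [1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0]
```

With a unit impulse as input, the three branches respond at offsets of 1, 2, and 3 samples, so the fused output covers 7 samples while each branch still uses only three weights. This is the sense in which multiple dilation rates gather local context at several scales without enlarging the kernel.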
Specifically, the multi-scale local feature enhancement block is embedded into each Transformer block to supplement local information, so that both local and global low-level contextual features are captured for feature enhancement. A novel decoder path is then used to extract high-level features. The decoder consists of Transformer blocks and up-sampling blocks, which restore spatial details for end-to-end segmentation. Result To demonstrate the efficiency and effectiveness of the proposed CTNet, a series of comparative analyses and ablation studies are carried out on three datasets. First, CTNet improves running efficiency while keeping computational overhead and complexity comparable to those of UNet, SegNet, DeepCrack, and Swin-UNet. Second, CTNet trains 6.78 times as fast as DeepCrack, the second-best model. On the CrackLS315 dataset, CTNet obtains a precision of 91.38%, a recall of 80.38%, and an F1 measure of 85.53%; on the CrackWH100 dataset, it obtains a precision of 92.70%, a recall of 90.52%, and an F1 measure of 91.60%. In contrast, the pure-Transformer Swin-UNet performs worse than fully convolutional networks, which reflects its lack of local information. Furthermore, CTNet converges poorly when the local feature enhancement blocks are removed. In summary, the global receptive field of the Transformer-based CTNet benefits pavement crack detection across multiple scenarios, and CTNet produces consistent detection results. Conclusion The proposed CTNet shows potential for detecting pavement cracks in noisy pavement images.
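The reported F1 measures can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short check below uses only the figures quoted in the abstract; the function name `f1_score` and the dictionary layout are introduced here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall in percent, as quoted in the abstract.
reported = {
    "CrackLS315": (91.38, 80.38),
    "CrackWH100": (92.70, 90.52),
}

f1_values = {name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()}
# f1_values == {"CrackLS315": 85.53, "CrackWH100": 91.6}
```

Both values agree with the F1 measures reported for the two datasets (85.53% and 91.60%).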
Keywords
road engineering; pavement crack detection; deep learning; semantic segmentation; self-attention; Transformer