TransAS-UNet: breast cancer region segmentation fusing Swin Transformer and UNet
Xu Wangwang1,2, Xu Liangfeng1,2, Li Bokai1,2, Zhou Xi1,2,3, Lyu Na4, Zhan Shu1,2 (1. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230601, China; 2. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China; 3. Anhui Water Conservancy and Electric Power Technical College, Hefei 231603, China; 4. The First Affiliated Hospital of Anhui Medical University, Hefei 230022, China) Abstract
Objective Breast cancer is a severe, high-incidence disease among women, and early detection of breast cancer is a major challenge worldwide. Current diagnostic methods for breast cancer include clinical examination, imaging examination, and histopathological examination. The modalities commonly used in imaging examination are X-ray, computed tomography (CT), and magnetic resonance imaging; among them, mammograms have been used to detect early cancer. However, manually segmenting masses from local mammograms is a very time-consuming and error-prone task. Therefore, an integrated computer-aided diagnosis (CAD) system is needed to help radiologists perform automatic and precise breast mass identification. Method Based on a deep learning image segmentation framework, we compare different image segmentation models and adopt the Swin architecture on top of the UNet structure to replace the downsampling and upsampling processes in the segmentation task, realizing the interaction between local and global features. A Transformer is used to obtain more global information and features at different levels in place of the short (skip) connections, achieving multi-scale feature fusion and thus accurate segmentation. After the segmentation stage, a Multi-Attention ResNet classification network grades the cancer regions for better diagnosis and treatment of breast cancer. Result The proposed model achieves accurate mass segmentation on the INbreast breast cancer X-ray dataset, with an intersection over union (IoU) of 95.58% and a Dice coefficient of 93.45%, 4%-6% higher than other segmentation models; four-class classification of the resulting binarized segmentation maps reaches an Accuracy of 95.24%. Conclusion The proposed TransAS-UNet image segmentation method shows good performance and clinical significance and outperforms the compared 2D medical image segmentation methods.
Keywords
TransAS-UNet: regional segmentation of breast cancer by fusing the Swin Transformer and UNet
Xu Wangwang1,2, Xu Liangfeng1,2, Li Bokai1,2, Zhou Xi1,2,3, Lyu Na4, Zhan Shu1,2(1.Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230601, China;2.School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China;3.Anhui Water Conservancy and Electric Power Technical College, Hefei 231603, China;4.The First Affiliated Hospital of Anhui Medical University, Hefei 230022, China) Abstract
Objective Breast cancer is a serious, high-incidence disease in women, and early detection of breast cancer is an important problem that needs to be solved all over the world. The current diagnostic methods for breast cancer include clinical, imaging, and histopathological examinations. The modalities commonly used in imaging examination are X-ray, computed tomography (CT), magnetic resonance imaging, etc., among which mammograms have been used to detect early cancer; however, manually segmenting the mass from a local mammogram is a very time-consuming and error-prone task. Therefore, an integrated computer-aided diagnosis (CAD) system is needed to help radiologists perform automatic and precise breast mass identification. Method In this work, we compare different image segmentation models based on the deep learning image segmentation framework. On the basis of the UNet structure, we adopt the Swin architecture to replace the downsampling and upsampling processes in the segmentation task, realizing the interaction between local and global features. We also use a Transformer to obtain more global information and different hierarchical features in place of the short (skip) connections, achieving multi-scale feature fusion and accurate segmentation. After the segmentation stage, a Multi-Attention ResNet classification network identifies the grade of the cancer regions, enabling better diagnosis and treatment of breast cancer. During segmentation, the Swin Transformer and atrous spatial pyramid pooling (ASPP) modules replace the common convolution layers by analogy with the UNet structure model. The shifted window and multi-head attention integrate the feature information inside each image slice and extract complementary information between non-adjacent areas. Meanwhile, the ASPP structure achieves self-attention over local information with an increasing receptive field.
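To illustrate the ASPP idea mentioned above, the following is a minimal NumPy sketch (not the authors' implementation): several dilated convolutions with different rates are applied to the same feature map in parallel and stacked as channels, so that context is captured at multiple scales. The kernels and rates here are illustrative placeholders.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """Single-channel, 'same'-padded 2D convolution with dilation `rate`."""
    kh, kw = kernel.shape
    ph, pw = (kh - 1) * rate // 2, (kw - 1) * rate // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            # each kernel tap samples the input `rate` pixels apart
            out += kernel[i, j] * xp[i * rate : i * rate + x.shape[0],
                                     j * rate : j * rate + x.shape[1]]
    return out

def aspp(x, kernels, rates=(1, 6, 12, 18)):
    """Run parallel dilated convolutions and stack the results as channels."""
    return np.stack([dilated_conv2d(x, k, r) for k, r in zip(kernels, rates)])

x = np.random.rand(32, 32)
kernels = [np.ones((3, 3)) / 9.0 for _ in range(4)]  # toy smoothing kernels
feats = aspp(x, kernels)
print(feats.shape)  # (4, 32, 32)
```

A real ASPP block would use learned multi-channel kernels and follow the parallel branches with a 1 × 1 fusion convolution; the sketch only shows how different dilation rates enlarge the receptive field without adding parameters.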
A Transformer structure is introduced to correlate information between different layers and prevent the loss of important shallow-layer information during downsampling convolution. The final architecture not only inherits the Transformer's advantages in learning global semantic associations but also uses features at different levels to preserve more semantics and more details in the model. The binarized images produced by the segmentation model serve as the input dataset of the classification network, which identifies the different categories of breast cancer tumors. Based on ResNet50, this classification model adds multiple types of attention modules and anti-overfitting operations. Squeeze-and-excitation (SE) and selective kernel (SK) attention optimize the network parameters so that the model attends only to the differences between segmented regions, improving its efficiency. The proposed model achieves accurate segmentation of masses on the INbreast breast cancer X-ray dataset, and we compare it with five segmentation structures: UNet, UNet++, Res18_UNet, MultiRes_UNet, and Dense_UNet. The segmentation model yields an accurate binary map of the cancer region. Problems such as blending feature information from different levels and the self-attention of local information in the convolutional layers exist in the UNet-style upsampling and downsampling. Therefore, the Swin Transformer structure, which has a sliding-window operation and a hierarchical design, is adopted. It relies mainly on the Window Attention module and the Shifted Window Attention module, which slice the input feature map into multiple windows. Shifting the windows moves the attention weights relative to the whole feature map, realizing information interaction within the same feature map. In upsampling and downsampling, we use four Swin Transformer structures.
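The window partition and cyclic shift underlying the two attention modules described above can be sketched in a few lines of NumPy (a conceptual illustration, not the paper's code): attention is computed independently inside each window, and shifting the map by half a window before the next layer lets information cross the previous window boundaries.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws, ws, C)

def shift_windows(x, ws):
    """Cyclically shift the map by ws//2 so the next window-attention layer
    mixes pixels that previously sat in different windows."""
    return np.roll(x, shift=(-(ws // 2), -(ws // 2)), axis=(0, 1))

feat = np.arange(8 * 8 * 1).reshape(8, 8, 1)
wins = window_partition(feat, 4)                        # 4 windows of 4 x 4
shifted_wins = window_partition(shift_windows(feat, 4), 4)
print(wins.shape)  # (4, 4, 4, 1)
```

In the full Swin block, self-attention runs inside each of these windows, and the shifted pass uses a mask so that wrapped-around pixels do not attend to each other; only the partition/shift mechanics are shown here.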
In the fusion process, we use the pyramid ASPP structure to replace the common feature-map channel addition operation; it applies multiple convolution kernels to the feature maps, fuses the channels, and samples the given input in parallel with atrous convolutions at different sampling rates, thereby capturing image context information at multiple scales. To better integrate high- and low-dimensional spatial information, we propose a new multi-scale feature-map fusion strategy and use a Transformer with skip connections to enhance the spatial-domain information representation. Each cancer image was classified as normal, mass, deformation, or calcification according to the description of the INbreast dataset. Each category was labeled and then fed to the classification network. The classification model takes ResNet50 as the baseline; on this basis, two different kinds of attention, i.e., SE and SK, are added. SK convolution replaces the 3 × 3 convolution at every bottleneck so that more image features can be extracted at the convolutional layer. Meanwhile, SE is a channel attention, and each channel is weighted before the pixel values are output. Three methods, namely Gaussian error gradient descent, label smoothing, and partial data enhancement, are introduced to improve the accuracy of the model. Result In the same parameter environment, the intersection over union (IoU) value reached 95.58%, and the Dice coefficient was 93.45%, which is 4%-6% higher than those of the other segmentation models. The binary segmentation images were classified into four categories, and the Accuracy reached 95.24%. Conclusion Experiments show that the proposed TransAS-UNet image segmentation method demonstrates good performance and clinical significance, and it is superior to the compared 2D medical image segmentation methods.
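The SE channel attention used in the classification network above can be summarized in a short NumPy sketch (illustrative only; the bottleneck weights `w1`, `w2` are hypothetical random placeholders standing in for learned parameters): each channel is squeezed to a scalar by global average pooling, a two-layer bottleneck produces a weight in (0, 1) per channel, and the feature map is rescaled channel-wise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """Squeeze-and-excitation on an (C, H, W) map: pool each channel,
    pass the pooled vector through a reduce/expand bottleneck, and
    rescale every channel by its resulting weight in (0, 1)."""
    s = x.mean(axis=(1, 2))                   # squeeze: (C,)
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0))   # excite: per-channel weights
    return x * e[:, None, None]               # rescale channels

C, r = 8, 2                                   # channels, reduction ratio
x = np.random.rand(C, 16, 16)
w1 = np.random.randn(C // r, C) * 0.1         # hypothetical learned weights
w2 = np.random.randn(C, C // r) * 0.1
y = se_block(x, w1, w2)
print(y.shape)  # (8, 16, 16)
```

Because the excitation weights stay strictly between 0 and 1, the block can only attenuate channels, letting the network emphasize the channels that discriminate between segmented regions.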
Keywords