Medical image segmentation model incorporating group attention

Zhang Xuefeng, Zhang Sheng, Zhang Donghui, Liu Rui (School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China)

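Abstract
Objective Deep learning methods that combine convolutional neural networks (CNNs) with the U-Net architecture are widely used in medical image processing and achieve good results, performing especially well at local feature extraction. However, the inherent locality of the convolution operation limits their ability to capture global information. Transformer-based methods have strong global modeling capability but are weaker than CNNs at local feature extraction. To fully combine the strengths of both approaches, a medical image segmentation model based on group attention (GAU-Net) is proposed. Method Using the attention mechanism, a group attention module that integrates both the Swin Transformer and a CNN is designed and embedded in the network encoder, so that the network can efficiently extract and fuse the important global and local features of the image. In the attention computation, features are grouped so that different attention computations are performed simultaneously within features of the same scale, which further increases the diversity of the semantic information the network extracts. The extracted features are restored to the original image size by upsampling, and pixel classification yields the final segmentation result. Result Experiments were conducted on the Synapse multi-organ segmentation dataset and the ACDC (automated cardiac diagnosis challenge) dataset. On the Synapse dataset, the Dice value is 82.93% and the HD (Hausdorff distance) value is 12.32%; compared with the second-ranked method, the Dice value is 0.97% higher and the HD value is 5.88% lower. On the ACDC dataset, the Dice value is 91.34%, which is 0.48% higher than the second-ranked method. Conclusion The proposed medical image segmentation model effectively combines the respective advantages of the Transformer and the CNN and improves the accuracy of medical image segmentation results.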
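The grouping idea in the Method part can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the split ratio, the parameter-free attention gates, the plain (non-windowed) self-attention used as a stand-in for the Swin Transformer blocks, and the identity skip connection used in place of the residual unit are all assumptions made only to show how one feature map can be split by channel, processed by different attention computations, and fused again.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def split_channels(x, ratio=0.5):
    """Split a (C, H, W) feature map into two channel groups.
    `ratio` is a hypothetical hyperparameter, not a value from the paper."""
    c = x.shape[0]
    k = int(round(c * ratio))
    if not 0 < k < c:
        raise ValueError("ratio must leave both groups non-empty")
    return x[:k], x[k:]

def global_self_attention(x):
    """Plain self-attention over all H*W positions (assumed stand-in for
    the Swin Transformer sub-module; no learned projections)."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                 # (N, C), N = H*W
    scores = tokens @ tokens.T / np.sqrt(c)        # (N, N)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return (weights @ tokens).T.reshape(c, h, w)

def spatial_attention(x):
    """Gate every spatial position by a map pooled over channels."""
    return x * sigmoid(x.mean(axis=0, keepdims=True))

def channel_attention(x):
    """Gate every channel by its globally pooled response."""
    return x * sigmoid(x.mean(axis=(1, 2), keepdims=True))

def group_attention(x, ratio=0.5):
    g_global, g_local = split_channels(x, ratio)
    global_feat = global_self_attention(g_global)
    # Spatial and channel attention applied in series (mixed attention).
    local_feat = channel_attention(spatial_attention(g_local))
    fused = np.concatenate([global_feat, local_feat], axis=0)
    return fused + x                               # assumed identity skip

features = np.random.default_rng(0).standard_normal((8, 4, 4))
out = group_attention(features)
print(out.shape)
```

Because each attention computation sees only its own channel group, the global branch works on fewer channels than the full feature map, which is one plausible reading of how grouping reduces redundant computation.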
Keywords
Group attention-based medical image segmentation model

Zhang Xuefeng, Zhang Sheng, Zhang Donghui, Liu Rui(School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China)

Abstract
Objective End-to-end automatic medical image segmentation has attracted considerable attention recently. Deep learning methods that combine a convolutional neural network (CNN) with the U-Net architecture are widely used in medical image processing tasks and are particularly effective at local feature extraction. However, because of the inherent locality of the convolution operation, they remain limited in acquiring global information. Transformer-based methods offer strong global modeling capability, but their local feature extraction is weaker than that of CNNs. To fully combine the strengths of the two approaches, we develop a group attention-based medical image segmentation model, called GAU-Net. Method First, to combine the strengths of the CNN and the Swin Transformer, a group attention module is designed in which the Swin Transformer and the CNN are connected in parallel through the attention mechanism. A series of Swin Transformer modules serves as one sub-module, which extracts the global features of the image. Spatial attention and pixel-channel attention modules are built with the CNN and combined in series to form the mixed attention sub-module of the group attention module; this sub-module extracts key local features of the medical image in the spatial and pixel-channel dimensions. The features extracted by the two sub-modules are concatenated along the channel dimension, and a residual unit is used for feature fusion, so that the key global and local features extracted by the attention modules are grouped and fused. The resulting group attention module is embedded in each layer of the network encoder. Second, the attention computation itself is addressed, both to reduce computational redundancy and to match the structure of the group attention module efficiently. Encoder-extracted features are grouped proportionally along the feature channel dimension before they are input into the group attention module, so that different attention computations are performed simultaneously. This effectively reduces computational redundancy and further improves the diversity and richness of the semantic feature information extracted by the network. Finally, the extracted deep features are restored to the original image size by layer-by-layer 2× upsampling, and pixel classification gives the final segmentation result. In addition, because the images suffer from class imbalance and model training is easily affected by irrelevant background pixels, a linear combination of generalized Dice loss and cross-entropy loss is used to address the class imbalance problem and accelerate model convergence. Result Experiments are carried out on the Synapse dataset and the ACDC dataset. The Synapse dataset consists of 30 cases with a total of 3,779 axial abdominal clinical computed tomography (CT) images. The data of 18 patients are used as the training set, and the data of 12 patients are used as the test set. The dataset is labeled for 8 abdominal organs: the aorta, gallbladder, spleen, left kidney, right kidney, liver, pancreas, and stomach. The ACDC dataset is collected from different patients using a magnetic resonance imaging (MRI) scanner. In each patient's image, the left ventricle, right ventricle, and myocardium are labeled. The dataset is composed of 70 training samples, 10 validation samples, and 20 test samples. The Dice similarity coefficient and the 95% Hausdorff distance (HD95) are used as the evaluation metrics for the accuracy of the segmentation results. Furthermore, ablation experiments are carried out to test the effectiveness of each module and of their combinations. On the Synapse dataset, compared with the second-ranked method, MISSFormer, the Dice value is increased by 0.97% and the Hausdorff distance (HD) value is decreased by 5.88%, reaching 82.93% (Dice) and 12.32% (HD), respectively. On the ACDC dataset, compared with the second-ranked method, MISSFormer, the Dice value is increased by 0.48%, reaching 91.34%. Conclusion The proposed medical image segmentation model effectively combines the strengths of the Swin Transformer and the convolutional neural network. The group attention module and the grouped attention computation are both incorporated, which further improves the accuracy of medical image segmentation results.
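The linear combination of generalized Dice loss and cross-entropy loss mentioned in the Method part can be sketched in NumPy as follows. The abstract does not give the combination weight or the class-weighting scheme, so the weight `lam` and the inverse squared class volume weights (a common choice for generalized Dice) are assumptions, as are all function names.

```python
import numpy as np

def cross_entropy(probs, onehot, eps=1e-7):
    """Mean pixel-wise cross-entropy.
    probs, onehot: arrays of shape (num_pixels, num_classes)."""
    return float(-np.mean(np.sum(onehot * np.log(probs + eps), axis=1)))

def generalized_dice_loss(probs, onehot, eps=1e-7):
    """Generalized Dice loss with per-class weights 1 / (class volume)^2,
    so that small organs are not dominated by large ones or by background."""
    w = 1.0 / (onehot.sum(axis=0) ** 2 + eps)
    intersect = np.sum(w * np.sum(probs * onehot, axis=0))
    denom = np.sum(w * np.sum(probs + onehot, axis=0))
    return float(1.0 - 2.0 * intersect / (denom + eps))

def combined_loss(probs, onehot, lam=0.5):
    """lam * GDL + (1 - lam) * CE; lam is a hypothetical hyperparameter."""
    return (lam * generalized_dice_loss(probs, onehot)
            + (1.0 - lam) * cross_entropy(probs, onehot))

# Six pixels, three classes, two pixels per class.
labels = np.array([0, 0, 1, 1, 2, 2])
onehot = np.eye(3)[labels]
uniform = np.full((6, 3), 1.0 / 3.0)   # an uninformative prediction

print(combined_loss(onehot, onehot))   # close to zero for a perfect prediction
print(combined_loss(uniform, onehot))  # clearly larger for a uniform one
```

The Dice term depends on region overlap rather than on the raw pixel count, which is why it is commonly paired with cross-entropy when background pixels far outnumber the labeled organs.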
Keywords
