Application of U-Net channel transformation network in gland image segmentation
Abstract
Objective: Glandular medical image segmentation is the process of separating the glandular regions in a medical image from the surrounding tissue, and it places extremely high demands on segmentation accuracy. Because glands are diverse in shape and the images contain numerous small targets, traditional models are prone to problems such as imprecise segmentation or mis-segmentation. To address this, the UCTransNet (U-Net channel transformation network) segmentation model is improved according to the characteristics of glandular medical images to achieve higher-precision segmentation of gland images.

Method: First, a combination of an ASPP_SE (atrous spatial pyramid pooling with squeeze-and-excitation networks) module and a ConvBatchNorm module is added to the front end of the UCTransNet encoder, which strengthens the encoder's ability to extract the feature information of small targets while preventing overfitting during model training. Second, a simplified dense connection is embedded between the encoder and the skip connections to enhance the fusion of feature information between adjacent encoder modules. Finally, a refiner is added to the channel cross fusion with Transformer (CCT) module; it projects the self-attention map to a higher dimension, improving the self-attention mechanism and enhancing the fusion of global feature information across encoder modules. Used together, the simplified dense connection and the CCT achieve better results.

Result: The improved algorithm was evaluated on the public gland datasets MoNuSeg (multi-organ nuclei segmentation challenge) and GlaS (gland segmentation). With the Dice coefficient and IoU (intersection over union) coefficient as the main metrics, it achieved 80.55% and 67.32% on MoNuSeg and 92.23% and 86.39% on GlaS, improvements of 0.88% and 1.06%, and of 1.53% and 2.43%, respectively, over the original UCTransNet.

Conclusion: The proposed improved algorithm outperforms other existing segmentation algorithms on glandular medical image segmentation and can meet the requirements of clinical gland image segmentation.
Keywords
medical image segmentation; UCTransNet; dense connection; self-attention mechanism; refinement module
Application of U-Net channel transformation network in gland image segmentation
Cao Weijie, Duan Xianhua, Xu Zhenwei, Sheng Shuai
(School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212100, China)
Abstract
Objective: Adenocarcinoma is a malignant tumor originating from the glandular epithelium, and it poses immense harm to human health. With the rapid development of computer vision technology, medical imaging has become an important means of expert preoperative diagnosis. In diagnosing adenocarcinoma, doctors judge the severity of the cancer and grade it by analyzing the size, shape, and other external features of the glandular structure. Accordingly, achieving high-precision segmentation of glandular images has become an urgent requirement in clinical medicine. Glandular medical image segmentation refers to the process of separating the glandular region from the surrounding tissue in medical images, and it requires high segmentation accuracy. Traditional models for segmenting glandular medical images can suffer from problems such as imprecise segmentation and mis-segmentation owing to the diverse shapes of glands and the presence of numerous small targets. To address these issues, this study proposes an improved glandular medical image segmentation algorithm based on UCTransNet. UCTransNet addresses the semantic gap between encoder modules of different resolutions and between the encoder and decoder, thereby achieving high-precision image segmentation.

Method: First, a combination of the ASPP_SE and ConvBatchNorm modules is added to the front end of the encoder. The ASPP_SE module combines the ASPP module with a channel attention mechanism. The ASPP module consists of three atrous convolutions with different dilation rates, a 1 × 1 convolution, and an ASPP pooling branch. Atrous convolution injects holes into standard convolution to expand the receptive field and obtain dense data features while keeping the output feature map the same size. The ASPP module uses multi-scale atrous convolution to obtain a large receptive field and fuses the resulting features with the global features obtained from ASPP pooling, yielding denser semantic information than the original features. The channel attention mechanism enables the model to focus on important channel regions in the image, dynamically select information, and assign greater weight to channels containing important information. In the CCT (channel cross fusion with Transformer), modules whose important information carries greater weight achieve better fusion. The ConvBatchNorm module enhances the ability of the encoder to extract the features of small targets while preventing overfitting during model training. Second, a simplified dense connection is embedded between the encoder and the skip connections. The CCT in the model performs global fusion of the features extracted by the encoder from a channel perspective; although its global attention ability is strong, its local attention ability is weak, and the ambiguity between adjacent encoder modules remains unresolved. To solve this problem, a dense connection is added to enhance local information fusion. The dense connection obtains the lower encoder module from the upper one through convolution and pooling, then upsamples the lower module so that its resolution matches that of the upper module. The two encoder modules are concatenated along the channel dimension, which leaves the resolution unchanged. After concatenation, the upper encoder module is supplemented with the feature information of the lower encoder module. Consequently, semantic fusion between adjacent modules is enhanced, the semantic gap between adjacent encoder modules is reduced, and feature information fusion between them is improved.
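To make the ASPP_SE design above concrete, the following is a minimal PyTorch sketch rather than the authors' implementation: the dilation rates (6, 12, 18), channel counts, and the squeeze-and-excitation reduction ratio are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP_SE(nn.Module):
    """Minimal sketch of the ASPP_SE module: multi-scale atrous
    convolutions plus ASPP pooling, followed by squeeze-and-excitation
    channel attention. Dilation rates and the reduction ratio are
    illustrative assumptions, not the paper's exact settings."""

    def __init__(self, in_ch, out_ch, rates=(6, 12, 18), reduction=16):
        super().__init__()
        # 1 x 1 convolution branch
        self.conv1x1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        # three atrous convolutions; padding = dilation keeps the
        # output feature map the same size as the input
        self.atrous = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates])
        # ASPP pooling branch: global average pooling for global context
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False), nn.ReLU(inplace=True))
        # fuse the five branches back to out_ch channels
        self.project = nn.Sequential(
            nn.Conv2d(5 * out_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        # squeeze-and-excitation: weight channels by importance
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(out_ch, out_ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(out_ch // reduction, out_ch), nn.Sigmoid())

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [self.conv1x1(x)] + [branch(x) for branch in self.atrous]
        g = F.interpolate(self.pool(x), size=(h, w), mode='bilinear',
                          align_corners=False)
        y = self.project(torch.cat(feats + [g], dim=1))
        return y * self.se(y).view(y.size(0), -1, 1, 1)
```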
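The simplified dense connection can likewise be sketched in a few lines. This assumes the usual U-Net convention that each deeper encoder stage halves the resolution and doubles the channels; the function and tensor names are hypothetical.

```python
import torch
import torch.nn.functional as F

def dense_connect(upper_feat, lower_feat):
    """Sketch of the simplified dense connection: the lower (deeper,
    lower-resolution) encoder feature is upsampled to the upper stage's
    resolution and concatenated along the channel dimension, so the
    upper module is supplemented with the lower module's information
    while the spatial resolution stays unchanged."""
    lower_up = F.interpolate(lower_feat, size=upper_feat.shape[2:],
                             mode='bilinear', align_corners=False)
    return torch.cat([upper_feat, lower_up], dim=1)

# usage: an upper stage of shape (B, 64, 128, 128) and the lower stage
# obtained from it by convolution and pooling, of shape (B, 128, 64, 64)
upper = torch.randn(1, 64, 128, 128)
lower = torch.randn(1, 128, 64, 64)
fused = dense_connect(upper, lower)   # -> (1, 192, 128, 128)
```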
Lastly, a refiner is added to the CCT. It projects the self-attention map to a higher dimension and uses a head convolution to enhance the spatial context and local patterns of the attention map, effectively combining the advantages of self-attention and convolution to further strengthen the self-attention mechanism. A linear projection then restores the attention map to its initial resolution, thereby enhancing the global feature information fusion of the encoder. In summary, the combined ASPP_SE and ConvBatchNorm modules added to the front end of the UCTransNet encoder enhance its ability to extract small-target features and prevent overfitting; the simplified dense connection embedded between the encoder and skip connections enhances the fusion of adjacent module features; and the refinement module added to the CCT projects the self-attention map to a higher dimension, enhancing the global feature fusion ability of the encoder. The combination of the simplified dense connection and the CCT refinement module improves the performance of the model.

Result: The improved algorithm was tested on the publicly available gland data sets MoNuSeg and GlaS. The Dice and intersection over union (IoU) coefficients were the main evaluation metrics. The Dice coefficient is a similarity measure representing the similarity between two samples, while the IoU coefficient measures the accuracy of the result's positional information; both are commonly used in medical image segmentation. The test results on the MoNuSeg data set were 80.55% and 67.32%, while those on the GlaS data set were 92.23% and 86.39%. These results represent improvements of 0.88% and 1.06%, and of 1.53% and 2.43%, respectively, compared with those of the original UCTransNet. The improved model was also compared with existing popular segmentation networks and was found to generally outperform them.

Conclusion: The proposed improved model is superior to existing segmentation algorithms in medical gland segmentation and can meet the requirements of clinical medical gland image segmentation. The CCT module in the original model was further optimized to fuse global and local feature information, thereby achieving better results.
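The refiner described in the Method section can be sketched as below: the attention map's head dimension is linearly expanded, a grouped convolution enhances local patterns over the token grid, and a linear projection restores the original number of heads. The expansion factor and kernel size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionRefiner(nn.Module):
    """Sketch of the CCT refiner: project the self-attention map to a
    higher dimension (more heads), apply a head convolution to enhance
    the spatial context and local patterns of the map, then linearly
    project back to the initial number of heads."""

    def __init__(self, num_heads, expansion=3, kernel_size=3):
        super().__init__()
        hidden = num_heads * expansion
        # 1 x 1 convolutions act as linear projections over the head axis
        self.expand = nn.Conv2d(num_heads, hidden, kernel_size=1)
        self.head_conv = nn.Conv2d(hidden, hidden, kernel_size,
                                   padding=kernel_size // 2, groups=hidden)
        self.reduce = nn.Conv2d(hidden, num_heads, kernel_size=1)

    def forward(self, attn):
        # attn: (batch, heads, queries, keys) softmax attention map
        return self.reduce(self.head_conv(self.expand(attn)))

# usage inside an attention layer (after softmax, before applying to V):
# attn = softmax(q @ k.transpose(-2, -1) * scale)
# attn = refiner(attn)
# out = attn @ v
```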
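For reference, the Dice and IoU coefficients reported above follow the standard definitions Dice = 2|A∩B|/(|A|+|B|) and IoU = |A∩B|/|A∪B|. A straightforward NumPy computation for binary masks is sketched below; the epsilon guard against empty masks is an implementation choice, not taken from the paper.

```python
import numpy as np

def dice_iou(pred, target, eps=1e-7):
    """Dice and IoU between two binary masks of the same shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou
```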
Keywords
medical image segmentation; U-Net from a channel-wise perspective with Transformer (UCTransNet); dense connection; self-attention mechanism; refinement module