Medical image fusion using improved U-Net3+ and cross-modal attention blocks

Wang Lifang, Mi Jia, Qin Pinle, Lin Suzhen, Gao Yuan, Liu Yang (Shanxi Provincial Key Laboratory of Biomedical Imaging and Imaging Big Data, College of Big Data, North University of China, Taiyuan 030051, China)

Abstract
Objective Current multi-modal medical image fusion methods have insufficient ability to extract deep features, and the features of one modality are often ignored. To address these problems, a dual-discriminator generative adversarial network for medical image fusion based on U-Net3+ and cross-modal attention blocks (UC-DDGAN) is proposed. Method UC-DDGAN combines two properties: U-Net3+ can extract deep features with very few parameters, and cross-modal attention blocks can extract the features of both modalities. The network contains one generator and two discriminators, and the generator consists of a feature-extraction part and a feature-fusion part. In feature extraction, cross-modal attention blocks are embedded along the U-Net3+ down-sampling path that extracts deep image features, so cross-modal feature extraction alternates with deep feature extraction and a composite feature map is obtained at each layer. These maps are stacked along the channel dimension, reduced in dimension, and up-sampled, yielding feature maps that contain the full-scale deep features of both modalities. In feature fusion, the fused image is obtained by concatenating the feature maps along the channel dimension. The two discriminators perform targeted discrimination on the two source images, which follow different distributions. The loss function introduces a gradient loss, and the weighted sum of the gradient loss and a pixel loss is used to optimize the generator. Result UC-DDGAN was compared with five classical image fusion methods on the public brain disease image dataset released by Harvard Medical School. Its fused images improve on spatial frequency (SF), structural similarity (SSIM), edge information transfer factor (QAB/F), correlation coefficient (CC), and the sum of the correlations of differences (SCD): SF is 5.87% higher than that of DDcGAN (dual-discriminator conditional generative adversarial network), SSIM is 8% higher than that of FusionGAN (fusion generative adversarial network), QAB/F is 12.66% higher than that of FusionGAN, and CC and SCD are 14.47% and 14.48% higher than those of DDcGAN, respectively. Conclusion The fused images generated by UC-DDGAN contain rich deep features and the key features of both modalities, and they outperform the compared methods in both subjective visual effect and objective evaluation metrics, which provides support for clinical diagnosis.
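The abstract states that the generator is optimized with a weighted sum of a pixel loss and a gradient loss. The snippet below is a minimal sketch of what such an objective could look like; the PyTorch framework, the L1 distance, the Laplacian gradient operator, and the weights lambda_pixel and lambda_grad are illustrative assumptions, not details reported by the paper.

```python
# Hedged sketch of the generator objective described in the abstract: a weighted
# sum of pixel (intensity) loss and gradient loss against the two source images.
# The kernel, distances, and weights below are assumptions for illustration.
import torch
import torch.nn.functional as F

def gradient(img: torch.Tensor) -> torch.Tensor:
    """Approximate image gradients of a (N, 1, H, W) batch with a Laplacian filter."""
    kernel = torch.tensor([[0., 1., 0.],
                           [1., -4., 1.],
                           [0., 1., 0.]], device=img.device).view(1, 1, 3, 3)
    return F.conv2d(img, kernel, padding=1)

def generator_loss(fused, ct, mr, lambda_pixel=1.0, lambda_grad=5.0):
    # Pixel loss keeps the fused intensities close to both source modalities.
    pixel_loss = F.l1_loss(fused, ct) + F.l1_loss(fused, mr)
    # Gradient loss preserves edge and texture information.
    grad_loss = F.l1_loss(gradient(fused), gradient(ct)) + \
                F.l1_loss(gradient(fused), gradient(mr))
    return lambda_pixel * pixel_loss + lambda_grad * grad_loss
```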
Keywords


Abstract
Objective Multi-modal medical image fusion captures more detailed features than any single modality alone, and the deep features of lesions are essential for clinical diagnosis. However, current multi-modal medical image fusion methods struggle to capture deep features, and the completeness of the fused image suffers when features are extracted from only a single modality. In recent years, deep learning has advanced rapidly in image processing, and the generative adversarial network (GAN), an important branch of deep learning, has been widely used in image fusion. A GAN not only reduces information loss but also highlights key features through the adversarial interplay between the fused image and the different source images. To address the insufficient deep feature extraction and the neglect of some modal features in current methods, we develop a medical image fusion method based on improved U-Net3+ and cross-modal attention blocks combined with a dual-discriminator generative adversarial network (UC-DDGAN). Method The UC-DDGAN fusion model is mainly composed of a full-scale-connected U-Net3+ structure and cross-modal attention blocks that integrate the features of the two modalities. The U-Net3+ network extracts deep features, and the cross-modal attention blocks extract the features of the two modalities according to the correlation between the images. Computed tomography (CT) and magnetic resonance (MR) images are fused by the trained UC-DDGAN, which has one generator and two discriminators. The generator extracts the deep features of the images and generates the fused image; it includes a feature-extraction part and a feature-fusion part. In the feature-extraction part, feature extraction is completed by the encoder and decoder of the U-Net3+ network. In the encoding stage, the input image is down-sampled four times to extract features, and a cross-modal attention block is added after each down-sampling to obtain composite feature maps of the two modalities. The cross-modal attention block not only computes self-attention within a single image but also extends the attention computation across the two modalities. By modeling the relationship between the local features and the global features of the two modalities, the fused image preserves the overall image information. In the decoding stage, each decoder layer receives the feature map of the same-scale encoder, the feature maps of shallower encoder layers reduced by max pooling, and the up-sampled feature maps of deeper layers. Then 64 filters of size 3×3 are applied to unify the channels of these feature maps. The composite feature maps of each layer are combined and up-sampled, and after 1×1 convolution for channel dimension reduction, a feature map containing the full-scale deep features of the two modalities is obtained. In the feature-fusion part, to obtain a fused image with deep details and the key features of the two modalities, the two feature maps are concatenated in a concat layer and passed through five convolution modules that reduce the channel dimension layer by layer. Each discriminator distinguishes a source image from the fused image according to the distribution of the samples; because the two modalities follow different distributions, two discriminators are used so that each judges the credibility of the input images against one source modality. In addition, a gradient loss is introduced into the loss function, and the weighted sum of the gradient loss and the pixel loss is used to optimize the generator.
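To make the cross-modal attention block described above more concrete, the sketch below shows one plausible form of such a block, in which the query comes from one modality and the key/value come from the other, so attention is computed across the two modalities rather than within a single image. The 1×1 projections, the channel reduction factor of 8, and the residual connection are assumptions for illustration and are not taken from the paper.

```python
# Hedged sketch of a cross-modal attention block: queries from one modality
# attend over keys/values from the other, relating local features of one
# modality to the global features of the other.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 projections; the channels // 8 bottleneck is an assumed design choice.
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        # x_a: features of the current modality; x_b: features of the other modality.
        b, c, h, w = x_a.shape
        q = self.query(x_a).flatten(2).transpose(1, 2)   # (B, HW, C/8)
        k = self.key(x_b).flatten(2)                     # (B, C/8, HW)
        v = self.value(x_b).flatten(2)                   # (B, C, HW)
        attn = self.softmax(torch.bmm(q, k))             # (B, HW, HW) cross-modal affinity
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        # Residual connection keeps the original modality features.
        return x_a + out
```

According to the Method, a block of this kind is applied after each of the four down-sampling steps of the U-Net3+ encoder to produce the composite feature maps of the two modalities.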
Result To validate the quality of the fused images, UC-DDGAN is compared with five popular multi-modal image fusion methods: the Laplacian pyramid (LAP), the pulse-coupled neural network (PCNN), the convolutional neural network (CNN), the fusion generative adversarial network (FusionGAN), and the dual-discriminator conditional generative adversarial network (DDcGAN). Qualitatively, the fusion results of LAP have fuzzy edges, which makes it difficult to observe the contour of the lesion. The brightness of the PCNN results is too low. The CNN-based results lack deep details, so internal structures cannot be observed. FusionGAN pays too much attention to the MR modality and loses the bone information of the CT images. The edges of the DDcGAN results are not smooth enough. In contrast, the fusion results of UC-DDGAN 1) show clear brain sulci for cerebral infarction, 2) present clear color features for cerebral apoplexy, 3) fully preserve brain medulla and bone information for cerebral tumor, and 4) retain deep information of the brain lobes for cerebrovascular disease. To evaluate the performance of UC-DDGAN quantitatively, thirty typical image pairs are selected and compared against the five classical methods. The fused images generated by UC-DDGAN improve on spatial frequency (SF), structural similarity (SSIM), edge information transfer factor (QAB/F), correlation coefficient (CC), and the sum of the correlations of differences (SCD): 1) SF is improved by 5.87% compared with DDcGAN, 2) SSIM is improved by 8% compared with FusionGAN, 3) QAB/F is improved by 12.66% compared with FusionGAN, and 4) CC and SCD are improved by 14.47% and 14.48%, respectively, compared with DDcGAN. Conclusion A medical image fusion method based on improved U-Net3+ and cross-modal attention blocks within a dual-discriminator generative adversarial network (UC-DDGAN) is developed. The fused images generated by UC-DDGAN contain richer deep features and the key features of the two modalities, and they outperform the compared methods in both subjective visual effect and objective evaluation metrics.
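As a concrete reference for one of the metrics reported above, the snippet below computes spatial frequency (SF) using its standard row-frequency/column-frequency definition; the exact variant used in the paper's evaluation is not specified here, so this is only an illustrative implementation.

```python
# Spatial frequency (SF): a measure of overall activity/detail in an image,
# computed from horizontal and vertical intensity differences.
import numpy as np

def spatial_frequency(img: np.ndarray) -> float:
    img = img.astype(np.float64)
    # Row frequency: mean squared difference along columns (horizontal changes).
    rf = np.sqrt(np.mean((img[:, 1:] - img[:, :-1]) ** 2))
    # Column frequency: mean squared difference along rows (vertical changes).
    cf = np.sqrt(np.mean((img[1:, :] - img[:-1, :]) ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))
```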
Keywords
