A Transformer fusion network for single blurred image super-resolution
Liu Huacheng1,2, Ren Wenqi1, Wang Rui3, Cao Xiaochun1(1.School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen 518107, China;2.School of Software Engineering, University of Science and Technology of China, Suzhou 215123, China;3.State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China) Abstract
Objective Deep learning methods, represented by convolutional neural networks, have achieved remarkable results in single image super-resolution. However, most of these methods assume that the low-resolution image is free of blur, whereas low-resolution images captured in real scenes are usually blurred by camera shake, object motion, and similar factors. To address the super-resolution of blurred images, we propose a novel Transformer fusion network. Method First, a deblurring module and a detail-texture feature extraction module extract clear edge-contour features and detailed texture features, respectively. Then, a multi-head self-attention mechanism computes the response of each local region of the feature map to the global information, which allows the Transformer fusion module to fuse the edge features and texture features at the global semantic level. Finally, a high-resolution image reconstruction module restores the fused features into a high-resolution image. Result Experiments on two public datasets compare our model with nine recent methods. For 2×, 4×, and 8× super-resolution reconstruction on the GOPRO dataset, the peak signal-to-noise ratio (PSNR) is 0.12 dB, 0.18 dB, and 0.07 dB higher, respectively, than that of the second-best model, the gated fusion network (GFN); for 2×, 4×, and 8× super-resolution reconstruction on the Kohler dataset, the PSNR is 0.17 dB, 0.28 dB, and 0.16 dB higher than that of GFN, respectively. Ablation experiments on the GOPRO dataset further verify the effectiveness of the Transformer fusion network and show that the proposed network clearly improves super-resolution reconstruction of blurred images. Conclusion The proposed Transformer fusion network for blurred image super-resolution has a strong ability to capture long-range dependencies and global information. Its multi-head self-attention layers compute the response of each local region of the feature map to the global information, achieving a deep fusion of the deblurring features and the detailed texture features at the global semantic level and thereby improving super-resolution reconstruction of blurred images.
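The fusion mechanism described in the Method can be sketched in a few lines. The following is a minimal PyTorch illustration, not the authors' released code: the channel width, the number of attention heads, and the choice of using the deblurred edge features as queries over the texture features are assumptions made for this sketch.

```python
import torch
import torch.nn as nn

class TransformerFusion(nn.Module):
    """Minimal sketch: fuse deblurred edge features with texture features
    via multi-head attention, so each spatial position responds to the
    whole feature map rather than a local neighborhood."""

    def __init__(self, channels=64, num_heads=4):  # widths are assumptions
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, deblur_feat, texture_feat):
        b, c, h, w = deblur_feat.shape
        # Flatten (B, C, H, W) -> (B, H*W, C): one token per spatial position.
        q = deblur_feat.flatten(2).transpose(1, 2)
        kv = texture_feat.flatten(2).transpose(1, 2)
        # Every edge-feature token attends over the full texture map:
        # the "response of local information to global information".
        fused, _ = self.attn(q, kv, kv)
        fused = self.norm(fused + q)                     # residual + norm
        # Unflatten back to a 2-D feature map for reconstruction.
        return fused.transpose(1, 2).reshape(b, c, h, w)

# Toy usage: two 64-channel feature maps from the two branches.
edge = torch.randn(1, 64, 32, 32)
texture = torch.randn(1, 64, 32, 32)
print(TransformerFusion()(edge, texture).shape)  # torch.Size([1, 64, 32, 32])
```

Flattening each spatial position into a token is what lets the attention weights relate any local region to the entire map, which a single convolutional layer cannot do.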
Keywords
A super-resolution Transformer fusion network for single blurred images
Liu Huacheng1,2, Ren Wenqi1, Wang Rui3, Cao Xiaochun1(1.School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen 518107, China;2.School of Software Engineering, University of Science and Technology of China, Suzhou 215123, China;3.State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China) Abstract
Objective Single image super-resolution is an essential task in computer vision: it improves image quality by increasing spatial resolution. Deep learning based methods now dominate single image super-resolution, but they usually regard low-resolution images as sharp images without blur effects. However, low-resolution images captured in real scenes often suffer from blur caused by camera shake and object motion, and such blur artifacts can be amplified during super-resolution reconstruction. Hence, our research focuses on super-resolving single motion-blurred images. Method Our Transformer fusion network (TFN) performs super-resolution reconstruction of low-resolution blurred images. TFN adopts a dual-branch strategy to remove blur while super-resolving the blurry input. First, a deblurring module (DM) extracts deblurred features such as clear edge structures. The DM follows an encoder-decoder architecture: the encoder uses three convolutional layers to decrease the spatial resolution of the feature maps while increasing their channels, and the decoder uses two de-convolutional layers to increase the spatial resolution while decreasing the channels. Under the supervision of an L1 deblurring loss, this down-sampling and up-sampling process drives the DM to generate clear feature maps. However, the DM tends to lose some detailed information of the input image, because fine details are removed together with the blur artifacts. Therefore, we add a texture feature extraction module (TFEM) to extract detailed texture features. The TFEM is composed of six residual blocks, which mitigate gradient problems and speed up convergence. Since the TFEM has no down-sampling or up-sampling, it preserves more detailed texture than the DM, although its features still contain blur artifacts. To exploit both the clear deblurred features extracted by the DM and the detailed features extracted by the TFEM, we fuse them with a Transformer fusion module (TFM) built on multi-head attention layers. Because the Transformer encoder takes one-dimensional token sequences as input, the TFM flattens the feature maps before attention and unflattens them afterwards. Thanks to the long-range and global dependency modeling of multi-head attention, the TFM fuses the deblurred features and the detailed texture features effectively at the global semantic level. Finally, a reconstruction module (RM) performs super-resolution reconstruction on the fused features to generate the super-resolved image. Result Extensive experiments demonstrate that our method produces sharper super-resolved images from low-resolution blurred inputs.
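To make the module layout above concrete, here is a self-contained PyTorch sketch of the dual-branch design: an encoder-decoder DM, a TFEM with six residual blocks, a multi-head-attention fusion over flattened feature maps, and a PixelShuffle reconstruction tail. All channel widths, strides, and the 4× upsampling factor are illustrative assumptions; in particular, the stride-1 first encoder convolution is an assumption made here to keep the two branches' feature maps the same size for fusion.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)  # identity skip eases gradient flow

class TFN(nn.Module):
    def __init__(self, ch=64, heads=4, scale=4):
        super().__init__()
        # DM encoder: three convs shrink the map and widen the channels
        # (in the paper, the DM output is supervised by an L1 deblurring loss).
        self.dm_enc = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 2 * ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(2 * ch, 4 * ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        # DM decoder: two deconvs restore the resolution, narrowing channels.
        self.dm_dec = nn.Sequential(
            nn.ConvTranspose2d(4 * ch, 2 * ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(2 * ch, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        # TFEM: shallow head plus six residual blocks, no down/up-sampling.
        self.tfem = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1),
            *[ResBlock(ch) for _ in range(6)])
        # TFM: multi-head attention over flattened feature maps.
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        # RM: PixelShuffle tail; 4x super-resolution is an assumption here.
        self.rm = nn.Sequential(
            nn.Conv2d(ch, ch * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, lr_blurred):
        deblur = self.dm_dec(self.dm_enc(lr_blurred))   # clear edges, fewer details
        texture = self.tfem(lr_blurred)                 # rich details, still blurry
        b, c, h, w = deblur.shape
        q = deblur.flatten(2).transpose(1, 2)           # (B, H*W, C) tokens
        kv = texture.flatten(2).transpose(1, 2)
        fused, _ = self.attn(q, kv, kv)                 # global semantic fusion
        fused = fused.transpose(1, 2).reshape(b, c, h, w)
        return self.rm(fused)                           # super-resolved output

x = torch.randn(1, 3, 32, 32)                           # toy blurred LR input
print(TFN()(x).shape)                                   # torch.Size([1, 3, 128, 128])
```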
We compare the proposed TFN with several algorithms: dedicated single image super-resolution methods, joint image deblurring and super-resolution approaches, and combinations of super-resolution algorithms with non-uniform deblurring algorithms. Specifically, the single image super-resolution baselines are the residual channel attention network (RCAN) and the holistic attention network (HAN); the image deblurring baselines are the scale-recurrent network (SRN) and the deblurring generative adversarial network (DBGAN); and the joint deblurring and super-resolution baseline is the gated fusion network (GFN). To further evaluate the proposed TFN, we conduct experiments on two test sets, the GOPRO test dataset and the Kohler dataset. On the GOPRO test dataset, the peak signal-to-noise ratio (PSNR) of our TFN is 0.12 dB, 0.18 dB, and 0.07 dB higher than that of the very recent GFN at the 2×, 4×, and 8× scales, respectively. On the Kohler dataset, the PSNR of TFN is 0.17 dB, 0.28 dB, and 0.16 dB higher than that of GFN at the 2×, 4×, and 8× scales, respectively. In the ablation study, the model with only the DM achieves a PSNR 1.04 dB higher than the model with only the TFEM; the model with both the DM and the TFEM is 1.84 dB and 0.80 dB higher than the TFEM-only and DM-only models, respectively; and the full TFN with TFEM, DM, and TFM is 2.28 dB, 1.24 dB, and 0.44 dB higher than the TFEM-only, DM-only, and TFEM+DM models, respectively. These ablation experiments on the GOPRO dataset show that the TFM enables global semantic-level fusion of the deblurred features and the detailed texture features, which greatly improves super-resolution reconstruction of low-resolution blurred images. The results on the GOPRO test dataset and the Kohler dataset show that our network improves the visual results both qualitatively and quantitatively. Conclusion We propose a Transformer fusion network for blurred image super-resolution. The network super-resolves a blurred image while removing blur artifacts by fusing the deblurred features extracted by the DM and the texture features extracted by the TFEM through the Transformer fusion module. In the Transformer fusion module, multi-head self-attention layers compute the response of each local region of the feature map to the global information, which effectively fuses the deblurred features and the detailed texture features at the global semantic level and improves super-resolution reconstruction of blurred images. Extensive ablation and comparison experiments demonstrate the superiority of our TFN both qualitatively and quantitatively.
Keywords