Infrared and visible image fusion combining multi-scale decomposition and octave convolution
Abstract
Objective In deep learning-based infrared and visible image fusion, multi-scale decomposition is an important means of extracting features at different scales. To address the coarse scale settings of traditional multi-scale decomposition methods, an improved image fusion algorithm based on octave convolution is proposed. Method The fusion method consists of four parts: an encoder, feature enhancement, a fusion strategy, and a decoder. First, the improved encoder extracts the low-frequency, sub-low-frequency, and high-frequency features of the source images at multiple scales. These features are then enhanced from the top level down to the bottom level, as sketched below. Next, the features are fused according to the corresponding fusion strategies. Finally, the fused deep features are reconstructed into an informative fused image by the decoder designed in this paper. Result Experiments compare the proposed algorithm with nine image fusion algorithms on the TNO and RoadScene datasets. In subjective evaluation, the proposed algorithm fully preserves the effective information of the source images, and its fusion results accord with human visual perception. In objective evaluation, on the TNO dataset the proposed algorithm performs best on five metrics: entropy, standard deviation, visual information fidelity, mutual information, and wavelet transform-based feature mutual information, exceeding the best values among the nine compared methods by 0.54%, 4.14%, 5.01%, 0.55%, and 0.68%, respectively. On the RoadScene dataset it achieves the best values on four metrics: entropy, standard deviation, visual information fidelity, and mutual information, exceeding the best values of the nine compared methods by 0.45%, 6.13%, 7.43%, and 0.45%, respectively; its wavelet transform-based feature mutual information differs from the best value by only 0.002 05. Conclusion The proposed fusion method achieves excellent results in both subjective and objective evaluation and can effectively accomplish the image fusion task.
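To make the top-down enhancement rule concrete, the following PyTorch sketch upsamples the coarser level's bands and adds them to the finer level's. The function name, the nearest-neighbour upsampling, and the matching channel counts across levels are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def enhance_scale(finer, coarser):
    """Cross-scale enhancement sketch following the rule stated above:
    the coarser level's high-frequency features enhance the finer level's
    sub-low-frequency features, and the coarser level's sub-low-frequency
    features enhance the finer level's low-frequency features.
    Equal channel counts across levels are an assumption of this sketch."""
    high_f, sub_f, low_f = finer    # bands at the finer scale
    high_c, sub_c, low_c = coarser  # bands at the next coarser scale

    def up(src, ref):
        # Upsample a coarser map to the finer map's spatial size.
        return F.interpolate(src, size=ref.shape[-2:], mode="nearest")

    sub_f = sub_f + up(high_c, sub_f)  # coarse high-freq -> fine sub-low
    low_f = low_f + up(sub_c, low_f)   # coarse sub-low  -> fine low
    return high_f, sub_f, low_f

# Toy usage with 64-channel bands; the finer scale has twice the resolution.
finer = tuple(torch.randn(1, 64, s, s) for s in (128, 64, 32))
coarser = tuple(torch.randn(1, 64, s, s) for s in (64, 32, 16))
high, sub, low = enhance_scale(finer, coarser)
```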
Keywords
Multi-scale decomposition and octave convolution based infrared and visible image fusion
Zhang Zihan1,2, Wu Xiaojun1,2, Xu Tianyang1,2
(1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China; 2.
Abstract
Objective Image fusion is an image processing technique in computer vision that aims to integrate the salient features of multiple input images into a single informative image. In recent years, image fusion has been applied in tasks such as video analysis and medical image interpretation. Existing fusion algorithms fall into two categories: 1) traditional methods and 2) deep learning-based methods. Most traditional methods introduce signal processing operators to carry out the fusion task. However, their feature extraction and fusion rules are handcrafted, and designing them well enough to achieve good fusion results is complicated. Thanks to the rapid development of deep learning, recent image fusion methods build on this technique. Multi-scale decomposition is an effective way to extract features for deep learning-based infrared and visible image fusion. To alleviate the coarse scale settings of traditional multi-scale decomposition methods, we develop an improved octave convolution-based image fusion algorithm in which deep features are divided by frequency through octave convolution.
Method Our fusion method is composed of four parts: 1) an encoder, 2) feature enhancement, 3) a fusion strategy, and 4) a decoder. The encoder extracts deep features from the source image at four scales through convolution and pooling. At each scale, the extracted deep features are subdivided into low-frequency, sub-low-frequency, and high-frequency features by octave convolution. In the enhancement phase, high-level features are added to low-level features to enhance them across scales: high-level high-frequency features enhance low-level sub-low-frequency features, and high-level sub-low-frequency features enhance low-level low-frequency features. The low-frequency, sub-low-frequency, and high-frequency features of each scale are then fused with the corresponding fusion strategies. To produce an informative fused image, the fused features are reconstructed by the designed decoder. Our experiments run on Ubuntu with an NVIDIA GTX 1080Ti GPU; the Python version is 3.6.10 and the implementation uses PyTorch. In the training phase, the network does not use the fusion strategy. Pairs of infrared and visible images are not required for training, because the network only needs to learn deep feature extraction and image reconstruction from those features. We choose 80 000 images from the MS COCO (Microsoft common objects in context) dataset as the training set for our auto-encoder network; each image is converted to grayscale and resized to 256×256 pixels. The Adam optimizer is used, with the learning rate, batch size, and number of epochs set to 1×10⁻⁴, 1, and 2, respectively. After training, the network can complete the image fusion task. First, the improved encoder obtains the low-frequency, sub-low-frequency, and high-frequency features of the source images at multiple scales. Second, these features are enhanced from the top level to the bottom level and fused with the corresponding fusion strategies. Finally, the fused features are reconstructed into an informative fused image by the designed decoder.
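As a rough illustration of how octave convolution can divide features into the three frequency bands at each scale, the following PyTorch sketch splits the output channels into high-, sub-low-, and low-frequency branches kept at full, 1/2, and 1/4 resolution. The module name, the channel ratios, and the use of average pooling are assumptions of this sketch; the paper only states that octave convolution divides the deep features by frequency.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThreeBandConv(nn.Module):
    """Minimal sketch of a three-band, octave-style convolution."""

    def __init__(self, in_ch, out_ch, alpha=0.25, beta=0.25):
        super().__init__()
        low_ch = int(out_ch * alpha)        # low-frequency channels
        sub_ch = int(out_ch * beta)         # sub-low-frequency channels
        high_ch = out_ch - low_ch - sub_ch  # high-frequency channels
        self.to_high = nn.Conv2d(in_ch, high_ch, 3, padding=1)
        self.to_sub = nn.Conv2d(in_ch, sub_ch, 3, padding=1)
        self.to_low = nn.Conv2d(in_ch, low_ch, 3, padding=1)

    def forward(self, x):
        high = self.to_high(x)                 # full resolution
        sub = self.to_sub(F.avg_pool2d(x, 2))  # 1/2 resolution
        low = self.to_low(F.avg_pool2d(x, 4))  # 1/4 resolution
        return high, sub, low

# A 256x256 grayscale input yields three frequency bands at three resolutions.
high, sub, low = ThreeBandConv(1, 64)(torch.randn(1, 1, 256, 256))
print(high.shape, sub.shape, low.shape)
```

Computing the coarser bands on downsampled inputs is what restricts them to lower-frequency content, which is the core idea behind the octave-style split.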
Result The proposed fusion algorithm is compared with nine existing image fusion algorithms on the TNO and RoadScene datasets, and all algorithms are evaluated both qualitatively and quantitatively. Qualitatively, our algorithm fully preserves the effective information of the source images in the fused results. Since qualitative evaluation alone is subjective, we also choose six objective metrics to evaluate the fusion performance of these methods. On the TNO dataset, the proposed algorithm achieves the best performance on five metrics: 1) entropy, 2) standard deviation, 3) visual information fidelity, 4) mutual information, and 5) wavelet transform-based feature mutual information, exceeding the best values of the nine existing algorithms by 0.54%, 4.14%, 5.01%, 0.55%, and 0.68%, respectively. The performance of our algorithm on the RoadScene dataset is basically consistent with that on the TNO dataset. It obtains the best values on four metrics: a) entropy, b) standard deviation, c) visual information fidelity, and d) mutual information, exceeding the best values of the nine existing methods by 0.45%, 6.13%, 7.43%, and 0.45%, respectively. For wavelet transform-based feature mutual information, the gap between our algorithm and the best value is only 0.002 05.
Conclusion A novel and effective deep learning architecture based on convolutional neural networks and octave convolution is developed for infrared and visible image fusion. The network structure makes full use of multi-scale deep features. Octave convolution divides the extracted features more finely, so appropriate fusion strategies can be selected for each kind of deep feature. Because the features at each scale are divided into low-frequency, sub-low-frequency, and high-frequency components, more appropriate features can be selected to enhance the low-level features in the enhancement phase. The experimental results show that our algorithm performs well in image fusion under both qualitative and quantitative evaluation.
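For reference, two of the six objective metrics have simple standard definitions. The NumPy sketch below computes the entropy (EN) and standard deviation (SD) of a fused image; it is a generic implementation of the standard formulas, not the paper's evaluation code, and the random test image is a stand-in.

```python
import numpy as np

def entropy(img):
    """Shannon entropy (EN) of an 8-bit grayscale image, in bits."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so log2 is defined
    return float(-np.sum(p * np.log2(p)))

def std_dev(img):
    """Standard deviation (SD), reflecting the contrast of the fused image."""
    return float(np.std(img.astype(np.float64)))

# Stand-in fused image; in practice this would be a fusion result.
fused = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
print(entropy(fused), std_dev(fused))
```

Higher EN indicates the fused image carries more information, and higher SD indicates stronger contrast, which is why larger values of both are preferred in the comparisons above.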
Keywords