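Dense Focal Stack Image Generation under the Gaussian-Wiener Representation
Wang Qiteng, Li Zhilong, Ding Xin, Liu Qiong, Yang You (Huazhong University of Science and Technology)
Abstract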
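Objective Focal stack images extend the depth of field of an optical system and provide a flexible image representation for computational photography and for interactive and immersive media. However, limited by the physical properties of optical systems and by the dynamic changes of the subjects being photographed, usually only sparse focal stack images can be captured. Densifying focal stack images has therefore become a problem that needs to be solved. To address this challenge, a dense focal stack image generation method under the Gaussian-Wiener representation is proposed. Method Focal stack images are abstracted as a Gaussian-Wiener representation. The proposed bidirectional prediction model comprises a bidirectional fitting module and a prediction generation module: a bidirectional fitting model is built on the Gaussian-Wiener representation model, the bidirectional prediction parameters are solved, and new focal stack images are rendered. First, the images of the sparse focal stack sequence are divided into blocks of equal size, and blocks at the same position in images of adjacent focal lengths are combined into block pairs, which serve as the smallest unit of bidirectional prediction. Second, in the bidirectional prediction module, the block pairs are used to fit the optimal bidirectional fitting parameters, from which the prediction generation parameters are solved and new focal stack image blocks are generated. Finally, all generated blocks are stitched together to obtain the new focal stack images. Result Experiments are conducted on 11 sparse focal stack image sequences, with peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) as the evaluation metrics. Over the 11 sequences, the generated results reach an average PSNR of 40.861 dB and an average SSIM of 0.976. Compared with the two comparison methods, generalized Gaussian and spatial coordinates, PSNR improves by 6.503 dB and 6.467 dB, and SSIM by 0.057 and 0.092, respectively. The per-sequence mean PSNR and SSIM improve by at least 3.474 dB and 0.012. Conclusion The experimental results show that the proposed bidirectional prediction method generates new focal stack images well and can play a key role in a variety of depth-of-field-oriented visual applications.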
Keywords
Gaussian-Wiener based Dense Focal Stack Images Synthesis
Wang Qiteng, Li Zhilong, Ding Xin, Liu Qiong, Yang You (Huazhong University of Science and Technology)
Abstract
Objective In optical imaging systems, the depth of field (DoF) is typically limited by the properties of the optical lens, so that only a limited region of the scene can be in focus. Expanding the DoF of optical systems is therefore a challenging task for both academia and industry. For example, in computational photography, capturing dense focal stack images (FoSIs) lets photographers choose different focal points and depths of field in post-processing to achieve the desired artistic effects. In macro and micro imaging, dense FoSIs provide clearer and more detailed images for more accurate analysis and measurement. For interactive and immersive media, dense FoSIs provide a more realistic and immersive visual experience. However, acquiring dense FoSIs faces several challenges. First, the performance of hardware devices limits the speed and quality of image acquisition. During shooting, the camera must adjust the focus quickly and accurately and capture multiple images to build a focal stack, which requires high-performance cameras and adaptive autofocus algorithms. In addition, changes in the shooting environment, such as object motion or manual operation by the photographer, can introduce image blurring and alignment issues. To overcome these challenges, a block-based Gaussian-Wiener bidirectional prediction model is proposed. Dividing the image into blocks and using the characteristics of local blocks for prediction reduces computational complexity and improves prediction accuracy. Gaussian-Wiener filtering smooths the prediction results and reduces the impact of artifacts and noise, which improves image quality. The bidirectional prediction method combines the original sparse FoSIs with the prediction results to generate dense FoSIs, thereby expanding the DoF of the optical system.
The Gaussian-Wiener bidirectional prediction model provides a new way to obtain dense FoSIs. It can be applied in various scenarios and fields, giving photographers, scientists, engineers, and artists greater creative freedom and image processing capability. Method This work abstracts the FoSIs as a Gaussian-Wiener representation. The proposed bidirectional prediction model consists of a bidirectional fitting module and a prediction generation module. Based on the Gaussian-Wiener representation model, a bidirectional fitting model is constructed to solve for the bidirectional prediction parameters and render new focal stack images. First, the images of the given sparse focal stack sequence are numbered from near to far according to focal length. The numbers start from 0 and increase by a fixed rule, such as a step of 2 so that all numbers are even. This yields a set of sparse FoSIs in serial order. Next, all images are divided into blocks of a predefined size, which can be chosen according to the specific needs and algorithm. Blocks at the same position in adjacently numbered images are combined into a block pair, which is the basic unit of bidirectional prediction. This preprocessing, dividing the FoSIs into blocks and recombining them into block pairs, is completed before bidirectional prediction and can be implemented with image segmentation algorithms and block-pair combination strategies. Bidirectional prediction is then performed on each block pair to obtain the prediction parameters, which are determined by the specific prediction model and algorithm, such as the block-based Gaussian-Wiener bidirectional prediction model.
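The numbering and block-pairing steps can be illustrated with a minimal sketch. It uses only the Python standard library and represents each grayscale image as a list of pixel rows. The function names, the assumption that image dimensions are multiples of the block size, and the convention that a synthesized image takes the odd index between two even-numbered neighbours are illustrative assumptions, not details given in the paper.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]        # grayscale image stored as rows of pixels
Block = List[List[float]]        # one square tile cut from an image
Position = Tuple[int, int]       # (top row, left column) of a block


def number_sparse_stack(stack: List[Image]) -> Dict[int, Image]:
    """Assign even indices 0, 2, 4, ... to images ordered from near to far."""
    return {2 * i: image for i, image in enumerate(stack)}


def split_into_blocks(image: Image, block_size: int) -> Dict[Position, Block]:
    """Cut an image into non-overlapping block_size x block_size tiles.

    Assumption: height and width are multiples of block_size.
    """
    height, width = len(image), len(image[0])
    blocks: Dict[Position, Block] = {}
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            blocks[(top, left)] = [
                row[left:left + block_size]
                for row in image[top:top + block_size]
            ]
    return blocks


def make_block_pairs(
    numbered: Dict[int, Image], block_size: int
) -> Dict[Tuple[int, Position], Tuple[Block, Block]]:
    """Pair co-located blocks from images with adjacent even indices.

    Keys are (index of the image to synthesize, block position); the index
    is the odd number between the two neighbours (an assumed convention).
    """
    pairs: Dict[Tuple[int, Position], Tuple[Block, Block]] = {}
    indices = sorted(numbered)
    for near, far in zip(indices, indices[1:]):
        near_blocks = split_into_blocks(numbered[near], block_size)
        far_blocks = split_into_blocks(numbered[far], block_size)
        for pos, near_block in near_blocks.items():
            pairs[((near + far) // 2, pos)] = (near_block, far_blocks[pos])
    return pairs
```

Each value in the returned dictionary is one block pair, the unit on which a bidirectional predictor would then operate. The predictor itself is not sketched here because the abstract does not specify its equations.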
In the bidirectional prediction module, the block pairs are used to fit the optimal bidirectional fitting parameters, from which the prediction generation parameters are solved. Applying the prediction generation parameters to the block pairs generates new prediction blocks. Finally, concatenating all prediction blocks according to their positions in the image yields the new predicted FoSIs. Result Experiments are conducted on 11 sparse focal stack image sequences, with peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) as evaluation metrics. The average PSNR of the results generated for the 11 sequences is 40.861 dB, and the average SSIM is 0.976. Compared with the two comparison methods, generalized Gaussian and spatial coordinates, PSNR improves by 6.503 dB and 6.467 dB, and SSIM by 0.057 and 0.092, respectively. The per-sequence average PSNR and SSIM improve by at least 3.474 dB and 0.012, respectively. Conclusion The experimental results show that the proposed method outperforms the comparison methods in both subjective and objective evaluation and performs well on 11 different scene sequences. Ablation experiments further demonstrate the advantage of bidirectional prediction in the method. It can therefore be concluded that the proposed bidirectional prediction method effectively generates new focal stack images and can play a crucial role in a variety of depth-of-field-oriented visual applications.
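The final stitching step and the PSNR metric can be sketched as follows, again with the Python standard library only. The function names, the block dictionary keyed by (top row, left column), and the 8-bit peak value of 255 are illustrative assumptions. The PSNR formula is the standard definition, 10·log10(peak² / MSE), and is not taken from the paper's own evaluation code.

```python
import math
from typing import Dict, List, Tuple

Image = List[List[float]]
Position = Tuple[int, int]       # (top row, left column) of a block


def stitch_blocks(
    blocks: Dict[Position, Image], height: int, width: int
) -> Image:
    """Place each predicted block back at its position to form one image."""
    image = [[0.0] * width for _ in range(height)]
    for (top, left), block in blocks.items():
        for r, row in enumerate(block):
            # Overwrite the matching slice of the output row with block pixels.
            image[top + r][left:left + len(row)] = row
    return image


def psnr(reference: Image, generated: Image, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [
        (a - b) ** 2
        for ref_row, gen_row in zip(reference, generated)
        for a, b in zip(ref_row, gen_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf          # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)
```

In an evaluation of this kind, `psnr` would compare a stitched, generated image against a captured image at the same focal position, and the per-sequence values would then be averaged.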
Keywords