Saliency detection algorithm of panoramic images using joint weighting with observer's attention longitude
Abstract
Objective Research on saliency detection for panoramic images has made considerable progress, but with respect to the positional characteristics of panoramic images, most studies have only examined the influence of latitude on saliency detection. When people view panoramic images, their field of view is limited, so the saliency at different longitude positions also varies greatly, and the predicted salient regions are often insufficiently accurate as a result. Starting from the longitude characteristics of panoramic images, this paper proposes a saliency detection algorithm for panoramic images based on joint weighting with the observer's attention longitude. Method A spatial saliency prediction network is used to obtain a preliminary saliency image, and equator bias is applied as preprocessing to improve saliency detection at different latitudes. The saliency image is then weighted by the attention longitude, combining the observer's viewing habits with the saliency image. Next, the panorama is double-cube projected and segmented, its brightness and depth features are extracted, and the longitude weights of the different viewports are computed. After the two weightings, the final saliency image is obtained. Result The proposed algorithm was compared with several other algorithms on the dataset provided by the Salient360! Grand Challenge. The results show that the proposed algorithm achieves very good saliency detection results. In the test of its general performance, it reached 1.979 3, 0.806 2, 0.709 5, and 0.323 9 in normalized scanpath saliency, correlation coefficient, similarity, and KL divergence, respectively, outperforming the other algorithms on all four metrics. Conclusion The proposed saliency detection algorithm for panoramic images resolves the inaccuracy of previous panoramic saliency detection at different longitude positions.
Saliency detection algorithm of panoramic images using joint weighting with observer’s attention longitude
Sun Yao, Chen Chunyi, Hu Xiaojuan, Li Ling, Xing Qiwei (School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China)
Abstract
Objective Considerable development in immersive media technologies has taken place with the aim of providing a complete audiovisual experience to users, especially a sense of presence in the visualized scene. These technologies have been applied in many fields, such as entertainment, tourism, and exhibitions. The resolution of virtual reality (VR) panoramic images is much higher than that of traditional images, which makes their storage and transmission difficult. The human visual attention mechanism, however, is selective: when faced with a scene, humans automatically attend to regions of interest while ignoring regions of no interest. In daily tasks, humans face far more information than they can handle, and selective visual attention enables them to process a large amount of information by prioritizing certain parts of it while ignoring others. It is therefore necessary to detect the saliency of panoramic images in order to reduce the redundant information in them. Current research on the saliency detection of panoramic images falls into two directions: 1) improved traditional saliency detection algorithms and 2) panoramic saliency algorithms based on deep learning. Improved traditional algorithms involve two aspects: projection conversion and equator bias. Because VR panoramic images admit multiple projection modes, their saliency detection can be performed in different projection domains. Equator bias refers to the phenomenon that the saliency of panoramic images tends to concentrate near the equator because of human observation habits; a saliency detection algorithm can therefore weight saliency according to the latitude of each pixel. Deep learning algorithms use neural networks to extract image features and detect the image's saliency.
Because the content of current panoramic image datasets is insufficient, neural network algorithms also need additional measures, such as combining equator bias, to improve their detection results. Although existing algorithms account for the influence of latitude through equator bias, no research has focused on the influence of longitude on saliency. Hence, this study proposes a saliency detection algorithm for panoramic images using joint weighting with the observer's attention longitude. Method First, a spatial saliency prediction network is used to obtain a preliminary saliency image, and equator bias is applied to increase the accuracy of saliency detection at different latitudes. The saliency image is then weighted by the attention longitude to combine the observer's viewing behavior with the saliency image. Specifically, this study first sums the saliency values at each longitude of the reference saliency images in the dataset to obtain the prime attention longitude weight graph. The graph is then translated so that its center is aligned with the prime observation center of the original panorama, and the prime attention longitude weight is multiplied with the saliency values. If a strongly salient area is observed outside the prime observation viewport, the most salient part of the predicted panorama saliency image is used as the secondary observation center, and the converted attention longitude weighting is applied. The prime and converted attention longitude weightings differ in two respects. First, they are derived from different data: for the converted weighting, we choose images that better match human viewing habits. Second, their effects differ: the converted attention longitude weighting has a weaker effect than the prime one.
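The attention longitude weighting described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function names, the max-normalization, and the use of a circular shift (`np.roll`, since an ERP panorama wraps around in longitude) are assumptions made for the sketch.

```python
import numpy as np

def prime_attention_longitude_weight(reference_maps):
    """Sum the saliency of every longitude (image column) over a set of
    reference saliency maps and normalize, giving one weight per longitude."""
    # reference_maps: list of H x W saliency maps in ERP format
    col_sums = np.sum([m.sum(axis=0) for m in reference_maps], axis=0)
    return col_sums / col_sums.max()

def apply_longitude_weight(saliency, weight, center_col):
    """Circularly shift the weight curve so its peak sits on the prime
    observation center column, then weight each column of the saliency map."""
    peak = int(np.argmax(weight))
    # Longitude wraps around in ERP, so a circular shift aligns the centers
    shifted = np.roll(weight, center_col - peak)
    return saliency * shifted[np.newaxis, :]
```

The converted attention longitude weighting would follow the same pattern, with the weight curve built from a different image set and scaled down to give it a weaker effect.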
The second step is the weighting of different viewports and longitudes. First, the panoramic image is double-cube projected: the panorama in equirectangular projection (ERP) format is cube projected into six squares, then shifted by 45 degrees in longitude, and cube projected again. Next, the image is converted from RGB to the LAB color space to extract the brightness feature of the panorama, and mrharicot-monodepth2 is used to obtain the depth feature. The longitude weight of each viewport is calculated from the difference between its features and those of the other viewports, and the longitude weight of each pixel is calculated from the difference between its features and those of the other pixels. Combining the two weights yields the different-viewport longitude weights, which are used to weight the saliency image. Finally, combining the saliency graph with the prime attention longitude weighting and the weighting of different viewports and longitudes gives the final saliency graph. Result We compared our results with those of other algorithms on the dataset provided by the International Conference on Multimedia & Expo (ICME) 2017 Salient360! Grand Challenge. The other algorithms include a saliency prediction model based on sparse representation and human-acuity-weighted center-surround differences (CDSR), a deep autoencoder-based reconstruction network (AER), and the panoramic-CNN-360-saliency (PC3S) algorithm. CDSR is an improved traditional algorithm, while AER and PC3S are deep learning algorithms. For evaluation, we use several standard metrics for eye fixation prediction: normalized scanpath saliency, correlation coefficient, similarity, and Kullback-Leibler (KL) divergence, on which our algorithm reached 1.979 3, 0.806 2, 0.709 5, and 0.323 9, respectively.
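One building block of the double-cube projection step can be sketched as below: sampling a single cube face from an ERP panorama, and the 45-degree longitude shift applied before the second cube projection. The face size, nearest-neighbour sampling, and the choice of the front (+z) face are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def erp_to_cube_face(erp, face_size=64):
    """Sample the front (+z) cube face from an equirectangular panorama
    by casting a ray through every face pixel (nearest-neighbour)."""
    H, W = erp.shape[:2]
    # Pixel grid on the face plane z = 1, with x and y in [-1, 1]
    u = np.linspace(-1, 1, face_size)
    x, y = np.meshgrid(u, u)
    z = np.ones_like(x)
    norm = np.sqrt(x**2 + y**2 + z**2)
    # Ray direction -> spherical coordinates
    lon = np.arctan2(x, z)       # in [-pi/4, pi/4] for the front face
    lat = np.arcsin(y / norm)
    # Spherical coordinates -> ERP pixel coordinates
    col = ((lon / (2 * np.pi) + 0.5) * (W - 1)).astype(int)
    row = ((lat / np.pi + 0.5) * (H - 1)).astype(int)
    return erp[row, col]

def shifted_panorama(erp, degrees=45):
    """Shift the ERP panorama in longitude before the second cube
    projection of the double-cube scheme (longitude wraps around)."""
    shift = int(erp.shape[1] * degrees / 360.0)
    return np.roll(erp, shift, axis=1)
```

The remaining five faces follow by rotating the ray directions; applying the same face extraction to `shifted_panorama(erp)` yields the second set of six viewports, so that regions cut by face borders in the first projection lie inside a face of the second.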
The results show that the proposed algorithm is superior to the other algorithms on all four evaluation metrics: its saliency detection results are better, and its detection of saliency at different longitude positions is more accurate. Conclusion In this study, we proposed a saliency detection algorithm for panoramic images using joint weighting with the observer's attention longitude. The algorithm improves the accuracy of saliency detection at different longitude positions. Experiments show that it outperforms current algorithms, especially in the accuracy of saliency detection at different longitudes.
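The four evaluation metrics used in the comparison are standard in saliency benchmarking and can be sketched as follows. This is a minimal illustration of the common definitions, not the exact evaluation code of the challenge; in particular, the epsilon handling in the KL term is an assumption of this sketch.

```python
import numpy as np

def nss(sal, fixations):
    """Normalized scanpath saliency: mean of the standardized saliency
    map at the fixated pixels (fixations is a binary map)."""
    s = (sal - sal.mean()) / sal.std()
    return s[fixations > 0].mean()

def cc(a, b):
    """Pearson correlation coefficient between two saliency maps."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return (a * b).mean()

def similarity(a, b):
    """SIM: sum of the pixel-wise minimum of the two maps, each
    normalized to a probability distribution."""
    a = a / a.sum()
    b = b / b.sum()
    return np.minimum(a, b).sum()

def kl_divergence(p, q, eps=1e-12):
    """KL divergence of the predicted distribution q from the ground
    truth p; eps guards against division by zero and log(0)."""
    p = p / p.sum()
    q = q / q.sum()
    return (p * np.log(p / (q + eps) + eps)).sum()
```

Higher NSS, CC, and SIM indicate better agreement with the ground truth, while lower KL divergence is better, which is consistent with the scores reported above.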
Keywords
saliency detection; panoramic image; attention longitude weighting; double-cube projection; weighting of different viewports and longitude