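Edge-enhanced ultra-high-definition video quality assessment

Teng Jianxin¹, He Jiefeng¹, Yuan Jinchun¹, Xing Fengchuang², Wang Hanpin² (1. Guangzhou Broadcasting Network, Guangzhou 510310; 2. School of Computer Science, Guangzhou University, Guangzhou 510006)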

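Abstract
Objective With the rapid development of network and television technology, watching 4K (3840×2160 pixels) ultra-high-definition (UHD) video has become a trend. However, because UHD video has high resolution, rich edge and detail information, and a very large data volume, distortion is more easily introduced during acquisition, compression, transmission, and storage. UHD video quality assessment has therefore become an important research topic in broadcast television technology. This paper proposes an edge-enhanced UHD video quality assessment method. Method Each frame of the input video is split into its R, G, and B channels; an edge detection operator is applied to each channel separately, and the edge information of the three channels is merged to obtain the edge image of the frame. Three network modules (edge masking, content dependency, and temporal memory) are designed to extract the corresponding features. The features are fed into a fully connected layer for dimensionality reduction to obtain quality features, from which the quality score of the input video is computed. Because UHD video has rich edges with extremely sharp edge detail, distortion introduced at edges is usually conspicuous, so the proposed edge-enhanced method is particularly suited to UHD video quality assessment. Because the method also incorporates content dependency and temporal hysteresis, it is applicable to the quality assessment of other in-the-wild videos as well. Result Experiments were conducted on 4 video quality assessment datasets, including a UHD dataset, with comparison against 5 mainstream methods; the results show that the proposed method performs better. On the KoNViD-1K, DVL2021, LIVE-Qualcomm, and LSVQ datasets, compared with the best-performing existing method, SROCC (Spearman rank-order correlation coefficient) improves by 3.9%, 4.2%, 10.0%, and 0.6%, respectively, and PLCC (Pearson's linear correlation coefficient) improves by 3.9%, 2.2%, 10.1%, and 0.1%, respectively. Conclusion By exploiting the characteristics of UHD video, the proposed method better fits the properties of human vision and achieves the best performance to date; because it does not use optical flow, computation is greatly reduced, and the method also shows good generalization ability.
Keywords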
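The per-channel edge extraction described in the Method part can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a thresholded Sobel gradient stands in for the edge detection operator, the per-channel maps are merged with a pixel-wise OR because the abstract does not say how they are merged, and all names (`split_channels`, `sobel_edges`, `merge_edges`, `edge_map`, `threshold`) are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each in 0..255
Frame = List[List[Pixel]]      # frame[row][col]
Plane = List[List[int]]        # single-channel image


def split_channels(frame: Frame) -> List[Plane]:
    """Split an RGB frame into three single-channel planes (R, G, B)."""
    return [[[px[c] for px in row] for row in frame] for c in range(3)]


def sobel_edges(plane: Plane, threshold: int = 128) -> Plane:
    """Binary edge map from a thresholded Sobel gradient (L1 magnitude).

    Simplified stand-in for a full edge detector; border pixels stay 0.
    """
    h, w = len(plane), len(plane[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (plane[y - 1][x + 1] + 2 * plane[y][x + 1] + plane[y + 1][x + 1]
                  - plane[y - 1][x - 1] - 2 * plane[y][x - 1] - plane[y + 1][x - 1])
            gy = (plane[y + 1][x - 1] + 2 * plane[y + 1][x] + plane[y + 1][x + 1]
                  - plane[y - 1][x - 1] - 2 * plane[y - 1][x] - plane[y - 1][x + 1])
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges


def merge_edges(maps: List[Plane]) -> Plane:
    """Merge per-channel edge maps with a pixel-wise OR (assumed merge rule)."""
    return [[max(vals) for vals in zip(*rows)] for rows in zip(*maps)]


def edge_map(frame: Frame, threshold: int = 128) -> Plane:
    """Edge map of a frame: detect edges per channel, then merge."""
    planes = split_channels(frame)
    return merge_edges([sobel_edges(p, threshold) for p in planes])
```

Detecting edges per channel rather than on a grayscale image means that a boundary visible in only one color channel, for example black next to pure red, still appears in the merged map.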
Edge-enhanced ultra high definition video quality assessment

Teng Jianxin¹, He Jiefeng¹, Yuan Jinchun¹, Xing Fengchuang², Wang Hanpin² (1. Guangzhou Broadcasting Network, Guangzhou 510310, China; 2. School of Computer Science, Guangzhou University, Guangzhou 510006, China)

Abstract
Objective With the rapid development of network and television technology, 4K (3840×2160 pixels) ultra-high-definition (UHD) video is becoming mainstream. However, because UHD video has high resolution, rich edge and texture information, and a huge data volume, it is prone to distortion during acquisition, compression, transmission, and storage. UHD video quality assessment (VQA) has therefore become an important research topic in television broadcasting, and we propose an edge-enhanced UHD-VQA method. Method First, each input video frame is split into its R, G, and B channels. An edge detection operator is then applied to each channel, and the edge information of the three channels is merged to obtain the edge map of the frame. To extract spatial information, the content-dependency property of the human visual system (HVS) is exploited: each frame is fed into a ResNet-50 pre-trained on ImageNet-1K, and the resulting feature maps are globally pooled and concatenated to reduce the feature dimension. The pooled features are then processed by a gated recurrent unit (GRU), its output is processed with min pooling and softmin pooling, and the prediction score is calculated as a weighted sum of the results. Three network modules (edge masking, content dependency, and temporal memory) are designed to extract the corresponding features. Finally, the features are fed into a fully connected layer for dimensionality reduction to obtain quality features, from which the video quality score is calculated. Because UHD video has rich, sharply defined edge details, distortion introduced at edges tends to be conspicuous, so the edge-enhanced method is particularly suited to UHD video quality assessment.
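The min pooling, softmin pooling, and weighted sum mentioned above can be illustrated with a small temporal pooling sketch. The abstract gives no formula, so everything here is an assumption following a common temporal-hysteresis formulation: a memory term (minimum over the preceding frames) is blended with a current term (softmin-weighted average over the upcoming frames). The function name `hysteresis_pool`, the window length `tau`, and the blend weight `gamma` are hypothetical.

```python
import math
from typing import List


def hysteresis_pool(scores: List[float], tau: int = 12, gamma: float = 0.5) -> float:
    """Pool per-frame quality scores into a single video score.

    Assumed formulation, not taken from the paper:
      memory_t  = min of the previous `tau` frame scores (min pooling)
      current_t = softmin-weighted average of frame t and the next tau-1 frames
      pooled_t  = gamma * memory_t + (1 - gamma) * current_t
    The video score is the mean of pooled_t over all frames.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    pooled = []
    for t in range(n):
        # Memory term: the worst recent quality dominates what viewers remember.
        prev = scores[max(0, t - tau):t]
        memory = min(prev) if prev else scores[t]
        # Current term: softmin weights emphasize low-quality frames.
        window = scores[t:t + tau]
        lowest = min(window)  # subtracted for numerical stability
        weights = [math.exp(-(q - lowest)) for q in window]
        current = sum(q * w for q, w in zip(window, weights)) / sum(weights)
        pooled.append(gamma * memory + (1 - gamma) * current)
    return sum(pooled) / n
```

With this kind of pooling, a short quality drop lowers the video score more than a plain average would, which is the hysteresis effect the temporal memory module is meant to capture.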
In addition, because it incorporates content dependency and temporal hysteresis, the method is also applicable to the quality assessment of other in-the-wild videos. Result The method is compared with 5 popular methods on 4 datasets. It improves SROCC (Spearman rank-order correlation coefficient) by 3.9% and PLCC (Pearson's linear correlation coefficient) by 3.9% on KoNViD-1K, by 4.2% and 2.2% on DVL2021, by 10.0% and 10.1% on LIVE-Qualcomm, and by 0.6% and 0.1% on LSVQ. A cross-dataset experiment is carried out to demonstrate generalization ability, and an ablation study is conducted to verify the effectiveness of the edge information. The network can in fact be trained to fit edge-masking characteristics even without the edge masking. Conclusion An edge-enhanced method is proposed for assessing the quality of UHD video, targeting distortion at edges. Content dependency and temporal hysteresis are also incorporated to address the UHD-VQA problem. The Canny operator is used to detect the edge information of video frames, and its configuration is determined. Training parameters are used to handle the heterogeneity of multiple video datasets. Extensive experiments on 4 popular video quality assessment datasets (including a UHD dataset) verify the effectiveness of the proposed method. The gains range from 0.1% (PLCC on LSVQ) to 10.1% (PLCC on LIVE-Qualcomm). These results show that edge information can substantially improve the performance of VQA methods. Because optical flow is not used, the computational cost is greatly reduced.
Future work can explore further HVS characteristics for the no-reference VQA (NR-VQA) problem.
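The two metrics reported in the Result part, SROCC and PLCC, can be computed as in the sketch below. It is a plain-Python illustration of the standard definitions, not the evaluation code used in the paper; the function names are hypothetical, and the nonlinear mapping that VQA evaluation protocols often fit before computing PLCC is omitted.

```python
import math
from typing import List, Sequence


def plcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's linear correlation coefficient between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srocc(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))
```

Calling `srocc(predicted, mos)` and `plcc(predicted, mos)` on a list of predicted scores and the matching subjective scores gives the two values: SROCC measures how well the predictions preserve the ordering of the subjective scores, while PLCC measures how linearly they agree.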
Keywords
