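A Survey of Image and Video Quality Assessment
Abstract
During image and video acquisition and transmission, limitations of the physical environment and of algorithm performance inevitably cause unpredictable quality degradation. This degradation restricts the use of images and videos in practical scenarios and noticeably affects the human visual experience. Image/video quality assessment has therefore emerged as an important task in computer vision. Its goal is to build computational mathematical models that measure the distortion in an image or video and judge its quality, so that quality can be predicted automatically. It has broad application prospects in scenarios such as urban life, traffic surveillance and live multimedia streaming. Research on image/video quality assessment has made considerable progress and has also benefited other computer vision tasks. Based on an extensive survey of prior work, this paper reviews the development of the image/video quality assessment field and lists milestone and highly influential algorithms among both traditional and deep learning methods. It then reviews the literature from three perspectives, full-reference, reduced-reference and no-reference, covering methods based on structural information, on the human visual system and on natural scene statistics. Using public datasets such as LIVE (laboratory for image & video engineering), CSIQ (categorical subjective image quality database) and TID2013, and evaluation metrics such as SROCC (Spearman rank order correlation coefficient) and PLCC (Pearson linear correlation coefficient), it analyzes the performance of representative algorithms. Finally, it summarizes the challenges and open problems that remain in quality assessment and discusses future directions. The aim of this paper is to provide researchers in the quality assessment field with a fairly comprehensive reference.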
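SROCC and PLCC, the evaluation metrics named above, compare a model's predicted scores with subjective scores (for example mean opinion scores, MOS) over a test set. The following is a minimal standard-library Python sketch, not the authors' evaluation code; the function names `plcc`, `average_ranks` and `srocc` are hypothetical. PLCC measures linear agreement (prediction accuracy), while SROCC is the Pearson correlation computed on ranks and therefore measures prediction monotonicity; tied values receive the average of the ranks they span.

```python
import math
from typing import List, Sequence


def plcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient between two score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Raises ZeroDivisionError if either input is constant.
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values (a tie group).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j -> ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srocc(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation: PLCC applied to the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    predicted = [0.9, 0.7, 0.4, 0.2]  # illustrative model outputs
    mos = [85.0, 70.0, 50.0, 10.0]    # illustrative subjective scores
    print("SROCC:", srocc(predicted, mos))
    print("PLCC:", plcc(predicted, mos))
```

Because the illustrative predictions are perfectly monotonic with the MOS values, SROCC is 1 (up to floating-point rounding) even though PLCC is below 1. Many IQA studies first pass predictions through a nonlinear logistic mapping before computing PLCC; that step is omitted from this sketch.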
The critical review of image and video quality assessment methods
Cheng Ruqiu1, Yu Ye1,2, Shi Daizong1, Cai Wen1
(1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China; 2. Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei 230009, China)
Abstract
The quality of images and videos strongly affects how much information the human visual system can acquire from them, and image/video quality assessment (I/VQA) has become a key factor for today's multimedia and network video services. Quality assessment methods fall mainly into qualitative and quantitative approaches. Qualitative methods rely on human observers to judge quality, which is labor-intensive and time-consuming. Quantitative methods simulate human observation and predict the quality of the input automatically. This review covers full-reference (FR), reduced-reference (RR) and no-reference (NR) methods, which are categorized by how much of the "clean" reference data (the undistorted reference image) they use. Full-reference methods assess the quality of a distorted image by comparing it with the clean reference. Their advantages are better performance, low complexity and good robustness; their drawback is that reference images are difficult to obtain in real-world scenarios. Reduced-reference methods need less reference data than full-reference methods and predict quality from features extracted from the reference, while no-reference methods require no reference data at all. Traditional IQA methods are mainly based on structural similarity, the human visual system (HVS) and natural scene statistics (NSS). Structural-similarity-based methods measure quality from changes in structural information; HVS-based methods exploit characteristics of the human eye; NSS-based methods fit the distribution of transform coefficients of images or videos and measure the gap between the reference coefficients and the test coefficients.
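Structural-similarity methods compare statistics of the reference and distorted signals. As an illustration, the sketch below evaluates the widely used SSIM expression once over the whole signal (a single global window) in plain Python; the standard SSIM index instead evaluates it in local, typically Gaussian-weighted, windows and averages the results. The constants K1 = 0.01 and K2 = 0.03 follow the common SSIM defaults. The function name `global_ssim` is hypothetical, and the inputs are assumed to be flattened grayscale pixel values in the range [0, data_range].

```python
from typing import Sequence


def global_ssim(ref: Sequence[float], dist: Sequence[float],
                data_range: float = 255.0) -> float:
    """SSIM formula computed over the whole signal as one window (simplified)."""
    if len(ref) != len(dist) or len(ref) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(ref)
    # Stabilizing constants C1 = (K1*L)^2 and C2 = (K2*L)^2 with K1=0.01, K2=0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x = sum(ref) / n
    mu_y = sum(dist) / n
    var_x = sum((a - mu_x) ** 2 for a in ref) / n
    var_y = sum((b - mu_y) ** 2 for b in dist) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(ref, dist)) / n
    # Luminance term compares the means.
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    # Combined contrast-structure term compares variances and covariance.
    contrast_structure = (2 * cov + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure


if __name__ == "__main__":
    reference = [10, 50, 90, 130, 170, 210]
    shuffled = [130, 50, 210, 10, 90, 170]  # same pixels, structure destroyed
    print(global_ssim(reference, reference))  # identical inputs give 1
    print(global_ssim(reference, shuffled))   # lower score
```

The shuffled signal has exactly the same mean and variance as the reference, so only the covariance (structure) changes; the drop in score illustrates why this family of methods is described as measuring changes in structural information.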
Emerging deep learning methods based on convolutional neural networks (CNNs) extract image features through convolution operations and train the models with a regression objective that maps features to quality scores. CNNs bring strong feature-learning capability to IQA. VQA models fall mainly into two categories. In the first, an IQA method is applied to each frame and the per-frame qualities are then integrated into a video-level score. The integration is mainly done by simple averaging or weighted averaging, where the weights are either set manually or learned. The second category treats the video as a three-dimensional (3D) signal: coefficient distributions are extracted with a 3D transform, or features are extracted with a 3D CNN, and the coefficient distributions are then fitted, or the features mapped, to obtain the final quality. Compared with traditional methods, learning-based methods have higher complexity but better performance. Most current VQA methods also use a CNN as the backbone of the overall model. This review traces the development of I/VQA and lists some representative algorithms. It then reviews the I/VQA literature from two aspects, traditional methods and deep learning-based methods. The performance of representative algorithms is analyzed with the Spearman rank order correlation coefficient (SROCC) and Pearson linear correlation coefficient (PLCC) on the laboratory for image & video engineering (LIVE), categorical subjective image quality database (CSIQ), TID2013 and other datasets. Finally, the open challenges in quality assessment are summarized and future directions are discussed.
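The first VQA category described above scores each frame with an IQA model and then integrates the frame scores. A minimal sketch of the two integration schemes mentioned, simple averaging and weighted averaging, follows. The function name `pool_frame_scores` and the example weights are hypothetical; in a learned scheme the weights would be produced by a model rather than set by hand.

```python
from typing import Optional, Sequence


def pool_frame_scores(frame_scores: Sequence[float],
                      weights: Optional[Sequence[float]] = None) -> float:
    """Integrate per-frame quality scores into one video-level score."""
    if len(frame_scores) == 0:
        raise ValueError("need at least one frame score")
    if weights is None:
        # Simple average: every frame contributes equally.
        return sum(frame_scores) / len(frame_scores)
    if len(weights) != len(frame_scores):
        raise ValueError("one weight per frame is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average: weights may be hand-set or predicted by a model.
    return sum(w * s for w, s in zip(weights, frame_scores)) / total


if __name__ == "__main__":
    scores = [0.8, 0.6, 0.4]  # illustrative per-frame IQA scores
    print(pool_frame_scores(scores))             # plain mean, about 0.6
    print(pool_frame_scores(scores, [1, 1, 2]))  # last frame weighted double, about 0.55
```

Normalizing by the sum of the weights keeps the pooled score on the same scale as the frame scores, so the same function covers both schemes: passing equal weights reproduces the plain mean.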
Keywords
image/video quality assessment (I/VQA); structural information; human vision system (HVS); natural scene statistics (NSS); deep learning