A survey of visual DeepFake detection techniques
Abstract
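With the development of generative deep learning algorithms, DeepFake technology has advanced and been applied in many fields. Its abuse has made people increasingly aware of the threat it poses, and forgery detection techniques have emerged in response. This paper surveys research on visual DeepFake techniques. 1) It briefly introduces the development history and technical principles of visual DeepFake techniques, including the application of generative adversarial networks to DeepFake products; 2) it summarizes and categorizes existing visual DeepFake datasets; 3) it classifies current visual DeepFake detection techniques, grouping existing methods into four categories: methods based on specific artifacts, data-driven methods, methods based on information inconsistency, and other types of visual DeepFake detection. Among these, artifact-based methods focus on finding pixel-level differences between forged products and real images, using machine learning to identify artificial artifact traces in DeepFake products, while methods based on information inconsistency focus on finding information-level differences between forged products and real images or videos; both have the advantages of high recognition efficiency and convenient training. Data-driven methods rely on large datasets and machine learning training, train neural networks directly on DeepFake products, and improve the model by refining the network architecture to raise training efficiency; because of their model flexibility and high accuracy, they have become a popular direction in current DeepFake detection. The paper also analyzes the specific advantages and disadvantages of the four types of methods and further identifies the key points and difficulties for future research on visual DeepFake detection.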
Keywords
An overview of visual DeepFake detection techniques
Wang Renying, Chu Beilin, Yang Zhen, Zhou Linna (School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China)
Abstract
The word "DeepFake" combines the terms "deep learning" and "fake", and mainly relates to artificial neural networks. This paper reviews research on visual DeepFake techniques and covers the following aspects. 1) The history and technical principles of visual DeepFake techniques, including the application of generative adversarial networks (GANs) to DeepFake products. Current visual DeepFake methods can be roughly divided into three types: new face synthesis, face modification, and face swapping. New face synthesis uses powerful GANs to generate complete face images of people who do not exist. The popular datasets for new face synthesis were generated with ProGAN and StyleGAN. Each generated forgery carries the specific fingerprint of the GAN that produced it. Face modification applies changes to the target face: for instance, it can change a person's hair color or skin color, alter the person's apparent gender, or add a pair of glasses. This method also uses GANs to generate images. The more recent StarGAN can divide the face into multiple regions and modify them simultaneously. Face swapping consists of two parts. The first is replacing the face of the target person in a video with another person's face. This approach underlies the most popular visual DeepFake algorithms at present, such as DeepFake and FaceSwap. The second is facial expression swapping, also called face reenactment, which replaces one person's facial expressions with those of another, for example altering Obama's expressions and movements to produce a fake "speech". At present, Face2Face and NeuralTextures are popular face reenactment methods.
Meanwhile, some mobile applications can also produce forged facial content. FaceApp, which is based on StarGAN, can modify various facial expressions. 2) Current visual DeepFake datasets are summarized and classified. DeepFake datasets keep developing alongside improvements in DeepFake generation and detection techniques. This review collects the DeepFake datasets that have recently received widespread attention and presents them in a table that shows their advantages and disadvantages. 3) Current visual DeepFake detection techniques are categorized. This overview groups existing detection methods and models into four categories: methods based on specific artifacts, data-driven methods, methods based on information inconsistency, and other types of visual DeepFake detection. These four categories are further divided into subcategories. Artifact-based methods are divided into five subcategories: the blending boundary of the fake face, artifacts in the central region of the fake face, color inconsistency in DeepFake products, light source inconsistency, and GAN fingerprints. Data-driven methods are divided into methods that attempt to localize the tampered region and methods based on improved neural network architectures. Methods based on information inconsistency are divided into three subcategories: inconsistent biological signals, temporal inconsistency, and behavior inconsistent with that of the real target. Among the four categories, artifact-based methods focus on finding pixel-level differences between forged products and real images and videos.
These methods concentrate on finding detectable artifacts produced by GANs. Methods based on information inconsistency, on the other hand, focus on finding information-level differences between forged and real content. Both kinds of methods have the advantages of high recognition efficiency and convenient training. Data-driven methods, by contrast, use various DeepFake datasets together with real data and rely on machine learning training so that the neural network itself directly identifies forged content. This overview analyzes the specific advantages and disadvantages of the four categories and identifies the key points and difficulties for future research on visual DeepFake detection. Its contributions are as follows: 1) it gives readers an understanding of DeepFake generation technology and emerging DeepFake detection methods; 2) it informs readers of the latest developments, trends, and challenges in DeepFake research in recent years; and 3) it identifies the latest trends in the attacker-defender contest that will shape the future development of DeepFake, and aims to give DeepFake detection the advantage.
Keywords