Data-free model compression for light-weight DeepFake detection
Abstract
Objective Although existing DeepFake detection methods have demonstrated excellent real/fake discrimination performance on major public datasets, their heavy memory footprint and computational cost make online deployment a challenging task. To address this, this paper develops lightweight DeepFake detectors via data-free quantization. Method Under the constraint of minimal accuracy loss, a well-trained, high-accuracy DeepFake detection model is compressed: instead of representing its weights and activations as 32-bit floating-point numbers, all of them are converted to low-bit-width integers. Moreover, because facial data raises privacy concerns, all quantization in this paper is performed in a data-free setting, using synthetic data as the calibration set to estimate correct activation ranges. These data are iteratively optimized to match the statistics stored in the batch-normalization layers of the pre-trained model, so their distribution closely resembles that of the original training data. Result On two classic face-forgery datasets, FaceForensics++ and Celeb-DF v2, four pre-trained DeepFake detection models (ResNet50, Xception, EfficientNet-b3, and MobileNetV2) retain, and in some cases exceed, their original performance after being quantized by the proposed method. Even when the model's weights and activations are compressed to 6 bits, the resulting lightweight models achieve a detection accuracy of at least 81%. Conclusion By fully exploiting the valuable information embedded in pre-trained DeepFake detection models, this paper presents a lightweight face-forgery detector based on data-free model compression; it identifies the authenticity of suspicious face samples accurately and efficiently while greatly reducing the resource and time costs of detection.
Data-free model compression for light-weight DeepFake detection
Zhuo Wenqi1,2, Li Dongze1,2, Wang Wei2, Dong Jing2 (1. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; 2. Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
Abstract
Objective Deep generative models have made it increasingly easy to synthesize and manipulate human facial images and videos in recent years. To cope with such forgeries, DeepFake detection (DFD) techniques have emerged, and many DFD methods can already discriminate between real and fake faces with over 95% accuracy. However, deploying these detectors online remains a great challenge because of their memory footprint and computational cost. We therefore introduce model quantization into the DFD domain. Quantization-based model compression reduces model size by converting a model's key parameters from high-precision floating-point numbers into low-precision integers, but the resulting accuracy degradation remains a challenge. Approaches to this problem fall into two categories: 1) quantization-aware fine-tuning and 2) post-training quantization. For cost-effectiveness, we adopt the latter to build a lightweight DFD detector. In addition, to address privacy and information-security concerns, our models are quantized and calibrated in a data-free scenario, without access to the original training set. Method The proposed framework consists of two steps: 1) quantization of the key parameters and 2) calibration of the activation ranges. First, the weights and activations of a well-trained, high-accuracy DFD model are selected as the parameters to be quantized. An asymmetric linear transformation converts them from 32-bit floating-point numbers into a lower bit-width representation such as INT8 or INT6. Next, the activation ranges are calibrated on a calibration set. In the data-free scenario, no data can be collected from the original training set. Therefore, to produce effective calibration data, the batch-normalization layers of the pre-trained DFD model are used to guide the synthesis of calibration samples: their stored statistics, such as the running means and variances, reflect the distribution of the training data.
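The asymmetric linear quantization step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the float range [min, max] of a weight or activation tensor is mapped onto the integer range [0, 2^b − 1] with a scale and a zero-point, so both 8-bit (W8/A8) and 6-bit (W6/A6) settings are covered by the same routine.

```python
import numpy as np

def asymmetric_quantize(x, num_bits=8):
    """Asymmetric linear quantization: map [x_min, x_max] onto [0, 2^b - 1]
    using a scale and an integer zero-point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin) if x_max > x_min else 1.0
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximate float tensor from the integer representation."""
    return (q.astype(np.float32) - zero_point) * scale

# Example: quantize a random conv-weight-shaped tensor to 8 bits
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
q, s, z = asymmetric_quantize(w, num_bits=8)
w_hat = dequantize(q, s, z)
max_err = np.abs(w - w_hat).max()  # per-element error is bounded by ~scale
```

For weights the range is known exactly; for activations the same formula is applied, but x_min and x_max must first be estimated on the calibration set, which is why the data-free calibration step below matters.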
Specifically, input samples drawn at random from a standard Gaussian distribution are optimized under an L2-norm constraint so that their batch statistics match the statistics stored in the batch-normalization layers. Furthermore, to reduce accuracy loss, ReLU6 is adopted as the activation function for all DFD models: ReLU6 bounds activations to the interval [0, 6], a natural range that benefits quantization. The synthesized calibration data are then fed into the quantized model, and the activation ranges are calibrated during the forward inference pass. Result The proposed scheme is tested on popular DFD models, namely ResNet50, Xception, EfficientNet-b3, and MobileNetV2, using the popular DeepFake datasets FaceForensics++ and Celeb-DF v2. On FaceForensics++, Xception and MobileNetV2 achieve accuracy scores of 93.98% and 92.25% under W8A8 quantization, improvements of 0.01% and 0.92% over their full-precision baselines. The detection accuracy of ResNet50 reaches 92.56% under W6A8, while EfficientNet-b3 still requires further calibration. On Celeb-DF v2, MobileNetV2 gains 0.07%, 0.77%, and 0.09% in accuracy under W8A8, W8A6, and W6A6 quantization, respectively, compared with its full-precision baseline. For three of the four DFD models, the quantized versions exceed 92% detection accuracy even under W6A6 quantization. We also compare with DefakeHop, a related work that likewise builds a lightweight DFD network. Although their parameter counts are unchanged and remain larger than DefakeHop's, our quantized DFD models attain higher AUC (area under the ROC curve) scores on the public datasets. Moreover, the proposed scheme could in principle be applied to compress DefakeHop itself, making DFD models even more lightweight. To evaluate the approach more thoroughly, a series of ablation experiments analyzes the impact of the bit-width settings for weights and activations, the type of calibration data, and the choice of activation function.
Conclusion We introduce model compression into DFD tasks and develop a data-free post-training quantization scheme that converts a pre-trained DFD model into a lightweight one. Experiments are conducted on FaceForensics++ and Celeb-DF v2 with a range of typical DFD models, including ResNet50, Xception, EfficientNet-b3, and MobileNetV2. The quantized DFD models recognize fake faces accurately and efficiently. A promising future direction is to deploy DFD models online or on resource-constrained platforms such as mobile and edge devices.
Keywords
DeepFake detection; fake face; model compression; low bit-width representation; data-free distillation; light-weight model