Semantic consistency-guided multi-task splicing tampering detection
Abstract
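Objective With the widespread use of digital images and editing software, forged images appear in large numbers and affect fields such as news media and legal forensics. Splicing is a common type of forgery that typically adds new objects to an original image, altering or distorting its semantics. Many existing tampering detection methods based on convolutional neural networks focus on extracting features of tampering traces but overlook the semantic inconsistency in forged images. To address the semantic changes that splicing introduces into the original image, we propose a multi-resolution fully convolutional neural network with tampering detection as the main task and semantic segmentation and noise reconstruction as auxiliary tasks. Method A multi-task strategy treats semantic segmentation and noise reconstruction as auxiliary tasks. The semantic segmentation task aims to capture the semantic inconsistency produced when an image is spliced, and the noise reconstruction task allows the network to learn a more complete image noise distribution. To obtain more comprehensive and accurate features, the RGB stream, the noise stream, and the fusion module all adopt a multi-resolution design, extracting features at several resolutions to handle spliced objects of different shapes and sizes. Result The proposed method is compared with several state-of-the-art tampering detection networks and a baseline network based on HRNet (high-resolution network). On the Fantastic Reality and Spliced Dataset datasets, the proposed method achieves the best performance, with F1 scores of 0.946 and 0.961, respectively. Robustness experiments with JPEG (joint photographic experts group) compression, brightness adjustment, contrast adjustment, and added noise show that the method is robust to common image post-processing operations. Conclusion The proposed semantic consistency-guided multi-task, multi-resolution splicing tampering detection network detects forgeries more accurately, is robust, and opens a new direction for digital image forensics research.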
Keywords
Semantic consistency-relevant multitask splicing-tampered detection
Zhang Yulin, Wang Hongxia, Zhang Rui, Zhang Jingyuan (School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China)
Abstract
Objective With digital images and image-editing software now in wide use, forged images have become a growing concern for image forensics. Splicing is a commonly used forgery method that adds new instances to an original image and thereby alters or misrepresents its semantics. Existing convolutional neural network (CNN)-based methods for detecting forged images mainly rely on statistical information and physical features of the image itself, such as edge features and noise features, and largely overlook the semantic inconsistencies in forged images. In addition, image tampering detection is challenged by post-processing that people commonly apply to images, such as compression or image filters. Method To detect spliced forgeries, we propose a multi-resolution CNN that uses semantic segmentation and noise reconstruction as auxiliary tasks. The proposed network consists of four parts: 1) an RGB stream, 2) a noise stream, 3) a fusion module, and 4) a multi-task module. The RGB stream extracts tampering artifacts at boundaries together with semantic information. The noise stream uses a steganalysis-based filter layer to extract the noise features of the forged regions; together, the RGB and noise information provide complementary evidence for forgery detection. The semantic segmentation task is designed to capture semantic inconsistencies. The noise reconstruction task lets the network learn a more comprehensive image noise distribution, and the forgery detection task locates the tampered regions. As in recent multi-task networks, each task has its own loss function, and the sum of the per-task loss functions is used as the overall loss of the network. To further exploit the spatial co-occurrence of the two kinds of features, the fusion module fuses the features from the RGB and noise streams before they are passed to the forgery detection task.
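As a rough illustration of the summed multi-task objective described above, the following Python sketch adds three per-task losses without weighting. The abstract does not name the individual losses, so binary cross-entropy for tampering detection, cross-entropy for semantic segmentation, and mean squared error for noise reconstruction are assumptions made for this example, as are all function names.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


def cross_entropy(probs, labels, eps=1e-7):
    """Mean cross-entropy; probs holds one class-probability list per pixel."""
    total = 0.0
    for row, k in zip(probs, labels):
        total += -math.log(max(row[k], eps))
    return total / len(labels)


def mean_squared_error(pred, target):
    """Mean squared error over flat lists of values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(det_pred, det_mask, seg_probs, seg_labels, noise_pred, noise_target):
    """Overall loss as the plain sum of the per-task losses.

    The choice of loss for each task is an assumption of this sketch;
    only the unweighted sum is taken from the text.
    """
    loss_detection = binary_cross_entropy(det_pred, det_mask)
    loss_segmentation = cross_entropy(seg_probs, seg_labels)
    loss_noise = mean_squared_error(noise_pred, noise_target)
    return loss_detection + loss_segmentation + loss_noise
```

Because the sum is unweighted, every task contributes on its own scale; a real implementation might need to check that no single term dominates the gradient.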
Additionally, to obtain more comprehensive and accurate features, a multi-resolution pathway is applied to the RGB stream, the noise stream, and the feature fusion module. The multi-resolution pathway strengthens the network's ability to perceive both semantic information and precise location information, which benefits forgery detection tasks that require localization. Result Comparative experiments are carried out on Fantastic Reality and Spliced Dataset against six tampering detection networks: 1) manipulation tracing network (ManTra-Net), 2) coarse to refined network (C2Rnet), 3) multi-task wavelet corrected network (MWC-Net), 4) compression artifact tracing network (CAT-Net), 5) ringed residual U-Net (RRU-Net), and 6) a high-resolution network (HRNet)-based baseline network. Models are trained and tested on an Intel Core i7-9700K CPU and an NVIDIA GeForce RTX 2080 Ti GPU. During training, stochastic gradient descent with a momentum of 0.9 is used as the optimizer, with an initial learning rate of 0.005 and exponential decay. The proposed method achieves F1 scores of 0.946 and 0.961 on Fantastic Reality and Spliced Dataset, respectively. A runtime comparison shows that the design balances computational cost and network performance effectively. The most common form of compression is JPEG, and image filters are typically used to adjust contrast and brightness. Therefore, to reflect realistic usage scenarios, we design robustness experiments on the Fantastic Reality dataset with four kinds of common image post-processing: JPEG compression, contrast adjustment, brightness adjustment, and noise distortion. Conclusion To detect forged regions effectively and accurately, a semantic consistency-relevant multi-task and multi-resolution tampering detection network is proposed.
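The training configuration stated above (stochastic gradient descent, momentum 0.9, initial learning rate 0.005, exponential decay) can be sketched in plain Python as follows. The decay factor `GAMMA`, the per-epoch decay granularity, the toy quadratic objective, and all function names are assumptions for illustration; the abstract does not report them. The update follows one common momentum convention, in which the velocity accumulates gradients and is then scaled by the learning rate.

```python
INITIAL_LR = 0.005  # from the text
MOMENTUM = 0.9      # from the text
GAMMA = 0.95        # hypothetical decay factor, not given in the text


def exponential_lr(initial_lr, gamma, epoch):
    """Learning rate after `epoch` epochs of exponential decay."""
    return initial_lr * gamma ** epoch


def sgd_momentum_step(params, grads, velocity, lr, momentum=MOMENTUM):
    """One SGD step with momentum: v <- momentum * v + g, p <- p - lr * v."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


def minimize_quadratic(epochs=20, steps_per_epoch=10):
    """Toy run: minimize f(w) = sum(w_i ** 2) with the schedule above."""
    params = [1.0, -2.0]
    velocity = [0.0, 0.0]
    for epoch in range(epochs):
        lr = exponential_lr(INITIAL_LR, GAMMA, epoch)
        for _ in range(steps_per_epoch):
            grads = [2.0 * p for p in params]  # gradient of sum(w_i ** 2)
            params, velocity = sgd_momentum_step(params, grads, velocity, lr)
    return params
```

Calling `minimize_quadratic()` drives both toy parameters toward zero, which is only a sanity check of the update rule and says nothing about the behavior of the actual network.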
The multi-task strategy extracts semantic features and detects forged regions by exploiting the semantic inconsistencies in forged images, while the multi-resolution design enables the network to obtain more diverse image information. Furthermore, the robustness experiments show that the network is robust to common image post-processing such as JPEG compression.
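To make the post-processing used in the robustness experiments concrete, the sketch below applies brightness adjustment, contrast adjustment, and additive Gaussian noise to a grayscale image stored as nested lists. The exact formulas, parameter names, and value ranges are assumptions for illustration and are not taken from the paper. JPEG compression is left out because it requires an image codec.

```python
import random


def clip(value, low=0.0, high=255.0):
    """Clamp a pixel value to the valid 8-bit intensity range."""
    return max(low, min(high, value))


def adjust_brightness(image, delta):
    """Shift every pixel by `delta` (positive brightens, negative darkens)."""
    return [[clip(p + delta) for p in row] for row in image]


def adjust_contrast(image, factor):
    """Scale each pixel's distance from the image mean by `factor`."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[clip(mean + factor * (p - mean)) for p in row] for row in image]


def add_gaussian_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    rng = random.Random(seed)  # fixed seed keeps the perturbation repeatable
    return [[clip(p + rng.gauss(0.0, sigma)) for p in row] for row in image]
```

In a robustness test of this kind, each perturbed image would be passed to the trained detector and the F1 score compared against the score on the unperturbed image.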
Keywords
image tampering detection; semantic consistency; multi-task strategy; multi-resolution; high-resolution network (HRNet)