Boundary-guided dual attention network for image splicing detection

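Wu Jinghui1, Yan Caiping1, Li Hong2, Liu Renhai1 (1. School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121; 2. Hangzhou Insvision Technology Co., Ltd., Hangzhou 311121)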

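Abstract
Objective Forged images pose hidden risks to many industries and can cause substantial potential economic losses. Method We propose a boundary-guided dual attention network (BDA-Net) for image splicing detection, which obtains its predictions by integrating spatial-channel dependencies and boundary prediction into the features extracted by the network. First, we propose an encoder-decoder model called the prediction branch, which serves as the backbone of the model and extracts and fuses feature maps of different resolutions. Second, to capture dependencies along different dimensions and strengthen the network's attention to regions of interest, we design a coordinate-spatial attention module (CSAM) that encodes features along multiple dimensions. Finally, we design a boundary-guided branch to capture the subtle boundary traces between tampered and non-tampered regions and thereby help the prediction branch segment more accurately. Result The experiments compare BDA-Net with several methods on four image splicing datasets, using the F1 score as the evaluation metric. On the Columbia dataset, the F1 score is only 1.6% below that of the top-ranked model. On the NIST16 Splicing (National Institute of Standards and Technology 16 Splicing) dataset, the F1 score is slightly behind that of the best model. On the more challenging CASIA2.0 Splicing (Chinese Academy of Sciences Institute of Automation Dataset 2.0 Splicing) and IMD2020 (Image Manipulated Datasets 2020) datasets, the F1 score of BDA-Net exceeds that of the second-ranked model by 15.3% and 11.9%, respectively. To verify the robustness of the model, the images were also subjected to JPEG compression, Gaussian blur, sharpening, Gaussian noise, and salt-and-pepper noise attacks. The results show that BDA-Net is markedly more robust than the other models. Conclusion The proposed method makes full use of the strengths of deep learning models and of domain knowledge in image splicing detection, and effectively improves model performance. Compared with existing detection methods, it has stronger detection capability and better stability.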
Keywords
BDA-Net: boundary-guided dual attention network for image splicing detection

Wu Jinghui1, Yan Caiping1, Li Hong2, Liu Renhai1 (1. School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, China; 2. Hangzhou Insvision Technology Co., Ltd., Hangzhou 311121, China)

Abstract
Objective The rapid development of the internet and the proliferation of effective, user-friendly image editing software have led to an explosion of modified images online. Although such images can bring some benefits (e.g., landscape beautification and face photo enhancement), they also have many negative effects on people's lives, such as falsified transaction records, published false news, and fake evidence in court. Maliciously exploited tampered images can cause immeasurable damage to individuals and society. Recent studies on image splicing detection have demonstrated the effectiveness of convolutional neural networks in improving localization performance. However, they have generally ignored multiscale information fusion, which is essential for locating tampered regions of various sizes. Moreover, the performance of most existing detection methods remains unsatisfactory, so a more effective image splicing detection method is needed.

Method In this study, we propose a novel boundary-guided dual attention network (BDA-Net) that integrates spatial-channel dependency and boundary prediction into the features extracted by the network. In particular, we present a new encoder-decoder model, named the prediction branch, to extract and fuse feature maps of different resolutions. This model constitutes the backbone of BDA-Net. A coordinate-spatial attention module (CSAM) is designed and embedded into the deep layers of feature extraction to capture long-range dependencies. In this way, the representations of regions of interest can be augmented, while the computational complexity is limited by aggregating features with three one-dimensional encodings. In addition, we present a boundary-guided branch to capture the tiny border artifacts between tampered and non-tampered regions; it is modeled as a binary segmentation task to enhance the detailed prediction of our network. A multitask loss function is designed to constrain the network. It consists of two parts: a pixel-level localization loss and a boundary loss. The localization loss is composed of a weighted cross-entropy loss and a Dice loss. In a tampered image, the tampered region accounts for a smaller proportion of the image than the non-tampered region, which causes a sample imbalance problem. The weighted cross-entropy loss can assign different weights to different training samples and increase the model's focus on the samples with high weights. The Dice loss measures the pixel-level similarity between the predicted and ground-truth results. In the case of class imbalance, the weight value can be adjusted adaptively to improve the accuracy and robustness of the segmentation model. The boundary loss consists of a Dice loss. Boundary labels are used to guide the network to predict the splicing boundary of a tampered image. In a boundary label, the number of boundary pixels is much smaller than the number of non-boundary pixels, which leads to class imbalance. This phenomenon is especially evident in high-resolution images. Therefore, using the Dice loss as the boundary loss helps the model learn features from extremely imbalanced data. The network is implemented in the PyTorch 2.0 framework. The input images and ground-truth maps are resized to 500 × 500 pixels for training. The Adam optimization algorithm is used to optimize the model. The initial learning rate is set to 1E-4, the learning rate scheduler is the cosine annealing warm restarts scheduler, and the batch size is set to 2.

Result We use four image splicing datasets in our experiments: the Columbia dataset, the NIST16 Splicing dataset (National Institute of Standards and Technology 16 Splicing), the CASIA2.0 Splicing dataset (Chinese Academy of Sciences Institute of Automation Dataset 2.0 Splicing), and the IMD2020 dataset (Image Manipulated Datasets 2020). All spliced images in the Columbia dataset were created from real images without any post-processing; they are high-resolution and uncompressed. The NIST16 dataset is a very challenging dataset provided by the National Institute of Standards and Technology. The CASIA2.0 dataset is a popular image tampering detection dataset with rich and clear image content. The IMD2020 dataset contains 2,010 real images downloaded from the internet and their corresponding labels. We choose four deep-learning-based detection methods for comparison with the proposed BDA-Net: U-Net, DeepLab V3+, RRU-Net (ringed residual U-Net), and MTSE-Net (multi-task SE-network). U-Net is a classical semantic segmentation model that can be applied to many tasks. DeepLab V3+ combines a spatial pyramid pooling module with an encoder-decoder structure to obtain a semantic segmentation model that can encode multiscale context information and capture clear target edges. RRU-Net is a ringed residual network based on U-Net that reinforces features through the propagation and feedback of residuals in a convolutional neural network (CNN), making the difference between tampered and non-tampered regions more obvious. MTSE-Net is a two-branch model that performs tamper detection by fusing the features of its two branches. The quantitative evaluation metric is the F1 measure, a commonly used evaluation index for classification models. On the Columbia dataset, the F1 values of the proposed BDA-Net and the top-ranked model differ by only 1.6%. On the NIST16 Splicing dataset, the F1 value of BDA-Net differs slightly from those of the best models. On the difficult datasets, namely the CASIA2.0 Splicing dataset and the IMD2020 dataset, the F1 values of BDA-Net are 15.3% and 11.9% higher than those of the second-ranked model, respectively. Moreover, we apply five attack methods to the images, namely JPEG compression, Gaussian blur, sharpening, Gaussian noise, and salt-and-pepper noise, to verify the robustness of the proposed model. The experiments show that the robustness of our model is significantly better than that of the other models.

Conclusion The image splicing detection method proposed in this study makes full use of the advantages of deep learning models and of expertise in the image forgery field, effectively improving the model's performance. The experimental results on four splicing datasets show that our model has stronger detection capability and better stability than existing splicing detection methods.
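
To make the multitask loss and the F1 metric described in the abstract concrete, the following is a minimal sketch in plain Python, operating on flattened lists of per-pixel probabilities rather than PyTorch tensors. It is an illustration under stated assumptions, not the authors' implementation: the function names, the positive-class weight `pos_weight`, the Dice smoothing term `smooth`, the balancing factor `edge_weight`, and the 0.5 threshold are all hypothetical choices that the abstract does not specify.

```python
import math


def weighted_bce(pred, target, pos_weight=2.0, eps=1e-7):
    """Mean binary cross-entropy with extra weight on tampered (positive) pixels.

    pos_weight is an assumed value; the abstract gives no weights.
    """
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + smooth) / (sum(pred) + sum(target) + smooth)."""
    overlap = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * overlap + smooth) / (sum(pred) + sum(target) + smooth)


def multitask_loss(mask_pred, mask_gt, edge_pred, edge_gt, edge_weight=1.0):
    """Localization loss (weighted CE + Dice) plus a Dice-only boundary loss.

    edge_weight is a hypothetical balancing factor, not taken from the paper.
    """
    localization = weighted_bce(mask_pred, mask_gt) + dice_loss(mask_pred, mask_gt)
    boundary = dice_loss(edge_pred, edge_gt)
    return localization + edge_weight * boundary


def f1_score(pred, target, threshold=0.5):
    """Pixel-level F1 = 2TP / (2TP + FP + FN) after thresholding the prediction."""
    binary = [1 if p >= threshold else 0 for p in pred]
    tp = sum(b * t for b, t in zip(binary, target))
    fp = sum(b * (1 - t) for b, t in zip(binary, target))
    fn = sum((1 - b) * t for b, t in zip(binary, target))
    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    mask_gt = [1, 1, 0, 0]
    mask_pred = [0.9, 0.8, 0.2, 0.6]
    edge_gt = [0, 1, 0, 0]
    edge_pred = [0.1, 0.7, 0.2, 0.1]
    print("loss:", multitask_loss(mask_pred, mask_gt, edge_pred, edge_gt))
    print("F1:", f1_score(mask_pred, mask_gt))
```

On hard binary masks, the Dice coefficient and the F1 score are the same quantity, 2TP / (2TP + FP + FN), which is one reason a Dice term is a natural companion to an F1-based evaluation. In an actual PyTorch training loop the same terms would be computed on tensors so that gradients can flow.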
Keywords
