Pyramid Transformer combined with shallow CNN for substation image tampering detection

Xing Jianhao; Tian Xiuxia; Han Yi

Published: 2024-02-06
Abstract views: 751
Full-text downloads: 793
DOI: 10.11834/jig.230202
2024 | Volume 29 | Number 2

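Xing Jianhao¹, Tian Xiuxia¹, Han Yi² (1. College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306; 2. College of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 201306)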

Abstract

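Objective Splicing-based tampering of substation images is a major security risk for power systems. Tampered images have complex backgrounds and tampered content varies in scale, which causes false detections and missed detections, and related research is scarce. To address these problems, this paper proposes a dual-channel detection model for spliced tampered images of substations.

Method Both channels use deep learning to adaptively extract features, one from the tampered image and the other from the residual image. The tampered image carries rich color features and content information, while the residual image highlights the edges of the tampered region; together they address the difficulty of extracting tampering features caused by the diversity of tampered images. A Transformer channel with a feature pyramid structure serves as the main branch of the network. Its global interaction mechanism captures global image information and establishes connections between key points, giving the model good generalization and multi-scale feature processing capability. A shallow convolutional neural network (CNN) channel is introduced as an auxiliary branch that focuses on extracting the edge features of the tampered region, making it easier for the model to locate the tampered region by its overall contour.

Result The model is compared with four methods of the same type on the self-made substation splicing tampered dataset (SSSTD), CASIA (Chinese Academy of Sciences Institute of Automation dataset), and NIST16 (National Institute of Standards and Technology 16). Quantitatively, on SSSTD the proposed model improves precision, recall, F1, and average precision by 0.12%, 2.17%, 1.24%, and 7.71%, respectively, over the second-best model; on CASIA and NIST16 it also achieves the best results. Qualitatively, the proposed model reduces false detections and missed detections and localizes tampered regions more accurately.

Conclusion The proposed dual-channel splicing tampering detection model combines the strengths of the Transformer and the CNN in image tampering detection, improves detection accuracy, and is suitable for detecting tampered targets in complex substation scenes.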
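Precision, recall, and F1 as reported above follow the standard definitions in terms of true positives, false positives, and false negatives. The sketch below illustrates those definitions only; the function name `detection_metrics` and the counts are hypothetical and are not taken from the paper, and average precision (which requires ranked predictions) is not covered.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall and F1 from detection counts.

    tp: correctly detected tampered regions
    fp: detections that do not match any tampered region (false detections)
    fn: tampered regions that were not detected (missed detections)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only, not results from the paper.
metrics = detection_metrics(tp=8, fp=2, fn=2)
```

Fewer false detections raise precision and fewer missed detections raise recall, which is why the abstract reports both alongside F1.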

Keywords

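substation; image splicing tampering detection; Transformer; convolutional neural network (CNN); dual-channel network; feature pyramid structure; shallow network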

Pyramid Transformer combined with shallow CNN for substation image tampering detection

Xing Jianhao¹, Tian Xiuxia¹, Han Yi² (1. College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306, China; 2. College of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 201306, China)

Abstract

Objective Image information has become particularly important with the widespread application of intelligent power inspection. However, the rapid development of image tampering technology gives malicious actors a new way to harm power systems. As an important component of power systems, substations are responsible for converting between voltage levels. Ensuring a stable voltage output at all times and the reasonable use of substation resources is the basis for the safe and stable operation of the entire power network. If the collected substation images are maliciously tampered with, a smart grid system may fail and operators may misjudge the actual state of the substation. This can eventually lead to power system failures and even major accidents, such as large-scale power outages, that cause irreversible losses to national production. Detecting tampered substation images is therefore a key task in ensuring the stability of power systems. The complex backgrounds of tampered images and the varying scales of tampered content cause existing detection models to produce false detections and missed detections. Meanwhile, research on image splicing tampering detection in power scenes is lacking. Accordingly, this study proposes a dual-channel detection model for spliced tampered images in substation scenes.

Method The model consists of three parts: a Transformer channel with a feature pyramid structure, a shallow convolutional neural network (CNN) channel, and a network head. The input tampered image has a size of 512 × 512 × 3, and the output is the detection and localization result for that image. Both channels use deep learning methods to adaptively extract features, one from the original color image and the other from the residual image. The original color image contains rich color features and content information, while the residual image highlights the edges of the tampered region; using both addresses the difficulty of extracting tampering features caused by the diversity of tampered images. The feature pyramid structure Transformer channel is the primary feature extraction channel and consists of a pyramid structure Transformer and a progressive local decoder (PLD). With a global receptive field from the first layer of the model, the Transformer can efficiently extract features and establish connections between feature points via global attention. The pyramid structure gives the network better generalization and multi-scale feature processing capability. The PLD lets features of different depths and expressiveness guide and fuse with one another, which counters attention scattering and the underestimation of local features and improves detail processing. The shallow CNN channel is the auxiliary detection channel: a shallow network extracts the edge features of the tampered region from the residual image, making it easier for the model to locate the tampered region by its overall contour. Residual blocks, the basic modules of a residual network, form the backbone of the shallow network. Its input is the residual image generated from the tampered image by a high-pass filtering layer. The parallel axial attention block introduces dilated convolutions of different sizes to enlarge the receptive field of the shallow network, and the parallel axial attention mechanism helps the network extract contextual semantic information. The features of the two branches are fused by channel-wise concatenation and passed to the network head; the experiments in this study show that channel-wise concatenation is more effective than element-wise addition. Finally, the network head detects whether the image contains tampered regions and accurately locates them.

Result Experiments are first conducted on the pretraining datasets to obtain pretraining weights. The test results show that the model detects a variety of tampering targets well. The model is then fine-tuned from the pretraining weights and compared with four models of the same type on the self-made substation splicing tampered dataset (SSSTD), CASIA, and NIST16. Four evaluation metrics, namely precision, recall, F1, and average precision, are used for quantitative analysis. On SSSTD, the precision, recall, F1, and average precision of the proposed model improve by 0.12%, 2.17%, 1.24%, and 7.71%, respectively, over the second-best model. On CASIA, the proposed model again achieves the best results on all four metrics. On NIST16, the detection models all achieve relatively high precision, while the proposed model achieves higher recall, and its F1 and average precision are substantially higher than those of the four comparison models. Qualitatively, the proposed model reduces false detections and missed detections while achieving higher localization accuracy, and its overall detection performance is better than that of the other models.

Conclusion Detecting spliced tampered substation images is a key task in ensuring the stability of a power system. This study designs a new splicing tampering detection model for complex substation images, built on two channels: a feature pyramid structure Transformer and a shallow CNN. The Transformer channel obtains rich semantic information and visual features of tampered images through the global interaction mechanism, which improves the accuracy and multi-scale processing capability of the detection model. As the auxiliary channel, the shallow CNN focuses on extracting edge features from the residual image, making it easier for the model to locate tampered regions by their overall contour. The model is evaluated on different splicing tampering datasets and achieves the best results on all of them. The visualizations further show that the model gives the best detection results in actual substation scenes. However, this work only investigates image splicing tampering detection, while diverse types of tampering occur in reality. The next step is to investigate the detection of other types of image tampering to broaden the applicability of tampering detection models.
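Two steps in the method description can be illustrated with a small NumPy sketch: producing a residual image with a high-pass filter, and fusing two feature maps either by channel or by element. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the high-pass kernel, so a 3 × 3 Laplacian-style kernel is assumed here, and the names `HIGH_PASS`, `residual_image`, `fuse_by_channel`, and `fuse_by_element` are hypothetical.

```python
import numpy as np

# Assumed 3x3 Laplacian-style high-pass kernel (coefficients sum to zero).
# The paper's actual high-pass filtering layer is not given in the abstract.
HIGH_PASS = np.array([[-1, -1, -1],
                      [-1,  8, -1],
                      [-1, -1, -1]], dtype=np.float32)


def residual_image(gray: np.ndarray) -> np.ndarray:
    """Filter a 2-D grayscale image with HIGH_PASS, keeping the input shape.

    Flat regions map to zero and intensity edges map to non-zero values,
    which is why a residual image emphasizes region boundaries.
    """
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float32), 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float32)
    # Accumulate the nine shifted, weighted copies of the padded image.
    for dy in range(3):
        for dx in range(3):
            out += HIGH_PASS[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def fuse_by_channel(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Concatenate two (C, H, W) feature maps along the channel axis."""
    return np.concatenate([a, b], axis=0)


def fuse_by_element(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Add two feature maps of identical shape element by element."""
    return a + b


# Toy inputs: a flat image and an image with one vertical intensity step.
flat = np.full((4, 4), 7.0, dtype=np.float32)
step = np.zeros((4, 4), dtype=np.float32)
step[:, 2:] = 1.0
res_flat = residual_image(flat)
res_step = residual_image(step)

# Toy feature maps standing in for the outputs of the two channels.
feat_main = np.ones((8, 16, 16), dtype=np.float32)
feat_aux = np.ones((8, 16, 16), dtype=np.float32)
fused_cat = fuse_by_channel(feat_main, feat_aux)
fused_add = fuse_by_element(feat_main, feat_aux)
```

In this toy case the residual of the flat image is zero everywhere, while the step image yields non-zero values only in the two columns adjacent to the edge. Channel-wise fusion keeps the features of both branches side by side (16 channels here), whereas element-wise addition merges them into the original 8 channels.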

Keywords

substation; image splicing tampering detection; Transformer; convolutional neural network (CNN); dual-channel network; feature pyramid structure; shallow network
