Identification of adversarial example attack methods based on multi-quality-factor compression errors
Zhao Junjie1, Wang Jinwei2,3, Wu Junfeng2 (1. School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China; 2. School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China; 3. Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing 210044, China) Abstract
Objective Adversarial examples severely disrupt the normal operation of deep neural networks. Although existing adversarial example detection schemes can accurately distinguish normal examples from adversarial ones, they cannot determine which specific attack method was used. To address this, an adversarial attack method identification scheme based on multi-quality-factor compression errors is proposed, which exploits the sensitivity of adversarial noise to JPEG compression to identify the attack method. Method First, convolutional layers are used to simulate the color conversion and spatial-frequency transforms of JPEG compression and decompression, enabling parallel extraction of JPEG errors on a graphics processing unit (GPU); a minimal sketch of this step is given after the keywords below. A multi-factor error attention mechanism is proposed that computes compression errors at multiple quality factors while adaptively adjusting the weight of each quality-factor error branch according to sample differences. Building on the feature statistical layer, an attention-based feature statistical layer is proposed. The outputs of the multi-factor error branches are fused by convolution, after which multi-dimensional convolutional features are extracted and feature weights are computed simultaneously, yielding a highly parallel attack method identification model. Result Based on the ImageNet image classification dataset, 15 sub-datasets were generated with 8 attack methods; the attack method identification rate exceeds 91%. On the fast gradient sign method (FGSM) and basic iterative method (BIM) datasets, the noise intensity identification rate exceeds 96%. In the adversarial example detection task, detection accuracy reaches 96%. Conclusion The proposed multi-factor error attention model jointly exploits the distributional differences of adversarial noise and its sensitivity to JPEG compression; it not only achieves excellent adversarial attack method identification but also performs well on tasks such as adversarial noise intensity identification and adversarial example detection.
Keywords
image processing; convolutional neural network (CNN); adversarial example; image classification; compression error
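The Method paragraphs above describe simulating JPEG compression and decompression with convolutional layers so that compression errors can be extracted in parallel on a GPU. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: it assumes a luminance-only input, the standard JPEG luminance quantization table, and libjpeg's quality-factor scaling rule; the names quant_table, dct_kernels, and jpeg_error are illustrative.

```python
# Minimal sketch (assumptions, not the paper's code): JPEG round-trip
# error extraction built from convolution operations so it runs on GPU.
import math
import torch
import torch.nn.functional as F

# Standard JPEG luminance quantization table (quality-50 baseline).
Q50 = torch.tensor([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=torch.float32)

def quant_table(quality: int) -> torch.Tensor:
    """Scale the base table to a quality factor using the libjpeg rule."""
    s = 5000 / quality if quality < 50 else 200 - 2 * quality
    return torch.clamp(torch.floor((Q50 * s + 50) / 100), min=1)

def dct_kernels() -> torch.Tensor:
    """The 64 orthonormal 8x8 2-D DCT basis filters, shaped (64,1,8,8)."""
    n = torch.arange(8, dtype=torch.float32)
    basis = torch.cos(math.pi * (2 * n[None, :] + 1) * n[:, None] / 16)
    basis[0] *= 2 ** -0.5
    basis *= 0.5                     # 1-D rows are now orthonormal
    k2d = torch.einsum('ux,vy->uvxy', basis, basis)
    return k2d.reshape(64, 1, 8, 8)

def jpeg_error(y: torch.Tensor, quality: int) -> torch.Tensor:
    """Error between a luminance image y (N,1,H,W in [0,255], H and W
    multiples of 8) and its simulated JPEG compress-decompress round trip."""
    k = dct_kernels().to(y.device)
    q = quant_table(quality).reshape(1, 64, 1, 1).to(y.device)
    coeff = F.conv2d(y - 128.0, k, stride=8)                 # blockwise DCT
    coeff = torch.round(coeff / q) * q                       # (de)quantization
    recon = F.conv_transpose2d(coeff, k, stride=8) + 128.0   # inverse DCT
    return y - torch.round(recon).clamp(0, 255)              # keep rounding error
```

Because the DCT basis is orthonormal, the same fixed kernels serve as both the forward transform (conv2d) and its inverse (conv_transpose2d), so errors at every quality factor reduce to ordinary batched convolutions.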
Adversarial attack method identification model based on multi-factor compression error
Zhao Junjie1, Wang Jinwei2,3, Wu Junfeng2(1.School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China;2.School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China;3.Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing 210044, China) Abstract
Objective Artificial intelligence (AI) techniques based on deep neural networks (DNNs) have greatly advanced image classification and face recognition. However, recent studies have shown that DNNs are vulnerable to small changes in their input images: injecting small adversarial noise into an input sample causes the DNN to misclassify it, and such an artificially designed anomalous example is called an adversarial example. Existing detection methods can already identify adversarial examples with high accuracy. However, to assess the security level of a deep neural network and implement targeted defense strategies, classification of the attack methods themselves needs to be developed further. Adversarial attacks mainly fall into two categories: 1) white-box and 2) black-box attacks. In a white-box attack, the attacker has prior access to all information about the target neural network, including the gradient of the loss function for an example, and can also query the outputs of the target network. In a black-box attack, the attacker can only query the input and output of the target neural network. White-box attack methods are mainly implemented by querying the gradient of the network, whereas black-box attacks mainly follow two approaches: 1) bounded query and 2) gradient estimation. Although existing adversarial example detection schemes can accurately distinguish adversarial examples from natural ones, identifying the adversarial attack method used by the attacker remains challenging. JPEG compression is a commonly used lossy compression method in image processing. Its compression and decompression pipeline introduces errors related to truncation, rounding, color space conversion, and quantization. Because the quantization step uses a different quantization table for each quality factor, the magnitude of the resulting error varies considerably across factors. Method To classify the generation methods of adversarial examples, we develop a multi-factor error attention model in which JPEG errors are fed into a neural network. To extract JPEG errors in parallel on a graphics processing unit (GPU), the JPEG compression and decompression processes are simulated with DNN components. Multiple error branches are employed to avoid repeated trials over quality factors. A multi-factor error attention mechanism is proposed that adaptively balances the weights of the quality-factor error branches according to sample differences. The feature statistical layer computes high-dimensional statistical feature vectors from a feature map; we add an attention mechanism to it and obtain an attention-based feature statistical layer, which allows the feature values to adaptively adjust their relative proportions. The feature map output by the last convolutional layer is fed into the attention-based feature statistical layer channel by channel. To obtain an efficient model for classifying the generation methods of adversarial examples, the outputs of the multi-factor error branches are fused, passed through convolutional layers, and then fed into the attention-based feature statistical layer; a sketch of this design follows the abstract. Result We construct 15 sub-datasets based on the ImageNet image classification dataset using 8 popular attack methods.
The fast gradient sign method (FGSM)- and basic iterative method (BIM)-generated adversarial examples each form 4 sub-datasets with perturbation coefficients of 2, 4, 6, and 8 (a minimal FGSM sketch also follows the abstract). The Bandits-based adversarial examples form two sub-datasets, one each for the L2 and L∞ versions. Each sub-dataset contains 10 000 training examples and 2 000 test examples. Over the full set of 15 sub-datasets, the attack method recognition rate is above 91%. The accuracy of noise intensity recognition is above 96% on the FGSM and BIM datasets. In the adversarial example detection task, the detection accuracy reaches 96%. The experiments show that the multi-factor error attention network can not only classify adversarial attack methods with high accuracy but also has potential for noise intensity recognition and adversarial example detection tasks. The comparative analysis demonstrates that the proposed model does not degrade significantly relative to existing schemes on the adversarial example detection task. Conclusion A multi-factor error attention model is developed for classifying adversarial examples. Our approach builds on the use of JPEG errors as a cue for adversarial example detection. The proposed model simplifies the extraction of JPEG compression-decompression errors and performs it in parallel on the GPU. The error branch attention mechanism adaptively balances the weights between the error branches, and the attention-based feature statistical layer enriches the feature types while balancing them adaptively.
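To make the branch-weighting and pooling ideas above concrete, here is a minimal sketch of a multi-factor error attention model under stated assumptions; it is not the paper's architecture. The quality factors (60, 75, 90), channel width, gating layout, and the four pooled statistics (mean, standard deviation, max, min) are placeholder choices, and jpeg_error refers to the earlier sketch.

```python
# Minimal sketch (placeholder architecture, not the paper's): one branch
# per quality factor, a per-sample gate over branches, and an attention-
# weighted feature statistical layer. Reuses jpeg_error() defined above.
import torch
import torch.nn as nn

class MultiFactorErrorAttention(nn.Module):
    def __init__(self, qualities=(60, 75, 90), width=16, n_classes=8):
        super().__init__()
        self.qualities = qualities
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(1, width, 3, padding=1), nn.ReLU())
            for _ in qualities)
        # Gate: per-sample softmax weights over the error branches.
        self.gate = nn.Sequential(
            nn.Linear(len(qualities) * width, len(qualities)),
            nn.Softmax(dim=1))
        self.fuse = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
        # Attention over 4 per-channel statistics (mean, std, max, min).
        self.stat_attn = nn.Sequential(
            nn.Linear(4 * width, 4 * width), nn.Sigmoid())
        self.head = nn.Linear(4 * width, n_classes)

    def forward(self, y):  # y: (N,1,H,W) luminance in [0, 255]
        feats = [b(jpeg_error(y, q))
                 for b, q in zip(self.branches, self.qualities)]
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in feats], dim=1)
        w = self.gate(pooled)                     # (N, n_branches)
        fused = sum(w[:, i, None, None, None] * f
                    for i, f in enumerate(feats))
        f = self.fuse(fused)
        # Feature statistical layer: per-channel statistics ...
        stats = torch.cat([f.mean(dim=(2, 3)), f.std(dim=(2, 3)),
                           f.amax(dim=(2, 3)), f.amin(dim=(2, 3))], dim=1)
        # ... re-weighted element-wise by a learned attention vector.
        return self.head(stats * self.stat_attn(stats))
```

The gate plays the role of the multi-factor error attention mechanism (re-weighting branches per sample), and the sigmoid over pooled statistics plays the role of the attention-based feature statistical layer.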
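For reference, the following sketch of the standard FGSM step (illustrative code, not the paper's dataset-generation pipeline) shows one plausible reading of a perturbation coefficient of 2, 4, 6, or 8, assuming the coefficient is the ε on the 0-255 pixel scale.

```python
# Minimal FGSM sketch (standard method; assumes the perturbation
# coefficient is epsilon on the 0-255 pixel scale).
import torch
import torch.nn.functional as F

def fgsm(model, x, label, coeff=4):
    """x: (N,3,H,W) in [0,1]; returns a one-step adversarial example."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    eps = coeff / 255.0
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```

BIM iterates this step with a smaller step size, clipping back into the ε-ball around the original image after each iteration.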
Keywords
image processing; convolutional neural network (CNN); adversarial example; image classification; compression error