A low-perceptibility adversarial example generation method based on perturbation constraints
Wang Yang1, Cao Tieyong1, Yang Jibin1, Zheng Yunfei1,2,3, Fang Zheng1, Deng Xiaotong1 (1. Institute of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China; 2. Department of Firepower, Nanjing Campus, The Army Academy of Artillery and Air Defense of PLA, Nanjing 211100, China; 3. Key Laboratory of Polarization Imaging and Detection Technology of Anhui Province, Hefei 230031, China) Abstract
Objective: An adversarial example is synthetic data created by adding a subtle perturbation to the original data so that a deep model outputs an incorrect result. Visual perceptibility and attack success rate are the two key criteria for evaluating adversarial examples. Most current research on adversarial examples focuses on raising the attack success rate and pays little attention to visual perceptibility. We therefore propose a low-perceptibility adversarial example generation algorithm whose examples achieve lower visual perceptibility while keeping a high attack success rate. Method: We propose reducing the visual perceptibility of adversarial examples under black-box conditions by constraining the area and spatial distribution of the adversarial perturbation. A convolutional network extracts the critical regions of the image, i.e., the regions that strongly influence the model output, and these regions serve as a constraint that limits where the perturbation may be placed. A generative adversarial network with a self-attention mechanism then adds the perturbation inside the critical regions, producing adversarial examples with low perceptibility. Result: On three public classification datasets, we compare our method with several typical attacks: seven white-box algorithms, FGSM (fast gradient sign method), BIM (basic iterative method), DeepFool, PerC-C&W (perceptual color distance C&W), JSMA (Jacobian-based saliency map attacks), APGD (auto projected gradient descent), and AutoAttack, and two black-box algorithms, OnePixel and AdvGAN (adversarial generative adversarial network). In attack success rate (ASR), our algorithm is on the same level as the compared algorithms. In the objective comparison of visual perceptibility, relative to AdvGAN our algorithm reduces the mean square error (MSE) by 42.1% and raises the structural similarity (SSIM) by 8.4% on the low-resolution dataset, and reduces MSE by 72.7% and raises SSIM by 12.8% on the medium- and high-resolution datasets. Compared with DeepFool, the compared algorithm with the best visual perceptibility, our algorithm reduces MSE by 29.3% and raises SSIM by 0.8% on the low-resolution dataset. Conclusion: We analyze the visual-perceptibility problems of current algorithms and propose an adversarial example generation method that markedly reduces the visual perceptibility of adversarial examples while keeping a comparable attack success rate.
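In notation of our own choosing (the symbols below are not taken from the paper), the constraint described in the Method paragraph can be written compactly: for an input image x, a critical-region mask M(x) produced by the extraction network, and a perturbation G(x) produced by the generator, the adversarial example is

x_{\mathrm{adv}} = \operatorname{clip}_{[0,1]}\bigl(x + M(x) \odot G(x)\bigr),

where \odot denotes element-wise multiplication, so the perturbation is confined to the extracted critical regions.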
Keywords: adversarial examples; visual perceptibility; adversarial perturbation; generative adversarial network (GAN); black-box attack
A perturbation-constraint-based adversarial example generation method with weak visual perceptibility
Wang Yang1, Cao Tieyong1, Yang Jibin1, Zheng Yunfei1,2,3, Fang Zheng1, Deng Xiaotong1 (1. Institute of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China; 2. Department of Firepower, Nanjing Campus, The Army Academy of Artillery and Air Defense of PLA, Nanjing 211100, China; 3. The Key Laboratory of Polarization Imaging and Detection Technology of Anhui Province, Hefei 230031, China) Abstract
Objective: An adversarial example is an input to a deep neural model that has been modified by adding a perturbation to the original image so that the model produces an incorrect output. The perturbation is the key factor in adversarial example generation: it should cause the model to err while leaving the original image visually undistorted to a human observer. Visual imperceptibility and attack success rate are therefore the two essential criteria for evaluating adversarial examples. The objective criteria that current algorithms use for visual imperceptibility are fairly uniform: for three-channel RGB images, the smaller the change in pixel values, the better the imperceptibility is assumed to be. Such criteria only restrict the magnitude of the perturbation; the affected area and the spatial distribution of the perturbation should also be taken into account. Our method aims to improve the imperceptibility of adversarial examples by constraining both the area covered by the perturbation and its distribution. The algorithm is designed around three considerations: 1) the perturbation should, as far as possible, be confined to a single semantic region of the image, such as the target object or the background; 2) the distribution of the perturbation should be consistent with the image structure; and 3) the generation of invalid perturbation should be reduced as much as possible. Method: We propose an algorithm that weakens the visual perceptibility of adversarial examples by constraining the area and distribution of the perturbation under black-box conditions. The algorithm is trained in two steps. In the first step, the critical regions of the image are extracted by a convolutional network with an attention mechanism. A critical region is an area that strongly influences the model output: adding a perturbation inside it increases the probability of an output error, and if the extracted region is ideal, perturbing only that region is enough to make the classification model misclassify. To train the extraction network, Gaussian noise with a fixed magnitude is used as the perturbation; it is added to the extracted critical region to form an adversarial example, which is fed separately to the discriminator and to the classification model under attack to compute the losses. In this step we also compute a perceptual loss between the original image and the critical regions extracted by the convolutional network. In the second step, the weights of the extraction network are frozen. The image is fed both into a generator with a self-attention mechanism, which produces the perturbation, and into the extraction network, which produces the critical region. The perturbation is multiplied by the critical-region mask and fused with the image to generate the adversarial example; the adversarial example is passed to the discriminator and to the target classification model, the losses are computed, and the generator is optimized. Because the perturbation generated in the second step should perform at least as well as the Gaussian noise used in the first step, the first step establishes a lower bound on the attack success rate of the second step.
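To make the second training step concrete, the following is a minimal PyTorch-style sketch. The module names (extractor, generator, discriminator, target_model), the loss terms, and the loss weights are our own illustrative assumptions and are not taken from the authors' implementation.

# Minimal sketch of the second training step (illustrative; not the authors' code).
# Assumes PyTorch, images scaled to [0, 1], and an untargeted attack.
# target_model is the classifier under attack (in a black-box setting this
# would typically be a local surrogate rather than the true target).
import torch
import torch.nn.functional as F

def generator_training_step(x, labels, extractor, generator, discriminator,
                            target_model, g_optimizer, w_adv=1.0, w_gan=1.0):
    # Critical-region mask from the frozen extraction network (weights fixed).
    with torch.no_grad():
        mask = extractor(x)                       # values in [0, 1], same spatial size as x

    # Perturbation from the self-attention generator, confined to the mask.
    delta = generator(x)
    x_adv = torch.clamp(x + mask * delta, 0.0, 1.0)

    # Attack loss: push the target classifier away from the true labels.
    logits = target_model(x_adv)
    attack_loss = -F.cross_entropy(logits, labels)

    # GAN loss: the discriminator should rate the adversarial example as realistic.
    d_out = discriminator(x_adv)
    gan_loss = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))

    loss = w_adv * attack_loss + w_gan * gan_loss
    g_optimizer.zero_grad()
    loss.backward()
    g_optimizer.step()
    return x_adv.detach(), loss.item()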
The global perceptual loss used in the first step was originally introduced in image style transfer to preserve the overall structure of an image; here it keeps the perturbation consistent with the image structure and thereby lowers the visual perceptibility of the adversarial example. Result: We compare our algorithm with nine existing algorithms, both white-box and black-box, on three public datasets. The quantitative metrics are structural similarity (SSIM, higher is better), mean square error (MSE, lower is better), and attack success rate (ASR, higher is better). MSE measures the intensity of the perturbation, while SSIM evaluates how much the perturbation affects the structural information of the image. We also present adversarial examples generated by the different algorithms for a qualitative comparison of perceptibility. The experiments show that the attack success rate of the proposed method is on par with existing methods on three commonly used classification networks: the gap is less than 3% on the low-resolution dataset CIFAR-10 and less than 0.5% on the medium- and high-resolution datasets Tiny-ImageNet and ImageNet. Compared with fast gradient sign method (FGSM), basic iterative method (BIM), DeepFool, perceptual color distance C&W (PerC-C&W), auto projected gradient descent (APGD), AutoAttack, and AdvGAN on CIFAR-10, our MSE is lower by 45.1%, 34.91%, 29.3%, 75.6%, 69.0%, 53.9%, and 42.1%, respectively, and our SSIM is higher by 11.7%, 8%, 0.8%, 18.6%, 7.73%, 4.56%, and 8.4%, respectively. Compared with FGSM, BIM, PerC-C&W, APGD, AutoAttack, and AdvGAN on Tiny-ImageNet, our MSE is lower by 69.7%, 63.8%, 71.6%, 82.21%, 79.09%, and 72.7%, respectively, and our SSIM is higher by 10.1%, 8.5%, 38.1%, 5.08%, 1.12%, and 12.8%, respectively. Conclusion: We analyze the problems in how current methods evaluate the perceptibility of adversarial examples and propose a method that improves their visual imperceptibility. The results on the three datasets show that, with a comparable attack success rate, our algorithm achieves better visual imperceptibility in both qualitative and quantitative evaluation.
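For reference, the two objective metrics reported in the Result section can be computed for a clean/adversarial image pair as in the small sketch below; it assumes NumPy and a recent scikit-image, with images as float arrays of shape (H, W, 3) in [0, 1], and is not the authors' evaluation code.

import numpy as np
from skimage.metrics import structural_similarity

def perceptibility_metrics(x_clean, x_adv):
    # MSE measures perturbation intensity (lower is better).
    mse = float(np.mean((x_clean - x_adv) ** 2))
    # SSIM measures how much the perturbation disturbs image structure (higher is better).
    ssim = structural_similarity(x_clean, x_adv, channel_axis=-1, data_range=1.0)
    return mse, ssim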
Keywords
adversarial examples; visual perceptibility; adversarial perturbation; generative adversarial network (GAN); black-box attack