Video anomaly detection using a generative adversarial network with a gated self-attention mechanism

Liu Chengming, Xue Ran, Shi Lei, Li Yinghao, Gao Yufei (School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002)

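Abstract
Objective Video abnormal behavior detection is one of the current research hotspots in intelligent surveillance and has important applications in public security. Improving anomaly detection accuracy by effectively modeling the spatial and temporal information in video remains a difficult problem. Owing to their structural advantages, generative adversarial networks (GANs) are now widely used for video anomaly detection. To address the low utilization of spatio-temporal features and the poor detection performance of traditional GANs, this paper proposes a GAN that incorporates a gated self-attention mechanism for video abnormal behavior detection. Method A gated self-attention mechanism is introduced into the U-net generator of the GAN to assign weights to the feature maps layer by layer during sampling. Combining the strengths of U-net and gated self-attention suppresses the feature expression of background regions in the input frames that are irrelevant to anomaly detection, highlights the relevant features of the different target objects, and models spatio-temporal information more effectively. A LiteFlownet network extracts motion information from the video stream to preserve continuity across the video sequence. In addition, intensity, gradient, and motion loss functions are added to improve the stability of model detection. Result Experiments are conducted on the CUHK (Chinese University of Hong Kong) Avenue, UCSD (University of California, San Diego) Ped1, and UCSD Ped2 video anomaly datasets. On CUHK Avenue, the proposed method reaches an AUC (area under curve) of 87.2%, which is 2.3% higher than comparable methods; on UCSD Ped1 and UCSD Ped2, its AUC is higher than that of the other comparable methods. Four ablation experiments are also designed and their results compared, and the proposed method has the higher AUC. Conclusion The experimental results show that the proposed method is better suited to video anomaly detection and effectively improves the stability and accuracy of the abnormal behavior detection model, and that using inter-frame motion information from the video sequence significantly improves detection performance.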
Keywords
The gating self-attention mechanism and GAN integrated video anomaly detection

Liu Chengming, Xue Ran, Shi Lei, Li Yinghao, Gao Yufei (School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China)

Abstract
Objective Video-based abnormal behavior detection is an active topic in intelligent surveillance and has considerable potential in public security. However, modeling the spatio-temporal information in video well enough to improve anomaly detection accuracy remains challenging. Traditional video-based abnormal behavior detection methods rely on hand-crafted features, such as the contour, motion information, and trajectory of the target, and their weak representational power limits them when processing massive video data. Deep learning models can automatically learn and extract high-level features from large video stream datasets and have been widely used in video anomaly detection in place of hand-crafted features. Owing to its structural advantages, the generative adversarial network (GAN) has been widely used in video anomaly detection tasks. To address the low utilization of spatio-temporal features and the poor detection performance of traditional GANs, we propose a video anomaly detection algorithm that integrates a GAN with a gating self-attention mechanism. Method First, the gating self-attention mechanism is introduced into the U-net part of the generative network of the GAN, and attention weights are assigned to the feature maps layer by layer during the sampling process. The standard U-net passes target features through its skip connections without directing them toward the informative ones. By combining the structural advantages of U-net with the gating self-attention mechanism, our method suppresses the feature representation of background regions in the input video frames that are irrelevant to anomaly detection, highlights the relevant feature expression of different targets, and models spatio-temporal information more effectively.
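As a rough illustration of how a gate can reweight skip-connection features, the sketch below assumes an additive attention gate in the style of Attention U-Net. The abstract does not give the exact formulation, so the function `attention_gate` and the parameters `W_x`, `W_g`, and `psi` are hypothetical, and plain Python lists stand in for a deep learning framework.

```python
import math


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def attention_gate(skip, gate, W_x, W_g, psi):
    """Additive attention gate applied independently at each pixel (assumed form).

    skip, gate: H x W x C nested lists (skip feature map and gating signal,
                assumed to be already resized to the same spatial size).
    W_x, W_g:   F x C projection matrices, playing the role of 1x1 convolutions.
    psi:        length-F vector that maps the joint feature to one scalar.
    Returns the reweighted skip features and the attention map.
    """
    gated, alpha_map = [], []
    for row_s, row_g in zip(skip, gate):
        out_row, a_row = [], []
        for x, g in zip(row_s, row_g):
            # Project both inputs, add them, and apply ReLU.
            joint = [max(0.0, a + b)
                     for a, b in zip(matvec(W_x, x), matvec(W_g, g))]
            score = sum(p * q for p, q in zip(psi, joint))
            # Sigmoid squashes the score into a weight in (0, 1).
            alpha = 1.0 / (1.0 + math.exp(-score))
            out_row.append([alpha * c for c in x])
            a_row.append(alpha)
        gated.append(out_row)
        alpha_map.append(a_row)
    return gated, alpha_map
```

Pixels whose weight is close to 0 contribute little to the decoder, which is one way background regions could be suppressed while target regions are kept.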
Next, to guarantee consistency across the video sequence, we adopt the smoother and faster LiteFlownet network to extract the motion information in the video stream. Finally, to generate higher-quality frames, intensity, gradient, and motion loss functions are added to enhance the stability of model detection. The adversarial network is trained with PatchGAN. After adversarial optimization, the GAN achieves good and stable performance. Result Our experiments are carried out on recognized video abnormal event datasets, namely Chinese University of Hong Kong (CUHK) Avenue, University of California, San Diego (UCSD) Ped1, and UCSD Ped2. The area under the receiver operating characteristic (ROC) curve, the anomaly regularity score S, and the peak signal-to-noise ratio (PSNR) are used as performance evaluation metrics. On the CUHK Avenue dataset, our area under curve (AUC) reaches 87.2%, which is 2.3% higher than that of similar methods. On both the UCSD Ped1 and UCSD Ped2 datasets, our AUC values are also higher than those of similar methods. In addition, four ablation experiments are conducted: 1) model 1 uses the standard U-net as the generative network for video anomaly detection; 2) model 2 adds the gating self-attention mechanism to the generative network U-net to verify whether the mechanism is effective; 3) model 3 adds the gating self-attention mechanism to the generative network U-net and also adds LiteFlownet to verify the effectiveness of the optical flow network; and 4) model 4 is our full model, in which LiteFlownet is added to the U-net generator with gating self-attention, the gating self-attention mechanism is merged layer by layer at the encoding end to weight the features, and the merged features are identified at the decoding end. Our method obtains higher AUC values than the other three ablation models.
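The intensity and gradient loss terms named in the Method part can be sketched as follows. This assumes the formulation common in frame-prediction anomaly detection (squared intensity error, and differences between absolute image gradients); the abstract does not state the exact definitions or weights, so the function names and the averaging are illustrative. Frames are grayscale 2D lists, and the motion loss is omitted because it needs an optical flow network.

```python
def intensity_loss(pred, target):
    """Mean squared difference between predicted and true pixel intensities."""
    n = sum(len(row) for row in pred)
    total = sum((p - t) ** 2
                for row_p, row_t in zip(pred, target)
                for p, t in zip(row_p, row_t))
    return total / n


def gradient_loss(pred, target):
    """Mean absolute difference between absolute gradients along both axes."""
    h, w = len(pred), len(pred[0])
    total, count = 0.0, 0
    for i in range(h):
        for j in range(w):
            if i > 0:  # vertical gradient
                gp = abs(pred[i][j] - pred[i - 1][j])
                gt = abs(target[i][j] - target[i - 1][j])
                total += abs(gp - gt)
                count += 1
            if j > 0:  # horizontal gradient
                gp = abs(pred[i][j] - pred[i][j - 1])
                gt = abs(target[i][j] - target[i][j - 1])
                total += abs(gp - gt)
                count += 1
    return total / count if count else 0.0
```

The intensity term penalizes overall pixel error, while the gradient term penalizes blurred or misplaced edges, which is why the two are usually combined.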
We test the trained model and visualize the PSNR values of the video sequence frames; the changes in PSNR reflect how accurately the model detects abnormal behavior. Conclusion The experimental results show that our method achieves better recognition results on the CUHK Avenue, UCSD Ped1, and UCSD Ped2 datasets, is more suitable for video anomaly detection tasks, and effectively improves the stability and accuracy of the abnormal behavior detection model. Moreover, using inter-frame motion information from the video sequence significantly improves abnormal behavior detection performance.
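The PSNR-based scoring referred to in the Result part can be illustrated with a minimal sketch. It assumes the standard PSNR definition and a per-video min-max normalization of PSNR into a score S in [0, 1], where lower values suggest an anomaly. The abstract does not spell out how S is computed, so `regular_scores` is a hypothetical helper.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between a predicted and a true frame."""
    n = sum(len(row) for row in pred)
    mse = sum((p - t) ** 2
              for row_p, row_t in zip(pred, target)
              for p, t in zip(row_p, row_t)) / n
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)


def regular_scores(psnrs):
    """Min-max normalize the PSNR values of one video into [0, 1] (assumed form of S)."""
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        return [1.0] * len(psnrs)  # no variation: treat every frame as normal
    return [(p - lo) / (hi - lo) for p in psnrs]
```

Under these assumptions, a frame that the generator predicts poorly has a low PSNR and therefore a low S, so thresholding S yields the per-frame decisions from which an ROC curve and its AUC can be computed.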
Keywords
