Improving deep learning-based video steganalysis with motion vector differences

Hu Yongjian1,2, Huang Xiongbo1, Wang Yufei2, Liu Beibei1, Liu Shuowei3 (1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, China; 2. China-Singapore International Joint Research Institute, Guangzhou 510700, China; 3. College of Electronic Science, National University of Defense Technology, Changsha 410073, China)

Abstract
Objective To address the problem that existing deep learning video steganalysis networks are not sufficiently accurate, this paper starts from the principles of video compression coding and explores the relationship between the embedding-carrying coding parameters and other coding parameters. By extending the detection space and constructing new detection channels, it improves the detection performance of existing deep learning video steganalysis networks. Method Taking H.265/HEVC (high efficiency video coding) compressed video as an example, we first analyze the impact of steganographic modification of motion vectors on motion vector differences (MVDs) and show that MVDs can serve as an additional sampling (or detection) target. We then propose a method for constructing MVD detection matrices, which solves the problems of sparse samples in the spatial domain and misaligned sample positions across frames in the temporal domain. Finally, the MVD matrices are directly used to improve three existing H.265/HEVC deep learning video steganalysis networks, namely VSRNet (video steganalysis residual network), SCA-VSRNet (selection-channel-aware VSRNet) and Q-VSRNet (quantitative VSRNet), yielding IVSRNet (improved VSRNet), SCA-IVSRNet (selection-channel-aware improved VSRNet) and Q-IVSRNet (quantitative improved VSRNet), respectively. Result Tests are conducted on five steganographic methods, and comparisons are made with four steganalysis methods: the classical handcrafted-feature video steganalysis methods AoSO (adding or subtracting one), MVRB (motion vector reversion-based) and NPEFLO (near-perfect estimation for local optimality), transplanted to H.265/HEVC video, as well as LOCL (local optimality in candidate list), a recent steganalysis method designed directly for H.265/HEVC video. In the qualitative steganalysis tests, taking the 0.2 bpmv embedding rate as an example, IVSRNet and SCA-IVSRNet comprehensively outperform VSRNet and SCA-VSRNet, respectively, at different bit rates; SCA-IVSRNet also comprehensively outperforms AoSO and MVRB and, in some cases, outperforms the newer LOCL method. In the quantitative steganalysis tests, Q-IVSRNet comprehensively surpasses Q-VSRNet on samples with six different embedding rates. Conclusion The proposed detection-space extension strategy rests on a clear principle, and the method for constructing the input matrices is simple and broadly applicable; it can be readily extended to other deep learning video steganalysis networks and points out a path toward designing more effective video steganalysis networks.
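As a rough illustration of the analysis above (not the authors' code): in HEVC inter prediction the signaled MVD is the difference between the chosen MV and its predictor (MVP), so a ±1 modification of an MV component propagates into the corresponding MVD and perturbs its histogram. The NumPy sketch below uses entirely synthetic MVs, a fixed predictor model and a hypothetical embedding mask to show this effect; real encoders may also reselect predictors, which the toy model ignores.

```python
# Illustrative only: why MV embedding leaves traces in the MVD histogram.
# Assumes the AMVP relation mvd = mv - mvp; all data below is synthetic.
import numpy as np

rng = np.random.default_rng(0)

mv = rng.integers(-32, 33, size=(10000, 2))            # synthetic motion vectors (x, y)
mvp = mv + rng.integers(-2, 3, size=mv.shape)          # synthetic predictors near the MVs
mvd_cover = mv - mvp                                   # MVDs written to the cover bitstream

# Hypothetical +-1 embedding on ~20% of MV components (toy stand-in for 0.2 bpmv)
mask = rng.random(mv.shape) < 0.2
mv_stego = mv + mask * rng.choice([-1, 1], size=mv.shape)
mvd_stego = mv_stego - mvp                             # predictor unchanged, so the MVD shifts

bins = np.arange(-4, 6) - 0.5
h_cover, _ = np.histogram(mvd_cover[:, 0], bins=bins)
h_stego, _ = np.histogram(mvd_stego[:, 0], bins=bins)
print("cover MVD histogram:", h_cover)
print("stego MVD histogram:", h_stego)                 # bin heights change after embedding
```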
Keywords
Improving deep learning-based video steganalysis with motion vector differences

Hu Yongjian1,2, Huang Xiongbo1, Wang Yufei2, Liu Beibei1, Liu Shuowei3(1.School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, China;2.China-Singapore International Joint Research Institute, Guangzhou 510700, China;3.College of Electronic Science, National University of Defense Technology, Changsha 410073, China)

Abstract
Objective Video steganography and video steganalysis have been widely studied because video is an ideal cover medium for achieving high embedding capacity. Deep learning techniques have recently been introduced into video steganalysis, and a few deep neural networks have been published for detecting secret embedding in motion vectors (MVs). However, the current deep neural networks (DNNs) for video steganalysis report only mediocre detection accuracies compared with traditional handcrafted feature-based steganalysis approaches. We conjecture that this performance limitation stems from the inadequate information provided to the network. Starting from the principle of video encoding, we explore the impact of steganographic embedding on different encoding parameters. Our aim is to extend the detection space by searching for abnormalities in coding parameters caused by steganography, so that multiple input channels can be constructed to improve the detection performance of steganalysis networks. Method We first analyze how motion vector differences (MVDs) are influenced by secret embedding in MVs. It is shown that the histogram of MVDs exhibits visible changes in bin heights after MV embedding. Because MVDs convey critical information for revealing MV alteration, we propose using MVDs as an extra sampling space for the video steganalysis network, in addition to the existing MV and prediction residual spaces. However, MVDs are irregularly and sparsely distributed within individual frames and are therefore difficult to align across consecutive frames. We design a method for constructing the input channels of MVD samples that is compatible with the existing network architecture. Specifically, two matrices are adopted to record the vertical and horizontal components of the MVDs. Since the prediction unit (PU) partition varies from frame to frame, we take the minimum 4×4 block as the basic sampling unit; the vertical and horizontal MVD components of each 4×4 block are recorded as one element in the vertical MVD matrix and the horizontal MVD matrix, respectively. In the H.265/HEVC (high efficiency video coding) format, some blocks do not involve inter-frame prediction and thus have neither MVs nor MVDs, while other blocks use inter-frame prediction but adopt the Merge or Skip mode and therefore have MVs but no MVDs. For these two types of blocks, the corresponding elements in the MVD matrices are set to zero. The newly introduced MVD channels can work alone or together with other channels such as MVs and prediction residuals. By incorporating the MVD channels into current video steganalysis networks, we obtain improved networks for various tasks, including the improved VSRNet (IVSRNet), the selection-channel-aware improved VSRNet (SCA-IVSRNet) and the quantitative improved VSRNet (Q-IVSRNet). Result We conduct extensive experiments against five target steganographic methods with varying resolutions, bit rates and embedding rates. All embedding and detection are performed on H.265/HEVC videos. Two of the classical target methods, originally designed for H.264 videos, are transplanted to H.265/HEVC videos; the remaining three are recently published H.265/HEVC-specific steganographic methods. We first evaluate the MVD-VSRNet, which uses only the MVD and prediction residual channels without the MV channels.
Higher accuracies are obtained with the MVD-VSRNet than with the baseline VSRNet, which employs the MV and prediction residual channels, thereby verifying the discriminating capability of MVDs for stego videos. The IVSRNet, which adopts the MV, prediction residual and MVD channels, achieves an even better result. We then evaluate the SCA-IVSRNet, which augments the IVSRNet with an embedding probability channel; its performance exceeds that of both the IVSRNet and the SCA-VSRNet. We compare against several milestone handcrafted feature-based video steganalysis approaches for MV-based steganography, including the adding or subtracting one (AoSO), motion vector reversion-based (MVRB) and near-perfect estimation for local optimality (NPEFLO) algorithms. We also include local optimality in candidate list (LOCL), the latest state-of-the-art (SOTA) steganalysis method, which exploits features specific to the H.265/HEVC standard. The SCA-IVSRNet surpasses all the other methods against the two transplanted steganographic methods. Against the H.265/HEVC-specific steganography, the SCA-IVSRNet trails the NPEFLO and LOCL methods marginally, by less than 2%, but exceeds the remaining methods by around 10%. Among the five targets, the most challenging one does not directly change the MV values; in this case, the SCA-IVSRNet reports accuracies of around 67%, only 0.3% behind the first-place LOCL. It is worth noting that the IVSRNet also reaches 63% in this case, again verifying the important role of the proposed MVD channels. Finally, we assess the Q-IVSRNet on the quantitative steganalysis task. The mean absolute errors (MAEs) obtained with the Q-IVSRNet are consistently lower than those of the Q-VSRNet, which can be attributed to the effectiveness of the MVD channels. Conclusion In this work we aim to improve the detection accuracy of convolutional neural network (CNN)-based steganalyzers for MV-based video steganography. We point out that the current input spaces of MVs and prediction residuals do not convey adequate steganalytic information. To solve this problem, we propose extending the detection space to MVDs. The newly introduced MVD channels are fully compatible with current CNN-based video steganalyzers, leading to several improved steganalysis networks. Extensive experiments are conducted to evaluate the effectiveness of adopting the MVD channels. Results show that the improved detection networks not only surpass their predecessors by a large margin, but also catch up with or even exceed some popular handcrafted feature-based steganalyzers. This work demonstrates how to extend the detection space and handle highly unstructured data when constructing input matrices for CNN-based video steganalysis, paving the way for designing more effective deep learning networks for video steganalysis.
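The construction of the MVD input channels described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-frame list of prediction units, with hypothetical fields for position, size, MVD availability and MVD components, is assumed to have been extracted from the H.265/HEVC bitstream by a decoder.

```python
# Minimal sketch: build the vertical/horizontal MVD matrices on a 4x4-block grid.
# The PU record format (x, y, w, h, has_mvd, mvd_x, mvd_y) is assumed for illustration;
# intra blocks and Merge/Skip blocks contribute no MVD and their elements stay zero.
import numpy as np

def build_mvd_matrices(pu_list, frame_w, frame_h):
    rows, cols = frame_h // 4, frame_w // 4           # one element per 4x4 block
    mvd_h = np.zeros((rows, cols), dtype=np.float32)  # horizontal MVD components
    mvd_v = np.zeros((rows, cols), dtype=np.float32)  # vertical MVD components
    for pu in pu_list:
        if not pu["has_mvd"]:                         # intra, Merge or Skip: keep zeros
            continue
        r0, c0 = pu["y"] // 4, pu["x"] // 4
        r1, c1 = (pu["y"] + pu["h"]) // 4, (pu["x"] + pu["w"]) // 4
        mvd_h[r0:r1, c0:c1] = pu["mvd_x"]             # every 4x4 block in the PU gets its MVD
        mvd_v[r0:r1, c0:c1] = pu["mvd_y"]
    return mvd_v, mvd_h

# Toy usage: one 16x8 inter PU carrying an MVD, one Skip PU without.
pus = [
    {"x": 0,  "y": 0, "w": 16, "h": 8, "has_mvd": True,  "mvd_x": 2, "mvd_y": -1},
    {"x": 16, "y": 0, "w": 16, "h": 8, "has_mvd": False, "mvd_x": 0, "mvd_y": 0},
]
v, h = build_mvd_matrices(pus, frame_w=32, frame_h=8)
print(v)
print(h)
```

Because every frame is mapped onto the same fixed 4×4 grid regardless of its PU partition, the resulting matrices are spatially aligned across consecutive frames, which is what allows them to be stacked as extra input channels of an existing network.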
Keywords
