High-performance integer-multiple sparse network for action recognition
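Zang Ying1,2,3, Liu Tianjiao4, Zhao Shuguang4, Yang Dongsheng1,2 (1. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049; 2. Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168; 3. School of Information Engineering, Huzhou University, Huzhou 313000; 4. Trial Retail Engineering Co., Ltd., Yantai 264000) Abstract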
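Objective Action recognition is widely used in practical scenarios such as human interaction, behavior analysis, and surveillance. Most skeleton-based action recognition methods need information from both the spatial and temporal dimensions to achieve good results. A graph convolutional network (GCN) can combine spatial and temporal information effectively, but GCN-based methods have high computational complexity, and adding attention modules and multi-stream fusion strategies makes the whole training process even less efficient. Most current research concentrates on algorithm performance; reducing the computational cost of an algorithm while preserving accuracy is a key problem that action recognition needs to solve. To address this, building on the lightweight shift graph convolutional network (Shift-GCN), this paper proposes the integer sparse graph convolutional network (IntSparse-GCN). Method First, a new sparse shift operation is proposed in which odd-numbered columns are shifted up, even-numbered columns are shifted down, and the part shifted out is replaced with 0. On this basis, the input and output of each network layer are set to an integer multiple of the number of joints, which gives the integer-multiple sparse network IntSparse-GCN. Then the mask function in Shift-GCN is analyzed, and an automated traversal procedure is used to find the parameters with the highest accuracy. Result Ablation experiments show that each improvement raises the overall performance of the algorithm. On the X-sub and X-view subsets of the NTU RGB+D dataset, the Top-1 accuracy of 4-stream IntSparse-GCN+M-Sparse is 90.72% and 96.57%, respectively. On the Northwestern-UCLA dataset, the Top-1 accuracy of 4-stream IntSparse-GCN+M-Sparse reaches 96.77%, which is 2.17% higher than the original model. Compared with other representative algorithms, accuracy improves on the different datasets and across the 4 streams, with an especially clear gain on the Northwestern-UCLA dataset. Conclusion For the sparse features produced by the shift operation, this paper proposes the integer-multiple network IntSparse-GCN, analyzes the mask function in Shift-GCN, and designs an automated traversal procedure to find the parameters with the highest accuracy. This not only improves accuracy but also provides a basis for further pruning and quantization.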
Keywords
Action recognition analysis derived of integer sparse network
Zang Ying1,2,3, Liu Tianjiao4, Zhao Shuguang4, Yang Dongsheng1,2(1.School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China;2.Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China;3.School of Information Engineering, Huzhou University, Huzhou 313000, China;4.Trial Retail Engineering Co., Ltd., Yantai 264000, China) Abstract
Objective Action recognition analyzes multi-frame images to estimate the pose of the human body from sensor input or to recognize the action being performed. It has a wide range of applications in real-world scenarios such as human interaction, behavior analysis, and surveillance, for example monitoring illegal behavior in public places such as bus interchanges, railway stations, and airports. At present, most skeleton-based methods need both spatial and temporal information to obtain good results. A graph convolutional network (GCN) can combine spatial and temporal information effectively. However, GCN-based methods have high computational complexity, and adding attention modules and multi-stream fusion lowers training efficiency further. Reducing computational cost while preserving accuracy is therefore a key problem in action recognition. The shift graph convolutional network (Shift-GCN) applies the shift operation to GCNs. It is composed of shift graph operations and lightweight point-wise convolutions, where the shift graph operations provide flexible receptive fields for spatio-temporal graphs. On three datasets for skeleton-based action recognition, Shift-GCN has more than 10× lower computational complexity. However, its features are redundant and the internal structure of the network has not yet been optimized. We therefore optimize the lightweight Shift-GCN and propose the integer sparse graph convolutional network (IntSparse-GCN).
Method To reduce the feature redundancy of Shift-GCN, we propose a sparse shift operation in which odd-numbered columns are moved up, even-numbered columns are moved down, and the part moved out is replaced with 0. On this basis, the number of input and output channels of each network layer is set to an integer multiple of the number of joints. First, we adopt a basic network structure with parameters similar to those of the previous network. When choosing the number of input and output channels, we try to balance the zeros across the features of each joint, which yields the optimized network structure. This network sets almost half of the feature channel positions to 0, which expresses features more accurately and makes the feature matrix a sparse matrix with strong regularity. It improves the robustness of the model and the accuracy of recognition. Next, we analyze the mask function in Shift-GCN. The results show that the learned mask values are distributed in a range centered on 0 and that the learned weights focus on a few features; most features do not require mask intervention. Our experiments show that more than 80% of the mask function is ineffective. We then ran many experiments in which mask values in different intervals were set to 0 and found that the effect is irregular, so we designed an automated traversal method to find the most accurate parameters and obtain the optimal network model. This not only improves the accuracy of the network but also reduces the multiplications between the feature matrix and the mask vector. Result Our ablation experiments show that each improvement raises the overall performance of the algorithm.
On the X-sub subset of the NTU RGB+D dataset, the Top-1 accuracy of 1-stream (1s) IntSparse-GCN reaches 87.98% and that of 1s IntSparse-GCN+M-Sparse reaches 88.01%; 2-stream (2s) IntSparse-GCN reaches 89.80% and 2s IntSparse-GCN+M-Sparse reaches 89.82%; 4-stream (4s) IntSparse-GCN reaches 90.72% and 4s IntSparse-GCN+M-Sparse reaches 90.72%. On the X-view subset, the Top-1 accuracy of 1s, 2s, and 4s IntSparse-GCN+M-Sparse reaches 94.89%, 96.21%, and 96.57%, respectively. On the Northwestern-UCLA dataset, the Top-1 accuracy of 1s, 2s, and 4s IntSparse-GCN+M-Sparse reaches 92.89%, 95.26%, and 96.77%, respectively; the 4s result is 2.17% higher than the original model. Compared with other representative algorithms, accuracy improves on the different datasets and across the 4 streams. Conclusion We propose IntSparse-GCN, which introduces a spatial shift algorithm with channel counts set to integer multiples of the number of joints. The resulting feature matrix is sparse with strong regularity, which makes model pruning easier to optimize. To obtain the most accurate parameters, we analyzed the mask function in Shift-GCN and designed an automated traversal method. The sparse feature matrix and the mask parameters provide a basis for further pruning and quantization.
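The sparse shift described in the Method can be illustrated with a minimal sketch in plain Python. It assumes a feature matrix with one row per joint and one column per channel, and it takes "up" to mean toward row 0. The abstract does not state how far each column moves, so the `distances` parameter (defaulting to 1 for every column) and the function name `sparse_shift` are hypothetical and are not taken from the authors' implementation.

```python
def sparse_shift(features, distances=None):
    """Shift odd-numbered columns up and even-numbered columns down.

    features:  list of rows (one per joint), each a list of channel values.
    distances: hypothetical per-column shift distances; the abstract does not
               specify them, so every column moves by 1 unless told otherwise.
    Values moved past the edge are dropped and vacated positions hold 0,
    so the shift is not circular.
    """
    num_joints = len(features)
    num_channels = len(features[0])
    if distances is None:
        distances = [1] * num_channels

    shifted = [[0] * num_channels for _ in range(num_joints)]
    for col in range(num_channels):
        # Columns are numbered from 1 in the text, so index 0 is "odd".
        is_odd_column = (col + 1) % 2 == 1
        step = -distances[col] if is_odd_column else distances[col]
        for row in range(num_joints):
            target = row + step
            if 0 <= target < num_joints:
                shifted[target][col] = features[row][col]
    return shifted


example = sparse_shift([[1, 2], [3, 4], [5, 6]])
```

With three joints and two channels, the first column loses its top value and gains a zero at the bottom, while the second column does the opposite. This is the source of the regular zero pattern that the abstract describes.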
Keywords
action recognition; lightweight; sparse feature matrix; integer sparse graph convolutional network (IntSparse-GCN); mask function
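The mask analysis in the abstract (setting mask values inside an interval to 0, then traversing candidate intervals to keep the most accurate one) can be sketched as follows. This is only an illustration under stated assumptions: the mask is a flat list of floats, the candidate intervals are supplied by the caller, and `evaluate` is a hypothetical callback that would train or validate the model and return its accuracy. None of these names come from the authors' code.

```python
def sparsify_mask(mask, low, high):
    """Return a copy of mask with every value in [low, high] set to 0."""
    return [0.0 if low <= value <= high else value for value in mask]


def traverse_intervals(mask, intervals, evaluate):
    """Try each candidate interval and keep the one with the highest score.

    mask:      list of learned mask values (assumed flat for this sketch).
    intervals: iterable of (low, high) pairs to try; chosen by the caller.
    evaluate:  hypothetical callback mapping a sparsified mask to a score,
               for example validation accuracy.
    Returns (best_score, best_interval, best_mask), or None if no interval
    was supplied.
    """
    best = None
    for low, high in intervals:
        candidate = sparsify_mask(mask, low, high)
        score = evaluate(candidate)
        if best is None or score > best[0]:
            best = (score, (low, high), candidate)
    return best
```

Because the abstract reports that the effect of zeroing different intervals is irregular, an exhaustive pass over the candidates is the natural choice here rather than a search that assumes the score changes smoothly with the interval width.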