  • Published: 2025-02-13
Research on a Skeleton-Based Action Recognition Method Using Dynamic Multi-Granularity Graph Convolutional Networks

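Zhize Wu1, Xin Chen1, Tong Xu2, Teng Li3, Fudong Nian1, Xiaofeng Wang1 (1. Hefei University; 2. University of Science and Technology of China; 3. Anhui University)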

Abstract
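Objective Methods based on graph convolutional networks (GCNs) have become increasingly popular in skeleton-based human action recognition and have made notable progress. However, traditional graph convolutions are limited in exchanging information between distant nodes. As a result, they perform poorly at capturing information from skeleton nodes that have no natural (physical) connection. Meanwhile, existing methods devoted to complex spatial modeling suffer from feature redundancy and a significant increase in parameter count. Method To address these issues, a skeleton-based human action recognition method built on a dynamic multi-granularity graph convolutional network is proposed. The skeleton graph is reconstructed according to different combinations of human joints, yielding graph structures at three granularities that better capture information from non-naturally connected nodes. To tackle feature redundancy and parameter growth, a spatial reorganization convolution module is introduced. Through separation and reconstruction operations, it cross-reconstructs information-rich and information-poor features, effectively reducing feature redundancy along the spatial dimension. In the feature fusion stage, a new six-stream fusion scheme derived from the three-granularity graph structures exploits their complementary information to improve overall model performance. Result Compared with the baseline method CTR-GCN, the proposed method achieves improvements of 0.6%, 0.7%, and 0.7% on the benchmark datasets NTU-RGB+D, NTU-RGB+D 120, and Northwestern-UCLA, respectively. Conclusion The dynamic multi-granularity graph convolutional network combines multi-granularity graph structures with spatial-channel reconstruction convolution to form a new spatio-temporal modeling approach. It enlarges the receptive field of the graph convolutional network and substantially reduces feature redundancy during spatio-temporal modeling. Together, these improve both the ability to capture complex human actions and the accuracy of recognizing them.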
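The abstract describes the spatial reorganization convolution only in outline: features are separated, then information-rich and information-poor parts are cross-reconstructed. That wording closely matches the Spatial Reconstruction Unit (SRU) of SCConv. The pure-Python sketch below therefore assumes an SRU-style design applied to a single (channels × joints) feature map. The abstract does not confirm that DMG-GCN uses exactly these steps. The group count, the 0.5 gate threshold, and the names `group_norm` and `spatial_reconstruction` are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]  # shape (C, V): channels x joints


def group_norm(x: Matrix, gamma: List[float], beta: List[float],
               groups: int, eps: float = 1e-5) -> Matrix:
    """GroupNorm on a (C, V) map: mean/variance pooled over each channel group."""
    c = len(x)
    per_group = c // groups
    out = [[0.0] * len(x[0]) for _ in range(c)]
    for g in range(groups):
        chans = range(g * per_group, (g + 1) * per_group)
        vals = [v for ch in chans for v in x[ch]]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        for ch in chans:
            out[ch] = [gamma[ch] * (v - mean) / math.sqrt(var + eps) + beta[ch]
                       for v in x[ch]]
    return out


def spatial_reconstruction(x: Matrix, gamma: List[float], beta: List[float],
                           groups: int = 2, threshold: float = 0.5) -> Matrix:
    """Assumed SRU-style separate-and-reconstruct step (not confirmed for DMG-GCN)."""
    c = len(x)
    assert c % 2 == 0 and c % groups == 0, "channels must split into halves and groups"
    gn = group_norm(x, gamma, beta, groups)
    # Channel importance: GN scale factors normalized to sum to 1.
    total = sum(gamma)
    w_gamma = [g / total for g in gamma]
    # Sigmoid re-weighting of the normalized, importance-scaled features.
    weights = [[1.0 / (1.0 + math.exp(-v * w_gamma[ch])) for v in gn[ch]]
               for ch in range(c)]
    # Separate: above the threshold counts as informative (kept fully in x1, dropped
    # from x2); elements at or below it are scaled by their weight in both parts.
    x1 = [[x[ch][j] * (1.0 if w > threshold else w) for j, w in enumerate(weights[ch])]
          for ch in range(c)]
    x2 = [[x[ch][j] * (0.0 if w > threshold else w) for j, w in enumerate(weights[ch])]
          for ch in range(c)]
    # Cross-reconstruct: first half of x1 + second half of x2, and vice versa.
    half = c // 2
    top = [[a + b for a, b in zip(x1[i], x2[half + i])] for i in range(half)]
    bottom = [[a + b for a, b in zip(x1[half + i], x2[i])] for i in range(half)]
    return top + bottom


if __name__ == "__main__":
    feats = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5], [0.2, 0.3, -2.0], [1.0, 1.0, 1.0]]
    print(spatial_reconstruction(feats, gamma=[1.0, 0.5, 2.0, 1.0], beta=[0.0] * 4))
```

The two parts are recombined by addition rather than by adding channels. The output therefore keeps the input's shape, and in this sketch the only learnable parameters are the normalization scales and shifts.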
Keywords
Dynamic multi-granularity graph convolutional networks for skeleton-based action recognition

Zhize Wu1,2, Xin Chen1, Tong Xu3, Le Zou1, PengPeng Sun1, XiaoFeng Wang1

(1. University of Science and Technology of China; 2. Hefei University)

Abstract
Objective In recent years, methods based on graph convolutional networks (GCNs) have become increasingly popular in skeleton-based human action recognition and have made significant progress in this challenging domain. These advances are largely attributed to the ability of GCNs to model the spatial and temporal dependencies inherent in human skeletal data. However, traditional graph convolutions have notable limitations, particularly in capturing interactions between distant nodes. As a result, they perform poorly on non-natural connections in the skeleton graph, that is, relationships between joints that are not physically linked, which are crucial for accurately modeling complex human actions. Traditional GCNs handle locally connected nodes well, but their effectiveness diminishes as the distance between nodes grows. This is a problem for human skeletons, because actions often involve coordinated movements of body parts that are not directly connected. For instance, actions in which the hands and feet move simultaneously require modeling long-range dependencies. Because conventional GCNs capture these dependencies poorly, their understanding of the overall action is limited and recognition accuracy drops. Moreover, existing approaches that model complex spatial relationships often suffer from feature redundancy and a substantial increase in parameter count. Although sophisticated, they tend to generate many redundant features, which raises computational cost and lowers the overall efficiency of the model.
Method To address these challenges, the Dynamic Multi-Granularity Graph Convolutional Network (DMG-GCN) is proposed, which reconstructs the skeleton graph with a novel multi-granularity graph structure. Three graph structures of different granularity are designed, each tailored to capture distinct aspects of the skeletal data.
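A small hop-count experiment helps explain why coarser graphs ease the long-range problem described in the Objective. A graph convolution that aggregates only 1-hop neighbors needs roughly k stacked layers to pass information between nodes k hops apart. The sketch below starts from the standard 25-joint NTU RGB+D skeleton, using the edge list found in the public ST-GCN/CTR-GCN code. It then merges joints into coarser nodes and measures the hop distance from the left hand (joint 8) to the left foot (joint 16). The part-level and half-body groupings (`PARTS`, `HALVES`) and the helper names are hypothetical, because the abstract does not say how DMG-GCN combines joints.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# NTU RGB+D 25-joint skeleton (1-indexed), as in the public ST-GCN/CTR-GCN graph code.
NTU_EDGES: List[Tuple[int, int]] = [
    (1, 2), (2, 21), (3, 21), (4, 3), (5, 21), (6, 5), (7, 6), (8, 7),
    (9, 21), (10, 9), (11, 10), (12, 11), (13, 1), (14, 13), (15, 14),
    (16, 15), (17, 1), (18, 17), (19, 18), (20, 19), (22, 23), (23, 8),
    (24, 25), (25, 12),
]

# Hypothetical coarser granularities (the paper's actual groupings are not given).
PARTS: Dict[str, Set[int]] = {
    "torso": {1, 2, 21}, "head": {3, 4},
    "left_arm": {5, 6, 7, 8, 22, 23}, "right_arm": {9, 10, 11, 12, 24, 25},
    "left_leg": {13, 14, 15, 16}, "right_leg": {17, 18, 19, 20},
}
HALVES: Dict[str, Set[int]] = {
    "upper": PARTS["torso"] | PARTS["head"] | PARTS["left_arm"] | PARTS["right_arm"],
    "lower": PARTS["left_leg"] | PARTS["right_leg"],
}


def coarsen(edges, groups):
    """Merge joints into group nodes; link two groups if any joint edge crosses them."""
    owner = {j: name for name, joints in groups.items() for j in joints}
    adj = {name: set() for name in groups}
    for a, b in edges:
        ga, gb = owner[a], owner[b]
        if ga != gb:
            adj[ga].add(gb)
            adj[gb].add(ga)
    return adj, owner


def hops(adj, src, dst):
    """Breadth-first search: number of edges on the shortest path from src to dst."""
    seen = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for nxt in adj[node]:
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    raise ValueError("nodes are disconnected")


def hand_to_foot(groups):
    """Hop distance between the nodes containing joint 8 (left hand) and 16 (left foot)."""
    adj, owner = coarsen(NTU_EDGES, groups)
    return hops(adj, owner[8], owner[16])


fine = {j: {j} for j in range(1, 26)}  # joint level: every joint is its own node
for name, groups in [("joint", fine), ("part", PARTS), ("half-body", HALVES)]:
    print(name, hand_to_foot(groups))  # joint 10, part 2, half-body 1
```

Under these assumed groupings, the hand-foot distance drops from 10 hops to 2 hops and then to 1 hop. A 1-hop graph convolution on the coarser graphs can therefore relate the two limbs in far fewer layers.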
By combining human body joints in different ways, these multi-granularity graphs let the model capture interactions between non-naturally connected nodes more effectively. The resulting hierarchical representation gives a finer-grained view of the spatial relationships within the skeleton graph. Building on this structure, a dynamic adjacency matrix is introduced for spatial modeling. A static adjacency matrix remains fixed regardless of the action being performed. The dynamic adjacency matrix, by contrast, adapts to the current spatial configuration of the nodes. This adaptability represents the semantic relationships between nodes more accurately and improves recognition performance. In addition, a spatial reorganization convolution module is proposed to mitigate feature redundancy and the growth in parameter count. Through separation-reconstruction operations, the module separates information-rich features from information-poor ones and cross-reconstructs them. By distinguishing and reorganizing these features, it reduces feature redundancy along the spatial dimension, improving both efficiency and performance. In the feature fusion stage, a new six-stream fusion method leverages the complementary information of the three-granularity graph structures. By integrating the cues provided at each granularity level, the fused model covers a broader range of spatial and temporal dependencies and achieves better overall performance. Result The effectiveness of the proposed approach is evaluated on three benchmark datasets.
Compared with the baseline method CTR-GCN, the proposed method achieves improvements of 0.6%, 0.7%, and 0.7% on the NTU-RGB+D, NTU-RGB+D 120, and Northwestern-UCLA datasets, respectively. Although modest in absolute terms, these gains are meaningful in this highly competitive field. Ablation studies further validate the multi-granularity graph structure and the spatial channel reconstruction convolution within the proposed architecture. They isolate the contribution of each component: the multi-granularity design improves the capture of complex interactions, and the spatial reorganization convolution reduces redundancy and improves efficiency. In addition, comparative visualizations show that the dynamic adjacency matrix captures semantically informative connections between nodes more effectively than conventional adjacency matrices, which helps in understanding complex actions. Conclusion DMG-GCN advances spatio-temporal modeling for skeleton-based human action recognition. By integrating a multi-granularity graph structure with spatial channel reconstruction convolution, it expands the receptive field of GCNs and substantially reduces feature redundancy. The dynamic adjacency matrix further enhances the model’s ability to capture intricate semantic relationships, leading to more accurate action recognition. DMG-GCN addresses key limitations of traditional GCNs. Its handling of long-distance node interactions and feature redundancy points toward more capable and efficient models.
As skeleton-based action recognition continues to evolve, the principles and techniques introduced by DMG-GCN may inform further work toward greater accuracy and real-world applicability.
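The abstract does not give a formula for the dynamic adjacency matrix. Because the baseline is CTR-GCN, the sketch below assumes a single-channel simplification of CTR-GCN-style topology refinement. A shared static skeleton matrix is refined by a sample-dependent term, A_dyn[i][j] = A[i][j] + alpha * tanh(theta(x_i) - phi(x_j)), where theta and phi are linear projections of per-joint features. The names `dynamic_adjacency` and `aggregate`, the projection weights, `alpha`, and the toy poses are all hypothetical illustrations, not values from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def dynamic_adjacency(a_static: Matrix, feats: Matrix, w_theta: List[float],
                      w_phi: List[float], alpha: float) -> Matrix:
    """Assumed refinement: A_dyn[i][j] = A[i][j] + alpha * tanh(theta(x_i) - phi(x_j))."""
    theta = [sum(w * f for w, f in zip(w_theta, x)) for x in feats]  # per-joint scalar
    phi = [sum(w * f for w, f in zip(w_phi, x)) for x in feats]
    v = len(feats)
    return [[a_static[i][j] + alpha * math.tanh(theta[i] - phi[j]) for j in range(v)]
            for i in range(v)]


def aggregate(adj: Matrix, feats: Matrix) -> Matrix:
    """One graph-convolution aggregation step: out_i = sum_j adj[i][j] * x_j."""
    c = len(feats[0])
    return [[sum(adj[i][j] * feats[j][k] for j in range(len(feats))) for k in range(c)]
            for i in range(len(feats))]


# Toy 3-joint chain 0-1-2 with self-loops; joints 0 and 2 are not physically linked.
A = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
pose_a = [[0.2, 1.0], [0.0, 0.5], [-0.3, 0.1]]  # hypothetical 2-D per-joint features
pose_b = [[1.5, -0.4], [0.1, 0.2], [0.9, 0.7]]
w_theta, w_phi = [0.8, -0.5], [0.3, 0.6]        # hypothetical learned weights

A_a = dynamic_adjacency(A, pose_a, w_theta, w_phi, alpha=0.5)
A_b = dynamic_adjacency(A, pose_b, w_theta, w_phi, alpha=0.5)
# The 0-2 weight is zero in A but non-zero, and pose-dependent, in A_a and A_b.
print(A[0][2], A_a[0][2], A_b[0][2])
print(aggregate(A_a, pose_a))
```

Setting `alpha` to 0 recovers the static graph exactly. Any non-zero `alpha` adds direct, input-dependent links between joints that the skeleton does not connect, which is the property the abstract attributes to the dynamic adjacency matrix.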
Keywords
