Abstract:
As skeleton data becomes increasingly available, Graph Convolutional Networks (GCNs) have been widely adopted to extract spatial and temporal features for skeleton-based action recognition. However, GCN-based methods still have limitations. First, multi-level semantic features are not connected, so fine-grained information is lost as the network deepens. Second, cross-scale spatiotemporal features are not considered jointly and refined to focus on informative areas. These limitations make it difficult to distinguish confusing actions. To address these issues, we propose a cross-scale connection (CSC) structure and a spatiotemporal refinement focus (STRF) module. The CSC bridges the gap between multi-level semantic features. The STRF module refines cross-scale spatiotemporal features to focus on informative joints in each frame. Both are embedded into standard GCNs to form the cross-scale spatiotemporal refinement network (CSR-Net). The proposed CSR-Net explicitly models cross-scale spatiotemporal information among multi-level semantic representations to improve its ability to distinguish ambiguous actions. Extensive experiments demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art methods on the NTU RGB+D 60, NTU RGB+D 120, and NW-UCLA datasets.
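Note: the record does not include code, and the exact STRF design is described only at a high level above. As a rough illustration of the general idea of re-weighting informative joints per frame, the following is a minimal, hypothetical PyTorch sketch (the class name STRFSketch, the channel-reduction ratio, and the (N, C, T, V) feature layout are assumptions, not the authors' implementation):

import torch
import torch.nn as nn

class STRFSketch(nn.Module):
    """Hypothetical joint-wise spatiotemporal attention sketch (not the paper's exact STRF module)."""
    def __init__(self, channels):
        super().__init__()
        # Reduce channels, then produce one score per joint per frame.
        self.score = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (N, C, T, V) -- batch, channels, frames, joints
        attn = self.score(x)      # (N, 1, T, V): per-frame, per-joint weights
        return x * attn + x       # residual re-weighting of informative joints

if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 25)   # e.g., skeletons with 25 joints, as in NTU RGB+D
    out = STRFSketch(64)(feat)
    print(out.shape)                    # torch.Size([2, 64, 32, 25])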
Source:
IEEE SIGNAL PROCESSING LETTERS
ISSN: 1070-9908
Year: 2024
Volume: 31
Page: 441-445
Impact Factor: 3.900 (JCR@2022)
Cited Count:
WoS CC Cited Count: 4
SCOPUS Cited Count: 6
ESI Highly Cited Papers on the List: 0