Abstract:
As skeleton data becomes increasingly available, Graph Convolutional Networks (GCNs) have been widely adopted to extract spatial and temporal features for skeleton-based action recognition. However, GCN-based methods still have limitations. First, multi-level semantic features are not connected, so fine-grained information is lost as the network deepens. Second, cross-scale spatiotemporal features are not considered jointly and refined to focus on informative areas. These limitations make it difficult to distinguish confusing actions. To address these issues, we propose a cross-scale connection (CSC) structure and a spatiotemporal refinement focus (STRF) module. The CSC bridges the gap between multi-level semantic features. The STRF module refines cross-scale spatiotemporal features to focus on informative joints in each frame. Both are embedded into standard GCNs to form the cross-scale spatiotemporal refinement network (CSR-Net). The proposed CSR-Net explicitly models cross-scale spatiotemporal information among multi-level semantic representations to improve its ability to distinguish ambiguous actions. Extensive experiments demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art methods on the NTU RGB+D 60, NTU RGB+D 120, and NW-UCLA datasets.
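Note: the record does not include code, and the exact STRF design is described only at a high level above. As a rough illustration of the general idea of re-weighting informative joints per frame, the following is a minimal, hypothetical PyTorch sketch (the class name STRFSketch, the channel-reduction ratio, and the (N, C, T, V) feature layout are assumptions, not the authors' implementation):

import torch
import torch.nn as nn

class STRFSketch(nn.Module):
    """Hypothetical joint-wise spatiotemporal attention sketch (not the paper's exact STRF module)."""
    def __init__(self, channels):
        super().__init__()
        # Reduce channels, then produce one score per joint per frame.
        self.score = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (N, C, T, V) -- batch, channels, frames, joints
        attn = self.score(x)      # (N, 1, T, V): per-frame, per-joint weights
        return x * attn + x       # residual re-weighting of informative joints

if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 25)   # e.g., skeletons with 25 joints, as in NTU RGB+D
    out = STRFSketch(64)(feat)
    print(out.shape)                    # torch.Size([2, 64, 32, 25])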
Source:
IEEE SIGNAL PROCESSING LETTERS
ISSN: 1070-9908
Year: 2024
Volume: 31
Page: 441-445
Impact Factor: 3.900 (JCR@2022)
Cited Count:
WoS CC Cited Count: 4
SCOPUS Cited Count: 6
ESI Highly Cited Papers on the List: 0