
Author:

Zhang, C. | Ren, K. | Bian, Q. | Shi, Y.

Indexed by:

Scopus

Abstract:

This paper focuses on improving the efficiency of action recognition frameworks by streamlining their complicated feature extraction pipelines and enhancing explainability, benefiting future adaptation to more complex visual understanding tasks (e.g., video captioning). To this end, we propose HSAR, a novel decoupled two-stream framework for action recognition that utilizes high-semantic features for increased efficiency and provides well-founded explanations in terms of spatial-temporal perceptions, which will benefit further extensions to visual understanding tasks. The inputs are decoupled into spatial and temporal streams with designated encoders that extract only the pinnacle of representations, yielding high-semantic features while greatly reducing computation costs. A lightweight Temporal Motion Transformer (TMT) module is proposed to model temporal features globally through self-attention, omitting redundant spatial features. The decoupled spatial-temporal embeddings are then merged dynamically by an attention fusion model to form a joint high-semantic representation. Visualizing the attention in each module offers intuitive interpretations of HSAR's explainability. Extensive experiments on three widely used benchmarks (Kinetics-400, Kinetics-600, and Something-Something V2) show that our framework achieves high prediction accuracy with significantly reduced computation (only 64.07 GFLOPs per clip), offering an excellent trade-off between accuracy and computational cost.  © 2023 ACM.
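The pipeline described in the abstract — decoupled spatial and temporal streams, global temporal self-attention, and dynamic attention fusion — can be illustrated with a minimal NumPy sketch. This is a conceptual illustration only, not the authors' implementation: the per-frame feature inputs, random weight initialization, mean-pooling, and scalar fusion gates are all simplifying assumptions made here; HSAR's actual encoders, TMT module, and fusion model are more elaborate.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention over the time axis (TMT-style sketch).
    x: (T, d) sequence of per-frame features."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (T, T) temporal attention map
    return attn @ v, attn

def hsar_sketch(frames, rng):
    """frames: (T, d) hypothetical high-semantic per-frame features."""
    T, d = frames.shape
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    # Spatial stream: pool appearance features over time.
    spatial = frames.mean(axis=0)                       # (d,)
    # Temporal stream: global self-attention over frames, then pool.
    temporal_seq, attn = temporal_self_attention(frames, Wq, Wk, Wv)
    temporal = temporal_seq.mean(axis=0)                # (d,)
    # Attention fusion (toy version): softmax gates weight the two streams.
    gates = softmax(np.array([spatial.mean(), temporal.mean()]))
    joint = gates[0] * spatial + gates[1] * temporal    # joint representation, (d,)
    return joint, attn, gates

rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 16))   # 8 frames, 16-dim features (toy sizes)
joint, attn, gates = hsar_sketch(frames, rng)
print(joint.shape, attn.shape, gates)
```

The (T, T) attention map returned here is the kind of artifact the paper visualizes to interpret which frames drive the temporal stream's decision.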

Keyword:

High-Semantics Action Recognition; Computer Vision; Explainable AI; Machine Learning; Decoupled Feature Extraction

Author Community:

  • [ 1 ] [Zhang C.]Faculty of Information Technology, Beijing University of Technology, Beijing, China
  • [ 2 ] [Ren K.]Faculty of Information Technology, Beijing University of Technology, Beijing, China
  • [ 3 ] [Bian Q.]Faculty of Information Technology, Beijing University of Technology, Beijing, China
  • [ 4 ] [Shi Y.]Faculty of Information Technology, Beijing University of Technology, Beijing, China

Year: 2023

Page: 262-271

Language: English

Cited Count:

WoS CC Cited Count: 0

ESI Highly Cited Papers on the List: 0