Abstract:
Recent years have witnessed remarkable achievements in video-based action recognition. Apart from traditional frame-based cameras, event cameras are bio-inspired vision sensors that record only pixel-wise brightness changes rather than absolute brightness values. However, little effort has been devoted to event-based action recognition, and large-scale public datasets are scarce. In this paper, we present an event-based action recognition framework called EV-ACT. A Learnable Multi-Fused Representation (LMFR) is first proposed to integrate multiple types of event information in a learnable manner. The LMFR, with dual temporal granularity, is fed into an event-based slow-fast network to fuse appearance and motion features. A spatial-temporal attention mechanism is introduced to further enhance the learning capability for action recognition. To promote research in this direction, we have collected the largest event-based action recognition benchmark, named THU^{E-ACT}-50, along with the accompanying THU^{E-ACT}-50-CHL dataset recorded under challenging environments, comprising over 12,830 recordings in total from 50 action categories, more than 4 times the size of the previous largest dataset. Experimental results show that our proposed framework achieves improvements of over 14.5%, 7.6%, 11.2%, and 7.4% compared to previous works on four benchmarks. We have also deployed the EV-ACT framework on a mobile platform to validate its practicality and efficiency.
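To make the two-pathway idea in the abstract concrete, below is a minimal, illustrative PyTorch sketch of a slow-fast fusion over an event-frame tensor with a simple spatial-temporal attention gate. All module names, shapes, and the attention design are hypothetical assumptions for illustration; this is not the authors' EV-ACT implementation or their LMFR, only a generic instance of the ingredients the abstract names (dual temporal granularity, appearance/motion fusion, spatio-temporal attention).

import torch
import torch.nn as nn

class SpatialTemporalAttention(nn.Module):
    """Hypothetical gate: a 3D conv + sigmoid re-weights features over (T, H, W)."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv3d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):            # x: (B, C, T, H, W)
        return x * self.gate(x)      # spatio-temporal re-weighting

class TwoPathwayEventNet(nn.Module):
    """Slow pathway sees few frames (appearance); fast pathway sees many (motion)."""
    def __init__(self, in_ch: int = 2, num_classes: int = 50, alpha: int = 4):
        super().__init__()
        self.alpha = alpha           # temporal stride between fast and slow inputs
        def stem(c_out):
            return nn.Sequential(
                nn.Conv3d(in_ch, c_out, kernel_size=(1, 7, 7),
                          stride=(1, 2, 2), padding=(0, 3, 3)),
                nn.BatchNorm3d(c_out), nn.ReLU(inplace=True),
            )
        self.slow = stem(64)
        self.fast = stem(8)
        self.attn = SpatialTemporalAttention(64 + 8)
        self.head = nn.Linear(64 + 8, num_classes)

    def forward(self, events):       # events: (B, C, T, H, W) event representation
        s = self.slow(events[:, :, ::self.alpha])   # coarse temporal granularity
        f = self.fast(events)                        # fine temporal granularity
        # align temporal lengths, then fuse channel-wise under the attention gate
        s = nn.functional.interpolate(s, size=f.shape[2:], mode="nearest")
        fused = self.attn(torch.cat([s, f], dim=1))
        return self.head(fused.mean(dim=(2, 3, 4)))  # global average pool + classify

logits = TwoPathwayEventNet()(torch.randn(2, 2, 16, 128, 128))
print(logits.shape)  # torch.Size([2, 50])

The example feeds a pre-built event representation of 16 temporal bins; the slow pathway subsamples it by a factor of alpha, mirroring the dual temporal granularity described above.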
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence
ISSN: 0162-8828
Year: 2023
Issue: 12
Volume: 45
Page: 1-17
Impact Factor: 23.600 (JCR@2022)
ESI Discipline: ENGINEERING
ESI HC Threshold: 19
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 16
ESI Highly Cited Papers on the List: 0