Abstract:
Recent years have witnessed remarkable achievements in video-based action recognition. Apart from traditional frame-based cameras, event cameras are bio-inspired vision sensors that record only pixel-wise brightness changes rather than absolute brightness values. However, little effort has been devoted to event-based action recognition, and large-scale public datasets are scarce. In this paper, we present an event-based action recognition framework called EV-ACT. A Learnable Multi-Fused Representation (LMFR) is first proposed to integrate multiple types of event information in a learnable manner. The LMFR, with dual temporal granularity, is fed into an event-based slow-fast network to fuse appearance and motion features. A spatial-temporal attention mechanism is introduced to further enhance the learning capability for action recognition. To promote research in this direction, we have collected the largest event-based action recognition benchmark, named THU^{E-ACT}-50, along with the accompanying THU^{E-ACT}-50-CHL dataset recorded under challenging environments, comprising over 12,830 recordings in total from 50 action categories, more than 4 times the size of the previous largest dataset. Experimental results show that our proposed framework achieves improvements of over 14.5%, 7.6%, 11.2%, and 7.4% compared to previous works on four benchmarks. We have also deployed the EV-ACT framework on a mobile platform to validate its practicality and efficiency.
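To make the two-pathway idea in the abstract concrete, below is a minimal, illustrative PyTorch sketch of a slow-fast fusion over an event-frame tensor with a simple spatial-temporal attention gate. All module names, shapes, and the attention design are hypothetical assumptions for illustration; this is not the authors' EV-ACT implementation or their LMFR, only a generic instance of the ingredients the abstract names (dual temporal granularity, appearance/motion fusion, spatio-temporal attention).

import torch
import torch.nn as nn

class SpatialTemporalAttention(nn.Module):
    """Hypothetical gate: a 3D conv + sigmoid re-weights features over (T, H, W)."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv3d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):            # x: (B, C, T, H, W)
        return x * self.gate(x)      # spatio-temporal re-weighting

class TwoPathwayEventNet(nn.Module):
    """Slow pathway sees few frames (appearance); fast pathway sees many (motion)."""
    def __init__(self, in_ch: int = 2, num_classes: int = 50, alpha: int = 4):
        super().__init__()
        self.alpha = alpha           # temporal stride between fast and slow inputs
        def stem(c_out):
            return nn.Sequential(
                nn.Conv3d(in_ch, c_out, kernel_size=(1, 7, 7),
                          stride=(1, 2, 2), padding=(0, 3, 3)),
                nn.BatchNorm3d(c_out), nn.ReLU(inplace=True),
            )
        self.slow = stem(64)
        self.fast = stem(8)
        self.attn = SpatialTemporalAttention(64 + 8)
        self.head = nn.Linear(64 + 8, num_classes)

    def forward(self, events):       # events: (B, C, T, H, W) event representation
        s = self.slow(events[:, :, ::self.alpha])   # coarse temporal granularity
        f = self.fast(events)                        # fine temporal granularity
        # align temporal lengths, then fuse channel-wise under the attention gate
        s = nn.functional.interpolate(s, size=f.shape[2:], mode="nearest")
        fused = self.attn(torch.cat([s, f], dim=1))
        return self.head(fused.mean(dim=(2, 3, 4)))  # global average pool + classify

logits = TwoPathwayEventNet()(torch.randn(2, 2, 16, 128, 128))
print(logits.shape)  # torch.Size([2, 50])

The example feeds a pre-built event representation of 16 temporal bins; the slow pathway subsamples it by a factor of alpha, mirroring the dual temporal granularity described above.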
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence
ISSN: 0162-8828
Year: 2023
Issue: 12
Volume: 45
Page: 1-17
Impact Factor: 23.600 (JCR@2022)
ESI Discipline: ENGINEERING
ESI HC Threshold: 19
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 16
ESI Highly Cited Papers on the List: 0