Abstract:
Effectively modeling spatiotemporal information in videos is key to improving action recognition performance. In this work, we propose 3D residual networks with channel and spatial attention modules for action recognition. The proposed architecture extracts spatiotemporal features directly. The channel and spatial attention modules help the network learn what and where to emphasize or suppress, at a nearly negligible increase in computational cost. Specifically, we apply the channel attention module and then the spatial attention module to each slice tensor of the intermediate feature map, producing channel and spatial attention maps. The attention maps are then multiplied with the input feature map to reweight important features. We validate our network through extensive experiments and visualizations on the HMDB-51 and UCF-101 datasets.
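The sequential reweighting described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that a "slice tensor" is one temporal slice of shape (channels, height, width), that each attention map is built from average- and max-pooled descriptors, and that the spatial map uses a simple 1x1 mixing of the two pooled maps instead of a larger convolution. All function names and the weights `w1`, `w2`, `w_avg`, and `w_max` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(slice_chw, w1, w2):
    """Return one weight in (0, 1) per channel ("what" to emphasize).

    Assumption: a shared two-layer MLP (w1: hidden x C, w2: C x hidden)
    is applied to the average-pooled and max-pooled channel descriptors.
    """
    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    flat = [[x for row in channel for x in row] for channel in slice_chw]
    avg = [sum(f) / len(f) for f in flat]
    mx = [max(f) for f in flat]
    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(slice_chw, w_avg, w_max):
    """Return one weight in (0, 1) per position ("where" to emphasize).

    Simplification: the channel-wise average and max maps are mixed with
    two scalar weights rather than passed through a convolution.
    """
    channels = len(slice_chw)
    height, width = len(slice_chw[0]), len(slice_chw[0][0])
    attention = []
    for i in range(height):
        row = []
        for j in range(width):
            values = [slice_chw[c][i][j] for c in range(channels)]
            pooled = w_avg * sum(values) / channels + w_max * max(values)
            row.append(sigmoid(pooled))
        attention.append(row)
    return attention


def attend_slice(slice_chw, w1, w2, w_avg, w_max):
    """Apply channel attention, then spatial attention, to one slice."""
    ch = channel_attention(slice_chw, w1, w2)
    scaled = [[[ch[c] * x for x in row] for row in channel]
              for c, channel in enumerate(slice_chw)]
    sp = spatial_attention(scaled, w_avg, w_max)
    return [[[sp[i][j] * x for j, x in enumerate(row)]
             for i, row in enumerate(channel)]
            for channel in scaled]


def attend_clip(clip_tchw, w1, w2, w_avg=1.0, w_max=1.0):
    """Reweight every temporal slice of a (T, C, H, W) feature map."""
    return [attend_slice(s, w1, w2, w_avg, w_max) for s in clip_tchw]
```

Because both attention maps pass through a sigmoid, every output value keeps the sign of its input and is scaled by a factor between 0 and 1, so the modules can only emphasize features relative to one another rather than amplify them in absolute terms.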
Source:
2020 CHINESE AUTOMATION CONGRESS (CAC 2020)
ISSN: 2688-092X
Year: 2020
Page: 5171-5174
Language: English
Cited Count:
WoS CC Cited Count: 2
SCOPUS Cited Count: 4
ESI Highly Cited Papers on the List: 0