Abstract:
Weakly supervised group activity recognition aims to remove the dependence on individual-level annotations when understanding scenes involving multiple individuals, which is a challenging task. Existing methods either use trained detectors to extract individual features or use attention mechanisms to encode partial contexts, and then integrate the results to form the final group-level representation. However, the detectors require individual-level annotations during training and suffer from mis-detections, and partial contexts extracted directly from the whole complex scene are too ambiguous without the guidance of concrete semantics. In this paper, we exploit the hierarchical structure inherent in group-level labels to extract fine-grained semantics, without using detectors, for weakly supervised group activity recognition. A multi-hot encoding strategy combined with a semantic encoder is first adopted to obtain the label semantic embeddings. The semantic and visual scene information are then fused through a semantic decoder to obtain activity-specific features. Lastly, we employ multi-label classification and integrate the scores of the hierarchical activity labels. Experimental results show that the proposed method achieves state-of-the-art performance on three benchmarks, and its accuracy on the Volleyball dataset exceeds that of the second-best method by 2%.
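The multi-hot encoding and score-integration steps named in the abstract can be illustrated with a minimal sketch. The two-level label hierarchy (team side plus action, as in Volleyball-style labels such as "left_spike"), the function names, and the additive integration rule are all assumptions made for illustration; the abstract does not specify them, and the semantic encoder and decoder are not modelled here.

```python
# Illustrative sketch only. The hierarchy, names, and integration rule
# below are assumptions, not the method described in the paper.
SIDES = ["left", "right"]                    # assumed coarse level
ACTIONS = ["set", "spike", "pass", "winpoint"]  # assumed fine level
VOCAB = SIDES + ACTIONS                      # fine-grained semantic labels


def multi_hot(group_label):
    """Encode a group label such as 'left_spike' as a multi-hot vector over VOCAB."""
    side, action = group_label.split("_")
    return [1 if token in (side, action) else 0 for token in VOCAB]


def integrate_scores(scores):
    """Combine per-label scores into group-activity scores.

    Hypothetical rule: the score of 'side_action' is the sum of the
    score for the side and the score for the action.
    """
    by_token = dict(zip(VOCAB, scores))
    return {
        f"{side}_{action}": by_token[side] + by_token[action]
        for side in SIDES
        for action in ACTIONS
    }


def predict(scores):
    """Return the group activity with the highest integrated score."""
    combined = integrate_scores(scores)
    return max(combined, key=combined.get)
```

In this sketch, a multi-label classifier would produce one score per entry of VOCAB, and predict would map those scores back to a single group-level activity.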
Source:
IEEE Transactions on Multimedia
ISSN: 1520-9210
Year: 2024
Volume: 26
Page: 1-12
Impact Factor: 7.300 (JCR@2022)
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 5
ESI Highly Cited Papers on the List: 0