Abstract:
Group activity recognition aims to recognize behaviors jointly characterized by multiple individuals in a scene. Existing schemes rely on individual relation inference and usually take the individuals as tokens; essentially, they select the region of the image most relevant to the group activity while filtering out irrelevant background noise. However, these schemes require individual bounding box labels in both the training and testing stages, and because individuals are usually represented at a single scale, multi-scale individual information cannot be combined effectively. In this paper, we present a novel end-to-end hierarchical relation inference framework based on active spatial positions for group activity recognition. The framework locates active spatial positions and uses them as visual tokens to infer relations among the token embeddings. It requires individual bounding box labels only in the training stage and automatically eliminates the background after locating active spatial positions in the entire scene. Hierarchical relations can be naturally inferred from the visual tokens at different scales, contributing to further performance improvement. Experimental results demonstrate that the proposed framework is competitive with existing schemes that require more labeling effort and computation in both the training and testing stages.
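A minimal sketch (not the authors' implementation) of the core idea described above, assuming a PyTorch-style setup: score the spatial positions of a backbone feature map, keep the most active positions as visual tokens, and infer relations among the token embeddings with self-attention; the module name, channel and token counts, the 1x1-conv scoring, and the top-k selection rule are all illustrative assumptions.

import torch
import torch.nn as nn

class ActiveTokenRelation(nn.Module):
    """Select the most active spatial positions of a feature map as visual
    tokens and infer relations among them with multi-head self-attention."""
    def __init__(self, channels=256, num_tokens=16, num_heads=4):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)   # activeness score per position (assumed)
        self.num_tokens = num_tokens
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, feat):                                  # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        scores = self.score(feat).flatten(1)                  # (B, H*W) activeness per position
        idx = scores.topk(self.num_tokens, dim=1).indices     # keep the most active positions
        tokens = feat.flatten(2).transpose(1, 2)              # (B, H*W, C)
        tokens = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, c))
        out, _ = self.attn(tokens, tokens, tokens)            # relation inference over tokens
        return out.mean(dim=1)                                # pooled group-level embedding (B, C)

if __name__ == "__main__":
    # Tokens drawn from two feature scales give a simple hierarchical view.
    coarse = ActiveTokenRelation()(torch.randn(2, 256, 12, 20))
    fine = ActiveTokenRelation()(torch.randn(2, 256, 24, 40))
    print(torch.cat([coarse, fine], dim=-1).shape)            # torch.Size([2, 512])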
Source: IEEE Transactions on Circuits and Systems for Video Technology
ISSN: 1051-8215
Year: 2022
Issue: 6
Volume: 33
Page: 1-1
Impact Factor: 8.400 (JCR@2022)
ESI Discipline: ENGINEERING
ESI HC Threshold: 49
JCR Journal Grade:1
CAS Journal Grade:2
SCOPUS Cited Count: 10
ESI Highly Cited Papers on the List: 0