Learning Label Semantics for Weakly Supervised Group Activity Recognition

Authors:

Wu, L. | Tian, M. | Xiang, Y. | Gu, K. | Shi, G.

Indexed by:

EI | Scopus | SCIE

Abstract:

Weakly supervised group activity recognition addresses the dependence on individual-level annotations when understanding scenes involving multiple individuals, which is a challenging task. Existing methods either use trained detectors to extract individual features or use attention mechanisms for partial context encoding, followed by integration to form the final group-level representations. However, the detectors require individual-level annotations during the training phase and suffer from mis-detections, and the partial contexts extracted directly from the whole complex scene are too ambiguous without the guidance of concrete semantics. In this paper, we investigate the hierarchical structure inherent in group-level labels to extract fine-grained semantics without using detectors for weakly supervised group activity recognition. A multi-hot encoding strategy combined with a semantic encoder is first adopted to obtain the label semantics embeddings. The semantic and visual scene information are then fused through a semantic decoder to obtain activity-specific features. Lastly, we employ multi-label classification and integrate the scores of the hierarchical activity labels. Experimental results show that our proposed method achieves state-of-the-art performance on three benchmarks, and its accuracy on the Volleyball dataset exceeds that of the second-best method by 2%.
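
As an illustration of the multi-hot label encoding and score integration that the abstract describes, the following Python sketch decomposes group-level labels into sub-labels. It assumes that the eight Volleyball group activity classes split into a side and an action. This decomposition, the names `SUB_LABELS`, `GROUP_LABELS`, `multi_hot`, `integrate_scores`, and `predict`, and the averaging rule are all hypothetical and are not taken from the paper, whose actual encoder, decoder, and integration scheme are not given in this record.

```python
# Hypothetical decomposition of group-level labels into sub-labels
# (an assumption for illustration, not the authors' published scheme).
SUB_LABELS = ["left", "right", "set", "spike", "pass", "winpoint"]
GROUP_LABELS = [
    f"{side}_{action}"
    for side in ("left", "right")
    for action in ("set", "spike", "pass", "winpoint")
]


def multi_hot(group_label):
    """Encode a group label as a 0/1 vector over SUB_LABELS."""
    parts = set(group_label.split("_"))
    return [1 if sub in parts else 0 for sub in SUB_LABELS]


def integrate_scores(sub_scores):
    """Combine per-sub-label scores into one score per group label.

    Averaging the two constituent scores is an assumed rule; the paper
    may integrate hierarchical scores differently.
    """
    by_name = dict(zip(SUB_LABELS, sub_scores))
    return {
        group: sum(by_name[part] for part in group.split("_")) / 2
        for group in GROUP_LABELS
    }


def predict(sub_scores):
    """Return the group label with the highest integrated score."""
    scores = integrate_scores(sub_scores)
    return max(scores, key=scores.get)
```

Under these assumptions, `multi_hot("left_spike")` returns `[1, 0, 0, 1, 0, 0]`, which could serve as a multi-label classification target, and `predict` maps the six sub-label scores of a multi-label classifier back to a single group activity.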

Keywords:

Feature extraction | Label Semantics | Weakly Supervised Group Activity Recognition | Visualization | Semantics | Activity recognition | Encoding | Multi-Label Classification | Transformers | Annotations

Author Affiliations:

[ 1 ] [Wu L.] Faculty of Information Technology, Beijing University of Technology, Beijing, China
[ 2 ] [Tian M.] Faculty of Information Technology, Beijing University of Technology, Beijing, China
[ 3 ] [Xiang Y.] Faculty of Information Technology, Beijing University of Technology, Beijing, China
[ 4 ] [Gu K.] Faculty of Information Technology, Beijing University of Technology, Beijing, China
[ 5 ] [Shi G.] Faculty of Information Technology, Beijing University of Technology, Beijing, China

Related Articles:

Cellular spatial-semantic embedding for multi-label classification of cell clusters in thyroid fine needle aspiration biopsy whole slide images
2025, Pattern Recognition Letters
Domain-aware Prototype Network for Generalized Zero-Shot Learning
2023, IEEE Transactions on Circuits and Systems for Video Technology
GCFormer: Global Context-Aware Transformer for Remote Sensing Image Change Detection
2024, IEEE Transactions on Geoscience and Remote Sensing
Modality Perception Learning-Based Determinative Factor Discovery for Multimodal Fake News Detection
2024, IEEE Transactions on Neural Networks and Learning Systems

Source:

IEEE Transactions on Multimedia

ISSN: 1520-9210

Year: 2024

Volume: 26

Pages: 1-12

Impact Factor: 7.300 (JCR@2022)

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 5

ESI Highly Cited Papers on the List: 0

Page views (last 30 days): 12
