Abstract:
With the rapid expansion of the we-media industry, streamers increasingly insert inappropriate content into live videos to attract traffic and pursue profit. Blacklisted streamers often forge their identities or switch platforms to keep streaming, which significantly harms the online environment. Streamer re-identification (re-ID) has therefore become critically important. Streamer biometrics in live videos are multimodal, comprising voiceprints, faces, and spatiotemporal information that complement one another. We therefore propose a light cross-modal attention network (LCMA-Net) for streamer re-ID in live videos. First, the voiceprint, face, and spatiotemporal features of the streamer are extracted by RawNet-SA, Π-Net, and STDA-ResNeXt3D, respectively. We then design a light cross-modal pooling attention (LCMPA) module, which, combined with a multilayer perceptron (MLP), aligns the features of the different modalities and concatenates them into multimodal features within the LCMA-Net. Finally, the streamer is re-identified by measuring the similarity between these multimodal features. Five experiments were conducted on the StreamerReID dataset, and the results demonstrated that the proposed method achieved competitive performance. The dataset and code are available at https://github.com/BJUT-AIVBD/LCMA-Net. © 2024 Elsevier Inc.
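The final two steps of the abstract (fusing per-modality features and matching by similarity) can be illustrated with a minimal sketch. It does not reproduce the LCMPA module or the MLP: as a stand-in assumption, each modality vector is simply L2-normalized and concatenated, and cosine similarity is assumed as the similarity measure. The function names (`fuse`, `rank_gallery`) and the toy feature values are hypothetical and are not taken from the LCMA-Net repository.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(voice, face, spatiotemporal):
    """Hypothetical fusion: normalize each modality, then concatenate.
    This is a placeholder for the paper's LCMPA + MLP alignment step."""
    fused = []
    for feat in (voice, face, spatiotemporal):
        fused.extend(l2_normalize(feat))
    return fused

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def rank_gallery(query, gallery):
    """Return (streamer_id, score) pairs sorted from most to least similar."""
    scores = [(sid, cosine_similarity(query, feat)) for sid, feat in gallery.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy features standing in for the outputs of the three extractors.
query = fuse([1.0, 0.0], [0.0, 2.0], [3.0, 4.0])
gallery = {
    "streamer_a": fuse([0.9, 0.1], [0.1, 1.8], [3.1, 3.9]),
    "streamer_b": fuse([0.0, 1.0], [2.0, 0.0], [-4.0, 3.0]),
}
ranking = rank_gallery(query, gallery)
```

In this toy setup the query is matched to the gallery identity with the highest score. A real system would use the learned multimodal features in place of the hand-written vectors.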
Source:
Computer Vision and Image Understanding
ISSN: 1077-3142
Year: 2024
Volume: 249
Impact Factor: 4.500 (JCR@2022)
ESI Highly Cited Papers on the List: 0