Abstract:
Live video hosted by streamers attracts a growing number of Internet users. Some streamers mix pornographic content into their live video for profit and popularity, which seriously harms the network environment. To identify porn streamers effectively, this paper proposes a multilevel fusion method of multimodal deep features for porn streamer recognition in live video. (1) A multimodal deep network extracts visual and audio features from the live video, covering spatial, audio, motion, and temporal context. (2) Audio-visual attention features are obtained by fusing the visual and audio features at the feature level with a multimodal attention mechanism. (3) Text is collected from viewers' bullet screen comments, and text features are extracted by a bullet screen text network based on the BERT (Bidirectional Encoder Representations from Transformers) model. (4) The predictions of the audio-visual deep network and the bullet screen text network are fused at the decision level to improve porn streamer recognition accuracy. We build a real-world dataset of porn streamers, and our experiments demonstrate that the method improves recognition accuracy. © 2020 Elsevier B.V.
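The two fusion levels named in steps (2) and (4) can be illustrated with a minimal sketch. The abstract does not give the attention formulation or the decision rule, so everything below is an assumption for illustration only: the function names, the dot-product scoring vectors `w_visual` and `w_audio`, and the convex-combination weight `weight_av` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(visual, audio, w_visual, w_audio):
    """Feature-level fusion (assumed form, not the paper's).

    Each modality gets a scalar score from a dot product with a
    scoring vector; the scores are normalised with softmax and used
    as weights in a sum of the two feature vectors.
    """
    if not (len(visual) == len(audio) == len(w_visual) == len(w_audio)):
        raise ValueError("all vectors must have the same length")
    score_v = sum(w * x for w, x in zip(w_visual, visual))
    score_a = sum(w * x for w, x in zip(w_audio, audio))
    alpha_v, alpha_a = softmax([score_v, score_a])
    return [alpha_v * v + alpha_a * a for v, a in zip(visual, audio)]


def decision_fuse(p_av, p_text, weight_av=0.5):
    """Decision-level fusion (assumed form, not the paper's).

    Combines the audio-visual network's probability and the bullet
    screen text network's probability with a convex combination.
    """
    if not 0.0 <= weight_av <= 1.0:
        raise ValueError("weight_av must lie in [0, 1]")
    return weight_av * p_av + (1.0 - weight_av) * p_text
```

In this sketch, `attention_fuse` lets the more informative modality dominate the fused feature, while `decision_fuse` keeps the two networks independent until their final probabilities are combined. A trained system would learn the scoring vectors and the fusion weight rather than fixing them by hand.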
Source: Pattern Recognition Letters
ISSN: 0167-8655
Year: 2020
Volume: 140
Page: 150-157
Impact Factor: 5.100 (JCR@2022)
ESI Discipline: ENGINEERING
ESI HC Threshold: 115
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 15
ESI Highly Cited Papers on the List: 0