Abstract:
Image sentiment analysis is a domain fraught with the dual challenges of interpreting complex visual content and discerning the subtle emotional undertones it may convey. Despite the notable successes of existing vision-language pretraining (VLP) models in a variety of visual tasks, they fall short in the nuanced realm of sentiment analysis. This shortfall is primarily due to their inadequate processing of sentiment-specific cues, most notably the oversight of localized sentiment cues within images and the intricate interplay of these signals. Furthermore, these models inadequately harness the rich sentiment cues often embedded in accompanying text. In response to these shortcomings, we introduce the Specific Sentiment Mask Auto-encoder (S2MA) model, which is expressly designed to integrate sentiment information during the pretraining process. S2MA is meticulously engineered to focus on both intermodal and intramodal sentiment cues, thereby augmenting the model's proficiency in analyzing the sentiment knowledge within visual content. Rigorous comparative evaluations of S2MA against the CLIP model, across a spectrum of downstream datasets in zero-shot and supervised learning scenarios, have validated the superiority of our approach. The empirical outcomes affirm S2MA's capacity to significantly enhance the analytical landscape of image sentiment analysis. © 2024 IEEE.
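The abstract implies that S2MA masks sentiment-bearing image regions during autoencoder pretraining rather than masking patches uniformly at random. The sketch below illustrates what such sentiment-guided patch masking could look like; the paper's actual strategy is not given here, so the saliency scores, the mask ratio, and the `sentiment_guided_mask` helper are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of sentiment-guided patch masking, assuming per-patch
# sentiment saliency scores are available from some auxiliary detector.
import torch

def sentiment_guided_mask(patch_tokens: torch.Tensor,
                          saliency: torch.Tensor,
                          mask_ratio: float = 0.5):
    """Mask the patches with the highest sentiment saliency.

    patch_tokens: (B, N, D) patch embeddings from a ViT-style encoder.
    saliency:     (B, N) per-patch sentiment scores (assumed input).
    Returns the visible tokens and a boolean mask (True = masked).
    """
    B, N, D = patch_tokens.shape
    n_mask = int(N * mask_ratio)
    # Indices of the most sentiment-salient patches in each image.
    masked_idx = saliency.topk(n_mask, dim=1).indices          # (B, n_mask)
    mask = torch.zeros(B, N, dtype=torch.bool)
    mask.scatter_(1, masked_idx, True)
    # Keep only the unmasked patches; the decoder would reconstruct the rest.
    visible = patch_tokens[~mask].reshape(B, N - n_mask, D)
    return visible, mask

# Toy usage with random embeddings and saliency scores.
tokens = torch.randn(2, 196, 768)      # e.g. 14x14 ViT patches
scores = torch.rand(2, 196)
visible, mask = sentiment_guided_mask(tokens, scores, mask_ratio=0.5)
print(visible.shape, mask.sum(dim=1))  # (2, 98, 768), 98 masked per image
```

Masking the most salient regions forces the reconstruction objective to model exactly the cues a random mask would often leave visible, which is one plausible reading of how the model "integrates sentiment information during pretraining."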
Year: 2024
Language: English