Abstract:
In recent years, large-scale pre-training models have achieved notable success on various downstream tasks by using contrastive image-text pairs to learn high-quality general visual representations from natural language supervision. However, these models typically disregard sentiment knowledge during pre-training, which limits their performance on image sentiment analysis. To address this, we propose a sentiment-enriched continual training framework (SECT), which continually trains CLIP and introduces multi-level sentiment knowledge into the further pre-training process through sentiment-based natural language supervision. Moreover, we construct a large-scale, weakly annotated sentiment image-text dataset so that the model can be trained robustly. In addition, SECT employs three training objectives that effectively integrate multi-level sentiment knowledge into the training process. Experiments on the EmotionROI, FI, and Twitter I datasets demonstrate that SECT yields a pre-trained model that outperforms previous models and CLIP on most downstream datasets. Our code will be made publicly available for research purposes. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.
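For context, the abstract refers to CLIP-style contrastive image-text pre-training as the starting point that SECT continually trains. The sketch below shows a minimal symmetric contrastive (InfoNCE) objective of that kind; it is an illustrative assumption only, not the SECT implementation, and the paper's three sentiment-aware objectives and weakly annotated dataset are not reproduced here. All names (clip_contrastive_loss, temperature) are hypothetical.

    # Minimal sketch of a CLIP-style symmetric contrastive loss (assumed baseline objective,
    # not the SECT training code described in the paper).
    import torch
    import torch.nn.functional as F

    def clip_contrastive_loss(image_embeds: torch.Tensor,
                              text_embeds: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
        """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
        # L2-normalize so the dot product equals cosine similarity.
        image_embeds = F.normalize(image_embeds, dim=-1)
        text_embeds = F.normalize(text_embeds, dim=-1)

        # Pairwise similarity matrix scaled by temperature; row i / column j compares
        # image i with text j, so matching pairs lie on the diagonal.
        logits = image_embeds @ text_embeds.t() / temperature
        targets = torch.arange(logits.size(0), device=logits.device)

        # Cross-entropy in both directions (image-to-text and text-to-image).
        loss_i2t = F.cross_entropy(logits, targets)
        loss_t2i = F.cross_entropy(logits.t(), targets)
        return (loss_i2t + loss_t2i) / 2

SECT, as described above, would add sentiment-based supervision signals on top of such an objective; the exact formulation is given in the paper.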
Source: Lecture Notes in Computer Science (LNCS)
ISSN: 0302-9743
Year: 2023
Volume: 14355 LNCS
Page: 93-105
Language: English
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0