Author:

Deng, S. | Wu, L. | Shi, G. | Xing, L. | Jian, M. | Xiang, Y. | Dong, R.

Indexed by:

EI Scopus SCIE

Abstract:

Image emotion classification (IEC) aims to extract the abstract emotions evoked in images. Recently, language-supervised methods such as contrastive language-image pretraining (CLIP) have demonstrated superior performance in image understanding. However, the underexplored task of IEC presents three major challenges: a large gap between the pretraining objective and the IEC training objective, suboptimal prompts shared across categories, and prompts that do not vary between instances. In this study, we propose a general framework that effectively exploits the language-supervised CLIP method for the IEC task. First, a prompt-tuning method that mimics the pretraining objective of CLIP is introduced to exploit the rich image and text semantics associated with CLIP. Subsequently, instance-specific prompts are composed automatically, conditioned on the categories and image content of each instance, which diversifies the prompts and thus avoids suboptimal problems. Evaluations on six widely used affective datasets show that the proposed method significantly outperforms state-of-the-art methods on IEC tasks (up to 9.29% accuracy gain on the EmotionROI dataset) with only a few trained parameters. The code is publicly available at https://github.com/dsn0w/PT-DPC/ for research purposes. © The Author(s) 2024.

Keyword:

multimodal learning | image emotion analysis | prompt tuning | pretraining model

Author Affiliations:

  • [ 1 ] [Deng S.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 2 ] [Wu L.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 3 ] [Shi G.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 4 ] [Xing L.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 5 ] [Jian M.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 6 ] [Xiang Y.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 7 ] [Dong R.]Insight Centre for Data Analytics, University College Dublin, Belfield, Dublin, D04 V1W8, Ireland

Source:

Computational Visual Media

ISSN: 2096-0433

Year: 2024

Issue: 6

Volume: 10

Page: 1169-1183

SCOPUS Cited Count: 6

ESI Highly Cited Papers on the List: 0