Learning to compose diversified prompts for image emotion classification

Indexed by：

EI Scopus SCIE

Abstract：

Image emotion classification (IEC) aims to extract the abstract emotions evoked in images. Recently, language-supervised methods such as contrastive language-image pretraining (CLIP) have demonstrated superior performance in image understanding. However, the underexplored task of IEC presents three major challenges: a tremendous training objective gap between pretraining and IEC, shared suboptimal prompts, and invariant prompts for all instances. In this study, we propose a general framework that effectively exploits the language-supervised CLIP method for the IEC task. First, a prompt-tuning method that mimics the pretraining objective of CLIP is introduced, to exploit the rich image and text semantics associated with CLIP. Subsequently, instance-specific prompts are automatically composed, conditioning them on the categories and image content of instances, diversifying the prompts, and thus avoiding suboptimal problems. Evaluations on six widely used affective datasets show that the proposed method significantly outperforms state-of-the-art methods (up to 9.29% accuracy gain on the EmotionROI dataset) on IEC tasks with only a few trained parameters. The code is publicly available at https://github.com/dsn0w/PT-DPC/ for research purposes. © The Author(s) 2024.

Keyword：

multimodal learning; image emotion analysis; prompt tuning; pretraining model

Author Community：

[ 1 ] [Deng S.] Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
[ 2 ] [Wu L.] Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
[ 3 ] [Shi G.] Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
[ 4 ] [Xing L.] Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
[ 5 ] [Jian M.] Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
[ 6 ] [Xiang Y.] Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
[ 7 ] [Dong R.] Insight Centre for Data Analytics, University College Dublin, Belfield, Dublin, D04 V1W8, Ireland

Source ：

Computational Visual Media

ISSN： 2096-0433

Year： 2024

Issue： 6

Volume： 10

Page： 1169-1183

SCOPUS Cited Count： 6

ESI Highly Cited Papers on the List： 0

30 Days PV： 5
