Abstract:
Image-text multimodal sentiment analysis aims to predict sentiment polarity by integrating visual and textual modalities. The key to this task is obtaining high-quality multimodal representations of both modalities and fusing them efficiently. Therefore, a cross-modal multi-level fusion sentiment analysis method based on a visual language model (MFVL) is proposed. First, building on a pre-trained visual language model, high-quality multimodal representations and modality-bridging representations are generated by freezing the pre-trained parameters and fine-tuning the large language model with low-rank adaptation. Second, a cross-modal multi-head co-attention fusion module is designed to perform interactive weighted fusion of the visual and textual representations. Finally, a mixture-of-experts module is designed to deeply fuse the visual, textual, and modality-bridging representations for multimodal sentiment analysis. Experimental results show that MFVL achieves state-of-the-art performance on the public evaluation datasets MVSA-Single and HFM. © 2024 Science Press. All rights reserved.
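The low-rank adaptation (LoRA) step mentioned above — freezing the pre-trained weights and training only a small low-rank update — can be sketched roughly as follows. This is a minimal NumPy illustration of the general LoRA idea, not the paper's implementation; all dimensions, names, and the scaling convention here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 8, 8, 2  # hypothetical layer dims; rank r << d
alpha = 4.0               # LoRA scaling factor (assumed convention)

W = rng.standard_normal((d_out, d_in))      # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                    # trainable; zero init so the
                                            # update starts at exactly 0

def lora_forward(x):
    # frozen path plus low-rank update scaled by alpha / r
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# with B = 0 the adapted layer matches the frozen layer exactly
assert np.allclose(lora_forward(x), W @ x)
```

Only `A` and `B` (of size `r * (d_in + d_out)`) would be updated during fine-tuning, which is what makes adapting a large frozen model cheap.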
Source:
Pattern Recognition and Artificial Intelligence
ISSN: 1003-6059
Year: 2024
Issue: 5
Volume: 37
Page: 459-468
Cited Count:
SCOPUS Cited Count: 2
ESI Highly Cited Papers on the List: 0
30 Days PV: 6