Abstract:
The growth of social media in recent years has contributed to the spread of fake news on the Internet. Since multimodal content such as images strongly influences how news spreads on social media, researchers are increasingly focusing on the automatic detection of multimodal fake news. Existing multimodal approaches disregard recent advances in vision pretraining; consequently, their visual feature extractors rely primarily on early convolutional neural networks such as VGG and ResNet. Compared with vision Transformers, traditional convolutional neural networks are less able to extract local semantics from images and to capture the relationships between different image regions, which may reduce the performance of multimodal methods. To address this issue, we propose a multimodal fake news detection framework based on the vision Transformer (MDVT). By experimenting with different multimodal combination methods and feature extractors within this framework, we obtain a top-performing variant, named MDVT-MS, which combines MacBERT text features and Swin Transformer image features through the maximum method. Experiments on Weibo, a widely used social media dataset, demonstrate that MDVT-MS outperforms current state-of-the-art multimodal detection methods by 4.4% in accuracy. This confirms the effectiveness of the proposed method and shows that the vision Transformer can serve as a new vision backbone for multimodal fake news detection. © 2023 ACM.
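The "maximum method" the abstract mentions can be read as element-wise maximum fusion of the two modality feature vectors. A minimal sketch, assuming equal-length text and image feature vectors (the real features would come from MacBERT and Swin Transformer; the function name, dimensions, and values below are illustrative, not from the paper):

```python
# Hedged sketch of element-wise maximum fusion for two modality
# feature vectors. Placeholder values stand in for the MacBERT text
# features and Swin Transformer image features used in the paper.

def max_fuse(text_feat, image_feat):
    """Fuse two equal-length feature vectors by element-wise maximum."""
    if len(text_feat) != len(image_feat):
        raise ValueError("feature dimensions must match")
    return [max(t, v) for t, v in zip(text_feat, image_feat)]

# Stand-in feature vectors (illustrative only).
text_feat = [0.2, -0.5, 0.9, 0.1]
image_feat = [0.4, -0.1, 0.3, -0.2]

fused = max_fuse(text_feat, image_feat)
print(fused)  # [0.4, -0.1, 0.9, 0.1]
```

The fused vector would then be fed to a classification head; max fusion keeps, per dimension, the stronger activation from either modality.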
Year: 2023
Page: 121-125
Language: English
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0