Abstract:
The growth of social media in recent years has contributed to the spread of fake news on the Internet. Since multimodal content such as images strongly influences how news spreads on social media, researchers are increasingly focusing on the automatic detection of multimodal fake news. Existing multimodal approaches disregard recent advances in vision pretraining; consequently, their visual feature extractors rely primarily on early convolutional neural networks such as VGG and ResNet. Compared with vision Transformers, traditional convolutional neural networks are less able to extract local semantics from images and to capture the relationships between different image regions, which may reduce the performance of multimodal methods. To address this issue, we propose a multimodal fake news detection framework based on the vision Transformer (MDVT). By experimenting with different multimodal combination methods and feature extractors within this framework, we obtain a top-performing variant, named MDVT-MS, which combines MacBERT text features and Swin Transformer image features through the maximum method. Experiments on Weibo, a widely used social media dataset, demonstrate that MDVT-MS outperforms current state-of-the-art multimodal detection methods by 4.4% in accuracy. This confirms the effectiveness of the proposed method and shows that the vision Transformer can serve as a new vision backbone for multimodal fake news detection. © 2023 ACM.
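The "maximum method" the abstract mentions can be read as element-wise maximum fusion of the two modality feature vectors. A minimal sketch, assuming equal-length text and image feature vectors (the real features would come from MacBERT and Swin Transformer; the function name, dimensions, and values below are illustrative, not from the paper):

```python
# Hedged sketch of element-wise maximum fusion for two modality
# feature vectors. Placeholder values stand in for the MacBERT text
# features and Swin Transformer image features used in the paper.

def max_fuse(text_feat, image_feat):
    """Fuse two equal-length feature vectors by element-wise maximum."""
    if len(text_feat) != len(image_feat):
        raise ValueError("feature dimensions must match")
    return [max(t, v) for t, v in zip(text_feat, image_feat)]

# Stand-in feature vectors (illustrative only).
text_feat = [0.2, -0.5, 0.9, 0.1]
image_feat = [0.4, -0.1, 0.3, -0.2]

fused = max_fuse(text_feat, image_feat)
print(fused)  # [0.4, -0.1, 0.9, 0.1]
```

The fused vector would then be fed to a classification head; max fusion keeps, per dimension, the stronger activation from either modality.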
Year: 2023
Page: 121-125
Language: English
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0