
Author:

Liu, T. | Hu, Y. | Gao, J. | Sun, Y. | Yin, B.

Indexed by:

EI Scopus SCIE

Abstract:

In the context of long document classification (LDC), effectively utilizing multi-modal information encompassing texts and images within these documents has not received adequate attention. This task showcases several notable characteristics. Firstly, the text possesses an implicit or explicit hierarchical structure consisting of sections, sentences, and words. Secondly, the distribution of images is dispersed, encompassing various types such as highly relevant topic images and loosely related reference images. Lastly, intricate and diverse relationships exist between images and text at different levels. To address these challenges, we propose a novel approach called Hierarchical Multi-modal Prompting Transformer (HMPT). Our proposed method constructs the uni-modal and multi-modal transformers at both the section and sentence levels, facilitating effective interaction between features. Notably, we design an adaptive multi-scale multi-modal transformer tailored to capture the multi-granularity correlations between sentences and images. Additionally, we introduce three different types of shared prompts, i.e., shared section, sentence, and image prompts, as bridges connecting the isolated transformers, enabling seamless information interaction across different levels and modalities. To validate the model performance, we conducted experiments on two newly created and two publicly available multi-modal long document datasets. The obtained results show that our method outperforms state-of-the-art single-modality and multi-modality classification methods.

Keyword:

Circuits and systems | Adaptation models | Multi-modal long document classification | prompt learning | Transformers | Computational modeling | Visualization | adaptive multi-scale multi-modal transformer | Feature extraction | multi-modal transformer | Task analysis

Author Affiliations:

  • [1] [Liu T.] Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China
  • [2] [Hu Y.] Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China
  • [3] [Gao J.] Discipline of Business Analytics, The University of Sydney Business School, The University of Sydney, Camperdown, NSW, Australia
  • [4] [Sun Y.] Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China
  • [5] [Yin B.] Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China

Source:

IEEE Transactions on Circuits and Systems for Video Technology

ISSN: 1051-8215

Year: 2024

Issue: 7

Volume: 34

Page: 1-1

Impact Factor: 8.400 (JCR@2022)

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 6

ESI Highly Cited Papers on the List: 0
