
Author:

Sun, Z. | Hu, Y. | Gao, Q. | Jiang, H. | Gao, J. | Sun, Y. | Yin, B.

Indexed by:

CPCI-S | EI | Scopus

Abstract:

Considerable performance gains have been achieved in knowledge-based visual question answering thanks to visual-language pre-training models with the pre-training-then-fine-tuning paradigm. However, because the objectives of the pre-training and fine-tuning stages differ, an evident barrier prevents the cross-modal comprehension ability developed during pre-training from fully benefiting the fine-tuning task. To break this barrier, in this paper we propose a novel hybrid prompting model for knowledge-based VQA, which inherits and incorporates the pre-training and fine-tuning tasks through a shared objective. Specifically, based on a static declaration prompt, we construct a goal consistent with fine-tuning via masked language modeling to inherit the capabilities of the pre-training task, while selecting the top-t relevant knowledge in a dense-retrieval manner. Additionally, a dynamic knowledge prompt is learned from the retrieved knowledge, which not only alleviates the input-length constraint of visual-language pre-trained models but also assists in providing answer features during fine-tuning. Combining and unifying the aims of the two stages fully exploits the abilities of pre-training and fine-tuning to predict the answer. We evaluate the proposed model on the OKVQA dataset, and the results show that our model outperforms state-of-the-art methods based on visual-language pre-training models by a noticeable margin and even exceeds the large-scale language model GPT-3, which demonstrates the benefits of the hybrid prompts and the advantages of unifying pre-training with fine-tuning. © 2023 ACM.
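The abstract describes the method only at a high level; the minimal Python sketch below illustrates the two-part hybrid prompt it outlines: dense retrieval of the top-t knowledge entries, a static declaration prompt whose answer slot is a [MASK] token (aligning fine-tuning with the masked-language-modeling pre-training objective), and a dynamic prompt of knowledge vectors in place of raw text. Every name here (encode, KNOWLEDGE_BASE, build_hybrid_prompt, the toy embeddings) is an illustrative assumption, not the authors' implementation.

```python
import hashlib
import numpy as np

def encode(text: str, dim: int = 8) -> np.ndarray:
    """Stand-in text encoder producing a deterministic pseudo-embedding.
    A real system would use the visual-language pre-trained model's encoder."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

# Hypothetical external knowledge base (stand-in for real retrieved passages).
KNOWLEDGE_BASE = [
    "Bananas are a fruit rich in potassium.",
    "The Eiffel Tower is located in Paris.",
    "Zebras have black and white stripes.",
]

def retrieve_top_t(query: str, t: int = 2) -> list[str]:
    """Dense retrieval: rank knowledge entries by embedding similarity to the query."""
    q = encode(query)
    scores = np.array([q @ encode(k) for k in KNOWLEDGE_BASE])
    top = np.argsort(scores)[::-1][:t]
    return [KNOWLEDGE_BASE[i] for i in top]

def build_hybrid_prompt(question: str, t: int = 2) -> tuple[str, np.ndarray]:
    """Combine the static declaration prompt with a dynamic knowledge prompt."""
    retrieved = retrieve_top_t(question, t)
    # Static declaration prompt: the answer slot is a [MASK] token, so the
    # fine-tuning target matches the masked-language-modeling pre-training target.
    declaration = f"Question: {question} The answer is [MASK]."
    # Dynamic knowledge prompt: compress retrieved facts into a fixed number of
    # vectors rather than appending raw text, easing the model's input-length limit.
    knowledge_vectors = np.stack([encode(k) for k in retrieved])
    return declaration, knowledge_vectors

prompt, k_vecs = build_hybrid_prompt("What animal has black and white stripes?")
print(prompt)        # Question: ... The answer is [MASK].
print(k_vecs.shape)  # (t, dim) -> (2, 8)
```

In this sketch the [MASK] slot is where the model would emit the answer token(s), so the same prediction head serves both stages; feeding the knowledge as a small set of vectors rather than concatenated sentences is what sidesteps the input-length limit the abstract mentions.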

Keyword:

multi-modal fusion; knowledge integration; visual question answering

Author Community:

  • [ 1 ] [Sun Z.]Beijing University of Technology, Beijing, China
  • [ 2 ] [Hu Y.]Beijing University of Technology, Beijing, China
  • [ 3 ] [Gao Q.]Beijing University of Technology, Beijing, China
  • [ 4 ] [Jiang H.]Beijing University of Technology, Beijing, China
  • [ 5 ] [Gao J.]The University of Sydney, Sydney, Australia
  • [ 6 ] [Sun Y.]Beijing University of Technology, Beijing, China
  • [ 7 ] [Yin B.]Beijing University of Technology, Beijing, China

Year: 2023

Page: 4065-4073

Language: English

SCOPUS Cited Count: 5

ESI Highly Cited Papers on the List: 0
