
Author:

Liu, Heng | Wang, Boyue | Sun, Yanfeng | Gao, Junbin | Li, Xiaoyan | Hu, Yongli | Yin, Baocai

Indexed by:

EI Scopus SCIE

Abstract:

Accurately locating question-related regions in a given image is crucial for visual question answering (VQA). Current approaches suffer from two limitations: (1) dividing an image into multiple regions may lose part of the semantic information and the original relationships between regions; (2) choosing only one image region, or all of them, to predict the answer may result in insufficient or redundant information, respectively. How to effectively mine the relationships between image regions and choose the relevant regions is therefore vital. In this paper, we propose a novel Multi-granularity feature interaction and Multi-region selection-based Triplet VQA model (M2TVQA). To tackle the first limitation, we propose a multi-granularity feature interaction strategy that adaptively supplements global coarse-granularity features with regional fine-granularity features. To overcome the second limitation, we design a Top-K learning strategy that adaptively selects the K image regions most relevant to the question, even if the selected regions are spatially far apart. This strategy selects as many relevant image regions as possible while reducing the noise introduced. Finally, we construct a multi-modality triplet to predict the VQA answer. Extensive experiments on two public outside-knowledge datasets, OK-VQA and KRVQA, verify the effectiveness of the proposed model.
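To illustrate the idea behind Top-K region selection, the sketch below scores each image region against a question embedding and keeps the K highest-scoring regions regardless of their spatial positions. This is a minimal, hand-written approximation using cosine similarity; the paper's actual strategy is learned end-to-end, and all function and variable names here (`select_top_k_regions`, `region_feats`, `question_feat`) are hypothetical.

```python
import numpy as np

def select_top_k_regions(region_feats, question_feat, k=3):
    """Rank image regions by cosine similarity to the question embedding
    and return the indices and scores of the top-k regions.
    Illustrative only; the paper learns this selection end-to-end."""
    # Normalize region and question features so the dot product is cosine similarity.
    region_norm = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    question_norm = question_feat / np.linalg.norm(question_feat)
    scores = region_norm @ question_norm
    # Indices of the k most relevant regions; spatial adjacency plays no role,
    # so selected regions may be far apart in the image.
    top_idx = np.argsort(scores)[::-1][:k]
    return top_idx, scores[top_idx]

# Toy example: 5 regions with 4-dimensional features, one question vector.
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 4))
question = rng.normal(size=4)
idx, s = select_top_k_regions(regions, question, k=2)
```

Selecting K regions, rather than one or all, is exactly the middle ground the abstract argues for: enough evidence to answer, without flooding the predictor with irrelevant regions.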

Keyword:

cross-modality analysis; visual question answering; multi-granularity features

Author Community:

  • [ 1 ] [Liu, Heng]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 2 ] [Wang, Boyue]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 3 ] [Sun, Yanfeng]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 4 ] [Li, Xiaoyan]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 5 ] [Hu, Yongli]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 6 ] [Yin, Baocai]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 7 ] [Gao, Junbin]Univ Sydney, Business Sch, Discipline Business Analyt, Camperdown, NSW, Australia

Reprint Author's Address:

  • [Wang, Boyue]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China



Source :

IEEE TRANSACTIONS ON BIG DATA

ISSN: 2332-7790

Year: 2025

Issue: 3

Volume: 11

Page: 1346-1356

Impact Factor: 7.200 (JCR@2022)

