
Author:

Wang, Boyue | Ma, Yujian | Li, Xiaoyan | Liu, Heng | Hu, Yongli | Yin, Baocai

Indexed by:

EI; Scopus; SCIE

Abstract:

Visual Question Answering (VQA) aims to answer a text question correctly by understanding the image content. Attention-based VQA models mine the implicit relationships between objects according to feature similarity, but neglect the explicit relationships between objects, such as relative position. Most visual scene graph-based VQA models exploit the relative positions or visual relationships between objects to construct the visual scene graph, yet they suffer from the semantic insufficiency of visual edge relations. Moreover, the scene graph of the text modality is often ignored in these works. In this article, a novel Dual Scene Graph Enhancement Module (DSGEM) is proposed that exploits relevant external knowledge to simultaneously construct two interpretable scene graph structures for the image and text modalities, which makes the reasoning process more logical and precise. Specifically, the authors build the visual and textual scene graphs with the help of commonsense knowledge and syntactic structure, respectively, which explicitly endows each edge relation with specific semantics. Then, two scene graph enhancement modules are proposed to propagate the external and structural knowledge involved and explicitly guide the feature interaction between objects (nodes). Finally, the authors embed these two scene graph enhancement modules into existing VQA models to introduce explicit relation reasoning ability. Experimental results on both the VQA V2 and OK-VQA datasets show that the proposed DSGEM is effective and compatible with various VQA architectures.
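For readers unfamiliar with relation-conditioned graph message passing, the sketch below illustrates the general idea behind a scene graph enhancement layer: node features (image regions or question tokens) are updated by messages that are explicitly conditioned on labelled edge relations. This is a minimal, hypothetical PyTorch sketch, not the authors' DSGEM implementation; names such as SceneGraphEnhancer and num_relations are assumptions, and the paper's modules additionally draw edge semantics from commonsense knowledge (visual graph) and syntactic parsing (textual graph).

```python
# Minimal illustrative sketch of a relation-aware scene-graph enhancement layer.
# NOT the authors' code; class and parameter names are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SceneGraphEnhancer(nn.Module):
    """Propagates information between nodes along explicitly labelled edges,
    so every message is conditioned on the edge's relation type."""

    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.rel_emb = nn.Embedding(num_relations, dim)  # one vector per edge label
        self.msg = nn.Linear(2 * dim, dim)               # message from (neighbour, relation)
        self.upd = nn.GRUCell(dim, dim)                  # node update

    def forward(self, nodes, edge_index, edge_type):
        # nodes:      (N, dim) node features (visual regions or question tokens)
        # edge_index: (2, E) source/target node indices
        # edge_type:  (E,) integer edge labels (e.g. "left of", "nsubj", ...)
        src, dst = edge_index
        rel = self.rel_emb(edge_type)                    # (E, dim)
        messages = self.msg(torch.cat([nodes[src], rel], dim=-1))
        # attention over the incoming edges of each target node
        scores = (messages * nodes[dst]).sum(-1) / nodes.size(-1) ** 0.5
        alpha = torch.zeros_like(scores)
        for t in dst.unique():
            mask = dst == t
            alpha[mask] = F.softmax(scores[mask], dim=0)
        agg = torch.zeros_like(nodes).index_add_(0, dst, alpha.unsqueeze(-1) * messages)
        return self.upd(agg, nodes)                      # enhanced node features


if __name__ == "__main__":
    layer = SceneGraphEnhancer(dim=16, num_relations=8)
    x = torch.randn(5, 16)                               # 5 objects / tokens
    edges = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])   # toy edge list
    types = torch.tensor([0, 3, 3, 7])                   # toy relation labels
    print(layer(x, edges, types).shape)                  # torch.Size([5, 16])
```

In this kind of layer, the enhanced node features can simply replace the original object or token features fed to an existing VQA model, which is consistent with the abstract's claim that the modules can be embedded into various architectures.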

Keyword:

question answering (information retrieval); image representation

Author Community:

  • [ 1 ] [Wang, Boyue]Beijing Univ Technol, 100 Pingleyuan, Beijing 100124, Peoples R China
  • [ 2 ] [Ma, Yujian]Beijing Univ Technol, 100 Pingleyuan, Beijing 100124, Peoples R China
  • [ 3 ] [Li, Xiaoyan]Beijing Univ Technol, 100 Pingleyuan, Beijing 100124, Peoples R China
  • [ 4 ] [Liu, Heng]Beijing Univ Technol, 100 Pingleyuan, Beijing 100124, Peoples R China
  • [ 5 ] [Hu, Yongli]Beijing Univ Technol, 100 Pingleyuan, Beijing 100124, Peoples R China
  • [ 6 ] [Yin, Baocai]Beijing Univ Technol, 100 Pingleyuan, Beijing 100124, Peoples R China

Reprint Author's Address:

  • [Li, Xiaoyan]Beijing Univ Technol, 100 Pingleyuan, Beijing 100124, Peoples R China



Source :

IET COMPUTER VISION

ISSN: 1751-9632

Year: 2023

Issue: 6

Volume: 17

Page: 638-651

1.700 (JCR@2022)

ESI Discipline: COMPUTER SCIENCE

ESI HC Threshold: 19

Cited Count:

WoS CC Cited Count: 1

SCOPUS Cited Count: 1

ESI Highly Cited Papers on the List: 0

