
Author:

Wang, Boyue | Ju, Xiaoqian | Gao, Junbin | Li, Xiaoyan | Hu, Yongli | Yin, Baocai

Indexed by:

Scopus SCIE

Abstract:

Visual question answering (VQA) models often face two language bias challenges. First, they tend to rely solely on the question to predict the answer, overlooking relevant information in the accompanying images. Second, even when considering the question, they may focus only on the wh-words, neglecting other crucial keywords that could improve interpretability and question sensitivity. Existing debiasing methods attempt to address this by training a bias model on question-only inputs to enhance the robustness of the target VQA model. However, this approach may not fully capture the language bias present. In this article, we propose a multimodality counterfactual dual-bias model to mitigate the linguistic bias issue in target VQA models. Our approach designs a shared-parameterized dual-bias model that takes both visual and question counterfactual samples as inputs. In this way, we aim to fully model language biases, with the visual and question counterfactual samples emphasizing, respectively, the important objects and keywords relevant to the answers. To ensure that our dual-bias model behaves similarly to an ordinary model, we freeze the parameters of the target VQA model and train the dual-bias model using cross-entropy and Kullback-Leibler (KL) divergence as the loss function. Subsequently, to mitigate language bias in the target VQA model, we freeze the parameters of the dual-bias model to generate pseudo-labels and incorporate a margin loss to retrain the target VQA model. Experimental results on the VQA-CP datasets demonstrate the superiority of the proposed counterfactual dual-bias model. Additionally, we analyze the unsatisfactory performance on the VQA v2 dataset. The source code of the proposed model is available at https://github.com/Arrow2022jv/MCD.
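The abstract describes a two-stage training procedure: first train the dual-bias model against the frozen target VQA model with cross-entropy plus KL divergence, then freeze the dual-bias model, use its predictions as pseudo-labels, and retrain the target VQA model with a margin loss. A minimal PyTorch sketch of those two stages follows; the model interfaces, the way the two counterfactual views are combined, the margin formulation, and the margin value are assumptions for illustration only, and the authors' actual implementation is the repository linked above.

import torch
import torch.nn.functional as F

def train_dual_bias_step(target_vqa, dual_bias, v, q, v_cf, q_cf, labels, optimizer):
    # Stage 1 (sketch): freeze the target VQA model and train the shared-parameter
    # dual-bias model on visual/question counterfactual samples with cross-entropy
    # plus KL divergence to the target model's predictions.
    # `optimizer` is assumed to be built over dual_bias.parameters().
    for p in target_vqa.parameters():
        p.requires_grad = False
    with torch.no_grad():
        target_logits = target_vqa(v, q)                  # reference distribution
    # one shared model scores both counterfactual views (hypothetical combination)
    bias_logits = dual_bias(v_cf, q) + dual_bias(v, q_cf)
    ce = F.cross_entropy(bias_logits, labels)
    kl = F.kl_div(F.log_softmax(bias_logits, dim=-1),
                  F.softmax(target_logits, dim=-1),
                  reduction="batchmean")
    loss = ce + kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def retrain_target_step(target_vqa, dual_bias, v, q, labels, optimizer, margin=0.2):
    # Stage 2 (sketch): freeze the dual-bias model, take its softmax output as
    # biased pseudo-labels, and retrain the target VQA model with cross-entropy
    # plus a hinge-style margin loss that discourages agreeing with the bias model.
    # `optimizer` is assumed to be built over target_vqa.parameters().
    for p in dual_bias.parameters():
        p.requires_grad = False
    for p in target_vqa.parameters():
        p.requires_grad = True
    with torch.no_grad():
        pseudo = F.softmax(dual_bias(v, q), dim=-1)       # biased pseudo-labels
    logits = target_vqa(v, q)
    probs = F.softmax(logits, dim=-1)
    ce = F.cross_entropy(logits, labels)
    margin_loss = F.relu((probs * pseudo).sum(dim=-1) - margin).mean()
    loss = ce + margin_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Under this reading, the retrained target model is what gets evaluated on VQA-CP, while the dual-bias model serves only to expose language bias during training.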

Keyword:

Visualization; Analytical models; multimodality analysis; Predictive models; Training; Counterfactual samples; Image color analysis; Linguistics; Sensitivity; Reviews; debiasing model; visual question answering (VQA); Question answering (information retrieval); Cognition

Author Community:

  • [ 1 ] [Wang, Boyue]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 2 ] [Ju, Xiaoqian]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 3 ] [Li, Xiaoyan]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 4 ] [Hu, Yongli]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 5 ] [Yin, Baocai]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
  • [ 6 ] [Gao, Junbin]Univ Sydney, Business Sch, Discipline Business Analyt, Sydney, NSW 2006, Australia

Reprint Author's Address:

  • [Li, Xiaoyan]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China


Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

ISSN: 2162-237X

Year: 2025

10.400

JCR@2022

ESI Highly Cited Papers on the List: 0
