Author:

Ju, X. | Wang, B. | Li, X.

Indexed by:

EI Scopus

Abstract:

The issue of language priors persists in existing Visual Question Answering (VQA) models, hindering their ability to generalize across diverse QA distributions. Traditional strategies for counterfactual sample synthesis, which aim to eliminate language bias by generating counterfactuals for all training samples, encounter two primary challenges: (1) Not every sample contributes to language bias; thus, indiscriminate counterfactual synthesis may introduce new biases and adversely affect the model learning process. (2) The counterfactuals of questions often lose significant information and therefore fail to heighten the model's sensitivity to key terms. In this paper, we introduce the Contrastive Visual-Question-Caption Counterfactuals model for Biased Samples in VQA tasks. This model integrates captions to augment visual information within the textual domain and constructs counterfactual samples exclusively for biased samples, thereby mitigating the negative impacts of language bias. Specifically, we employ a biased sample selection module to identify samples with language biases within the training set, considering that unbiased samples do not exacerbate the model's reliance on language patterns. To enrich the visual content in the textual domain, we synthesize caption-based counterfactual samples. To further enhance the effectiveness of counterfactual samples in improving the model's sensitivity, we develop a counterfactual contrast learning module. This module is designed to discern the relationship between visual and textual components within the same sample. Experimental results demonstrate that our proposed model is not only compatible with various VQA backbones but also significantly improves performance on the out-of-distribution dataset VQA-CP v2. © 2024 Technical Committee on Control Theory, Chinese Association of Automation.
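
The abstract does not state how the biased sample selection module decides that a sample is biased. As a rough illustration only, the sketch below assumes one simple stand-in criterion: a sample counts as biased when its ground-truth answer is already predictable from the question type alone, measured by the empirical answer frequency within that question type. The function name `select_biased_samples`, the `question_type` and `answer` fields, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def select_biased_samples(samples, threshold=0.5):
    """Return indices of samples whose answer is predictable from the question type.

    Assumed criterion (not from the paper): a sample is "biased" when the
    empirical prior P(answer | question_type) in the training set is at
    least `threshold`.
    """
    # Count how often each answer occurs for each question type.
    counts = defaultdict(Counter)
    for sample in samples:
        counts[sample["question_type"]][sample["answer"]] += 1

    biased = []
    for index, sample in enumerate(samples):
        type_counts = counts[sample["question_type"]]
        prior = type_counts[sample["answer"]] / sum(type_counts.values())
        if prior >= threshold:
            biased.append(index)
    return biased


# Toy training set: "white" dominates "what color" questions,
# while "how many" answers are evenly spread.
samples = [
    {"question_type": "what color", "answer": "white"},
    {"question_type": "what color", "answer": "white"},
    {"question_type": "what color", "answer": "white"},
    {"question_type": "what color", "answer": "red"},
    {"question_type": "how many", "answer": "2"},
    {"question_type": "how many", "answer": "3"},
    {"question_type": "how many", "answer": "4"},
]
biased_indices = select_biased_samples(samples, threshold=0.5)
```

Under this assumed criterion, only the flagged samples would be passed on to counterfactual synthesis, and the remaining samples would be trained on unchanged. A learned question-only model could replace the frequency prior; that choice is equally an assumption here.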

Keyword:

Counterfactual; Language bias; Visual question answering

Author Community:

  • [ 1 ] [Ju X.]Beijing University of Technology, Beijing, 100124, China
  • [ 2 ] [Wang B.]Beijing University of Technology, Beijing, 100124, China
  • [ 3 ] [Li X.]Beijing University of Technology, Beijing, 100124, China

ISSN: 1934-1768

Year: 2024

Page: 7603-7609

Language: English
