Contrastive Visual-Question-Caption Counterfactuals on Biased Samples for Visual Question Answering

Author：

Ju, X. | Wang, B. | Li, X.

Indexed by：

EI Scopus

Abstract：

The issue of language priors persists in existing Visual Question Answering (VQA) models, hindering their ability to generalize across diverse QA distributions. Traditional strategies for counterfactual sample synthesis, which aim to eliminate language bias by generating counterfactuals for all training samples, encounter two primary challenges: (1) Not every sample contributes to language bias; thus, indiscriminate counterfactual synthesis may introduce new biases and adversely affect the model learning process. (2) The counterfactuals of questions often lose significant information, failing to effectively heighten the model's sensitivity to key terms. In this paper, we introduce the Contrastive Visual-Question-Caption Counterfactuals model for Biased Samples in VQA tasks. This model integrates captions to augment visual information within the textual domain and constructs counterfactual samples exclusively for biased samples, thereby mitigating the negative impacts of language bias. Specifically, we employ a biased sample selection module to identify samples with language biases within the training set, considering that unbiased samples do not exacerbate the model's reliance on language patterns. To enrich the visual content in the textual domain, we synthesize caption-based counterfactual samples. To further enhance the effectiveness of counterfactual samples in improving the model's sensitivity, we develop a counterfactual contrast learning module. This module is designed to discern the relationship between visual and textual components within the same sample. Experimental results demonstrate that our proposed model not only is compatible with various VQA backbones but also significantly improves performance on the out-of-distribution dataset VQA CP v2. © 2024 Technical Committee on Control Theory, Chinese Association of Automation.
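The abstract names two steps, selecting biased samples and contrasting factual against counterfactual samples, without giving their details. The following is only a minimal sketch of how such steps are commonly set up, not the authors' implementation. Everything in it is an assumption made for illustration: the `q_only_prob` field (the probability a hypothetical question-only model assigns to the correct answer), the 0.5 threshold, the cosine similarity, and the InfoNCE-style loss with temperature 0.1.

```python
import math


def select_biased(samples, threshold=0.5):
    """Keep samples that a question-only model already answers confidently.

    Assumption: each sample is a dict with a hypothetical "q_only_prob" field,
    the probability a question-only model gives to the ground-truth answer.
    """
    return [s for s in samples if s["q_only_prob"] > threshold]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in for the paper's objective).

    Low when the anchor is close to the positive (factual) embedding and
    far from the negative (counterfactual) embeddings.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy usage with made-up numbers.
samples = [
    {"id": 1, "q_only_prob": 0.9},  # answerable from the question alone
    {"id": 2, "q_only_prob": 0.2},  # needs the image
]
biased = select_biased(samples)
loss_aligned = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_misaligned = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this toy setup only sample 1 would receive counterfactuals, and the loss is smaller when the anchor matches the factual embedding than when it matches the counterfactual one.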

Keyword：

Counterfactual; Language bias; Visual question answering

Author Community：

[ 1 ] [Ju X.] Beijing University of Technology, Beijing, 100124, China
[ 2 ] [Wang B.] Beijing University of Technology, Beijing, 100124, China
[ 3 ] [Li X.] Beijing University of Technology, Beijing, 100124, China

Source ：

ISSN： 1934-1768

Year： 2024

Page： 7603-7609

Language： English
