Abstract:
As a popular cross-modal reasoning task, Visual Question Answering (VQA) has achieved great progress in recent years. However, the issue of language bias has long affected the reliability of VQA models. To address this problem, counterfactual learning methods have been proposed to learn more robust features that mitigate the bias. However, current counterfactual learning approaches mainly focus on generating synthesized samples and assigning answers to them, neglecting the relationship between factual and original data, which hinders robust feature learning for effective reasoning. To overcome this limitation, we propose a Self-supervised Knowledge Distillation approach in Counterfactual Learning for VQA, dubbed VQA-SkdCL, which utilizes a self-supervised constraint to make good use of the hidden knowledge in the factual samples, enhancing the robustness of VQA models. We demonstrate the effectiveness of the proposed approach on the VQA v2, VQA-CP v1, and VQA-CP v2 datasets, where it achieves excellent performance. © 2023 Elsevier B.V.
Source :
Pattern Recognition Letters
ISSN: 0167-8655
Year: 2024
Volume: 177
Page: 33-39
Impact Factor: 5.100 (JCR@2022)
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0