Abstract:
As a prevailing cross-modal reasoning task, Visual Question Answering (VQA) has achieved impressive progress in recent years, and language bias has been widely studied to learn more robust VQA models. However, visual bias, which also affects the robustness of VQA models, is seldom considered, resulting in weak inference ability. Therefore, balancing the effects of language bias and visual bias has become essential in the current VQA task. In this paper, we devise a new reweighting strategy that takes both language bias and visual bias into account, and propose a Fair Attention Network for Robust Visual Question Answering (FAN-VQA). It first constructs a question bias branch and a visual bias branch to estimate the bias information from the two modalities, which is used to judge the importance of samples. Then, adaptive importance weights are learned from the bias information and assigned to the candidate answers to adjust the training losses, enabling the model to shift more attention to difficult samples that need less-salient visual clues to infer the correct answer. To improve the robustness of the VQA model, we design a progressive strategy to balance the influence of the original training loss and the adjusted training loss. Extensive experiments on the VQA-CP v2, VQA v2, and VQA-CE datasets demonstrate the effectiveness of the proposed FAN-VQA method.
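The abstract describes a loss-reweighting scheme driven by two bias branches plus a progressive blend of the original and adjusted losses. The sketch below is not the authors' code; it is a minimal PyTorch illustration of that idea, assuming hypothetical bias-branch outputs (`q_bias_logits`, `v_bias_logits`), an illustrative confidence-based weighting function, and a simple linear schedule for the progressive balance.

```python
# Hedged sketch of the bias-aware loss reweighting described in the abstract.
# All names, shapes, and the weighting/schedule choices are assumptions for
# illustration, not the FAN-VQA implementation.
import torch
import torch.nn.functional as F


def adaptive_weights(q_bias_logits, v_bias_logits, labels):
    """Estimate per-sample importance from the two bias branches.

    Samples that the question-only and vision-only branches already answer
    confidently are treated as 'easy' and down-weighted; samples the biases
    miss keep a higher weight (illustrative choice).
    """
    q_conf = (torch.sigmoid(q_bias_logits) * labels).sum(dim=1)  # question-bias confidence on GT answers
    v_conf = (torch.sigmoid(v_bias_logits) * labels).sum(dim=1)  # visual-bias confidence on GT answers
    return torch.clamp(1.0 - 0.5 * (q_conf + v_conf), min=0.0)


def reweighted_loss(main_logits, q_bias_logits, v_bias_logits, labels,
                    epoch, total_epochs):
    """Progressively blend the original loss with the bias-reweighted loss."""
    per_sample = F.binary_cross_entropy_with_logits(
        main_logits, labels, reduction="none").sum(dim=1)
    w = adaptive_weights(q_bias_logits, v_bias_logits, labels)
    original = per_sample.mean()
    adjusted = (w * per_sample).mean()
    alpha = min(1.0, epoch / total_epochs)  # assumed linear progressive schedule
    return (1.0 - alpha) * original + alpha * adjusted


if __name__ == "__main__":
    # Toy usage: batch of 4 samples, answer vocabulary of 10.
    B, A = 4, 10
    labels = torch.zeros(B, A)
    labels[torch.arange(B), torch.randint(0, A, (B,))] = 1.0
    loss = reweighted_loss(torch.randn(B, A), torch.randn(B, A),
                           torch.randn(B, A), labels, epoch=3, total_epochs=10)
    print(loss.item())
```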
Source: IEEE Transactions on Circuits and Systems for Video Technology
ISSN: 1051-8215
Year: 2024
Issue: 9
Volume: 34
Page: 1-1
Impact Factor: 8.400 (JCR@2022)
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 5
ESI Highly Cited Papers on the List: 0