
Author:

Bi, Y. | Jiang, H. | Hu, Y. | Sun, Y. | Yin, B.

Indexed by:

EI Scopus SCIE

Abstract:

As a prevailing cross-modal reasoning task, Visual Question Answering (VQA) has achieved impressive progress in the last few years, and language bias has been widely studied to learn more robust VQA models. However, visual bias, which also influences the robustness of VQA models, is seldom considered, resulting in weak inference ability. Therefore, balancing the effects of language bias and visual bias has become essential in the current VQA task. In this paper, we devise a new reweighting strategy that takes both language bias and visual bias into account, and propose a Fair Attention Network for Robust Visual Question Answering (FAN-VQA). It first constructs a question bias branch and a visual bias branch to estimate the bias information from the two modalities, which is used to judge the importance of samples. Then, adaptive importance weights are learned from the bias information and assigned to the candidate answers to adjust the training losses, enabling the model to shift more attention to the difficult samples that require less-salient visual clues to infer the correct answer. To improve the robustness of the VQA model, we design a progressive strategy to balance the influence of the original training loss and the adjusted training loss. Extensive experiments on the VQA-CP v2, VQA v2, and VQA-CE datasets demonstrate the effectiveness of the proposed FAN-VQA method.
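The reweighting and progressive-balancing idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bias scores, the weighting formula `w = 1 - (q_bias + v_bias) / 2`, and the linear blending schedule are all assumptions made for the sketch.

```python
def fan_vqa_loss(orig_loss, q_bias, v_bias, epoch, total_epochs):
    """Hedged sketch of the FAN-VQA loss idea.

    orig_loss    : per-sample training loss from the base VQA model
    q_bias       : confidence of the question-bias branch in [0, 1]
    v_bias       : confidence of the visual-bias branch in [0, 1]
    epoch        : current training epoch (0-based)
    total_epochs : total number of training epochs
    """
    # Importance weight (assumed form): samples that the bias branches
    # answer confidently are "easy", so they are down-weighted; hard
    # samples needing less-salient visual clues keep a larger weight.
    w = 1.0 - 0.5 * (q_bias + v_bias)
    adjusted_loss = w * orig_loss

    # Progressive strategy (assumed linear schedule): shift influence
    # from the original loss toward the bias-adjusted loss over training.
    alpha = epoch / total_epochs
    return (1.0 - alpha) * orig_loss + alpha * adjusted_loss
```

For example, with zero bias confidence the loss is unchanged at any epoch, while a sample that both bias branches answer perfectly contributes nothing by the final epoch.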

Keyword:

Visualization; Visual question answering; language bias; reweighting; Cognition; Adaptation models; Training; Question answering (information retrieval); Task analysis; Estimation; visual bias

Author Community:

  • [ 1 ] [Bi Y.]Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China
  • [ 2 ] [Jiang H.]Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China
  • [ 3 ] [Hu Y.]Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China
  • [ 4 ] [Sun Y.]Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China
  • [ 5 ] [Yin B.]Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China


Source :

IEEE Transactions on Circuits and Systems for Video Technology

ISSN: 1051-8215

Year: 2024

Issue: 9

Volume: 34

Page: 1-1

Impact Factor (JCR@2022): 8.400

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 5

ESI Highly Cited Papers on the List: 0

