Abstract:
Accurately locating the question-related regions in a given image is crucial for visual question answering (VQA). Current approaches suffer from two limitations: (1) dividing an image into multiple regions may lose part of the semantic information and the original relationships between regions; (2) choosing only one image region, or all of them, to predict the answer results in either insufficient or redundant information. Therefore, effectively mining the relationships between image regions and selecting the relevant regions are both vital. In this paper, we propose a novel Multi-granularity feature interaction and Multi-region selection-based Triplet VQA model (M2TVQA). To tackle the first limitation, we propose a multi-granularity feature interaction strategy that adaptively supplements the global coarse-granularity features with the regional fine-granularity features. To overcome the second limitation, we design a Top-K learning strategy that adaptively selects the K image regions most relevant to the question, even if the selected regions are spatially far apart. This strategy selects as many relevant image regions as possible while reducing the introduction of noise. Finally, we construct a multi-modality triplet to predict the answer. Extensive experiments on two public outside-knowledge datasets, OK-VQA and KRVQA, verify the effectiveness of the proposed model.
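The Top-K selection idea described in the abstract — ranking image regions by their relevance to the question and keeping only the K best, regardless of spatial position — can be sketched as follows. This is a minimal illustration, not the paper's actual method: the embedding dimensions, the cosine-similarity scoring, and the function name `top_k_regions` are all assumptions for the example.

```python
import numpy as np

def top_k_regions(question_emb, region_embs, k=3):
    """Select the k region features most relevant to the question.

    question_emb: shape (d,) question embedding (assumed precomputed)
    region_embs:  shape (n, d) embeddings of n image regions
    Returns (indices, features) of the k highest-scoring regions.
    """
    # Cosine similarity between the question and every region
    q = question_emb / np.linalg.norm(question_emb)
    r = region_embs / np.linalg.norm(region_embs, axis=1, keepdims=True)
    sims = r @ q
    # Keep the k highest-scoring regions; spatial adjacency plays no role,
    # so the selected regions may be far apart in the image.
    top_idx = np.argsort(sims)[::-1][:k]
    return top_idx, region_embs[top_idx]

# Toy example: 5 regions with 4-dim features; the question embedding is a
# slightly perturbed copy of region 2, so region 2 should rank first.
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 4))
question = regions[2] + 0.01 * rng.normal(size=4)
idx, feats = top_k_regions(question, regions, k=2)
```

In the actual model the relevance scores would be learned jointly with the rest of the network rather than computed from fixed embeddings, but the selection step has this same top-k shape.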
Source:
IEEE TRANSACTIONS ON BIG DATA
ISSN: 2332-7790
Year: 2025
Issue: 3
Volume: 11
Page: 1346-1356
Impact Factor: 7.200 (JCR@2022)