Abstract:
Referring expression comprehension aims to locate a target object in an image described by a referring expression, where extracting semantic and discriminative visual information plays an important role. Most existing methods ignore either attribute information or context information in the model learning procedure, resulting in less effective visual features. In this paper, we propose a Multi-level Attention Network (MANet) to extract the target attribute information and the surrounding context information simultaneously for the target object, where the Attribute Attention Module is designed to extract the fine-grained visual information related to the referring expression and the Context Attention Module is designed to merge the context information of the surroundings to learn more discriminative visual features. Experiments on various common benchmark datasets show the effectiveness of our approach. © 2023 Elsevier B.V. All rights reserved.
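The abstract does not give the modules' exact formulation, but the two-stage idea it describes — first gating the target's visual features by their affinity to the expression (attribute attention), then fusing in a softmax-weighted summary of surrounding objects (context attention) — can be illustrated with a minimal NumPy sketch. All function names, shapes, and the specific gating/fusion choices below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attribute_attention(visual_feat, expr_embed):
    # Hypothetical attribute attention: re-weight each visual channel by
    # its (softmax-normalized) affinity to the expression embedding,
    # emphasizing expression-relevant fine-grained attributes.
    channel_weights = softmax(visual_feat * expr_embed)
    return channel_weights * visual_feat

def context_attention(target_feat, context_feats):
    # Hypothetical context attention: score each surrounding object
    # against the target, take a softmax-weighted sum of their features,
    # and concatenate it to the target representation.
    scores = softmax(context_feats @ target_feat)   # (num_context,)
    context = scores @ context_feats                # (d,)
    return np.concatenate([target_feat, context])   # (2d,)

rng = np.random.default_rng(0)
d = 8
target = rng.standard_normal(d)        # target-object visual feature
expr = rng.standard_normal(d)          # referring-expression embedding
ctx = rng.standard_normal((5, d))      # features of 5 surrounding objects

attr_feat = attribute_attention(target, expr)
fused = context_attention(attr_feat, ctx)
print(fused.shape)  # (16,)
```

The fused vector combines expression-conditioned target features with a context summary, which is the kind of enriched representation a downstream grounding head would score against candidate regions.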
Source:
PATTERN RECOGNITION LETTERS
ISSN: 0167-8655
Year: 2023
Volume: 172
Page: 252-258
5.100 (JCR@2022)
ESI Discipline: ENGINEERING
ESI HC Threshold: 19
Cited Count:
WoS CC Cited Count: 2
SCOPUS Cited Count: 3
ESI Highly Cited Papers on the List: 0
30 Days PV: 8