Abstract:
Referring expression comprehension aims to locate a target object in an image described by a referring expression, where extracting semantic and discriminative visual information plays an important role. Most existing methods ignore either attribute information or context information in the model learning procedure, resulting in less effective visual features. In this paper, we propose a Multi-level Attention Network (MANet) to extract the target attribute information and the surrounding context information simultaneously for the target object, where the Attribute Attention Module is designed to extract the fine-grained visual information related to the referring expression and the Context Attention Module is designed to merge the context information of the surroundings to learn more discriminative visual features. Experiments on various common benchmark datasets show the effectiveness of our approach. © 2023 Elsevier B.V. All rights reserved.
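The abstract does not give the modules' exact formulation, but the two-stage idea it describes — first gating the target's visual features by their affinity to the expression (attribute attention), then fusing in a softmax-weighted summary of surrounding objects (context attention) — can be illustrated with a minimal NumPy sketch. All function names, shapes, and the specific gating/fusion choices below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attribute_attention(visual_feat, expr_embed):
    # Hypothetical attribute attention: re-weight each visual channel by
    # its (softmax-normalized) affinity to the expression embedding,
    # emphasizing expression-relevant fine-grained attributes.
    channel_weights = softmax(visual_feat * expr_embed)
    return channel_weights * visual_feat

def context_attention(target_feat, context_feats):
    # Hypothetical context attention: score each surrounding object
    # against the target, take a softmax-weighted sum of their features,
    # and concatenate it to the target representation.
    scores = softmax(context_feats @ target_feat)   # (num_context,)
    context = scores @ context_feats                # (d,)
    return np.concatenate([target_feat, context])   # (2d,)

rng = np.random.default_rng(0)
d = 8
target = rng.standard_normal(d)        # target-object visual feature
expr = rng.standard_normal(d)          # referring-expression embedding
ctx = rng.standard_normal((5, d))      # features of 5 surrounding objects

attr_feat = attribute_attention(target, expr)
fused = context_attention(attr_feat, ctx)
print(fused.shape)  # (16,)
```

The fused vector combines expression-conditioned target features with a context summary, which is the kind of enriched representation a downstream grounding head would score against candidate regions.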
Source:
PATTERN RECOGNITION LETTERS
ISSN: 0167-8655
Year: 2023
Volume: 172
Page: 252-258
5.100 (JCR@2022)
ESI Discipline: ENGINEERING
ESI HC Threshold: 19
Cited Count:
WoS CC Cited Count: 2
SCOPUS Cited Count: 3
ESI Highly Cited Papers on the List: 0
30 Days PV: 8