Global and Local Interactive Perception Network for Referring Image Segmentation

Indexed by：

EI Scopus SCIE

Abstract：

The effective modal fusion and perception between the language and the image are necessary for inferring the reference instance in the referring image segmentation (RIS) task. In this article, we propose a novel RIS network, the global and local interactive perception network (GLIPN), to enhance the quality of modal fusion between the language and the image from the local and global perspectives. The core of GLIPN is the global and local interactive perception (GLIP) scheme. Specifically, the GLIP scheme contains the local perception module (LPM) and the global perception module (GPM). The LPM is designed to enhance the local modal fusion by the correspondence between word and image local semantics. The GPM is designed to inject the global structured semantics of images into the modal fusion process, which can better guide the word embedding to perceive the whole image's global structure. Combined with the local-global context semantics fusion, extensive experiments on several benchmark datasets demonstrate the advantage of the proposed GLIPN over most state-of-the-art approaches.

Keyword：

referring image segmentation (RIS); Visualization; transformer; Feature extraction; Object detection; Attention mechanism; Semantics; global perception; local perception; Image segmentation; Detectors; Task analysis

Author Community：

[ 1 ] [Liu, Jing]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[ 2 ] [Tan, Hongchen]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[ 3 ] [Hu, Yongli]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[ 4 ] [Sun, Yanfeng]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[ 5 ] [Yin, Baocai]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[ 6 ] [Wang, Huasheng]Cardiff Univ, Sch Comp Sci & Informat, Cardiff CF10 3AT, Wales

Related Keywords：

Cascade Transformer Decoder based Occluded Pedestrian Detection with Dynamic Deformable Convolution and Gaussian Projection Channel Attention Mechanism
2023, IEEE Transactions on Multimedia
SwinCGH-Net: Enhancing Robustness of Object Detection in Autonomous Driving with Weather Noise via Attention
2023, 19th International Conference on Intelligent Computing, ICIC 2023
Cascaded Segmented Matting Network for Human Matting
2021, IEEE ACCESS
Multi-scale Feature Fusion UAV Image Object Detection Method Based on Dilated Convolution and Attention Mechanism
2020, 8th International Conference on Information Technology: IoT and Smart City, ICIT 2020

Source ：

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

ISSN： 2162-237X

Year： 2023

JCR@2022： 10.400

ESI Discipline： COMPUTER SCIENCE;

ESI HC Threshold：19

Cited Count：

WoS CC Cited Count： 0

SCOPUS Cited Count： 1

ESI Highly Cited Papers on the List： 0
