Abstract:
Image-text retrieval has attracted significant attention in multimodal artificial intelligence. Nonetheless, existing methods struggle to exploit inter-modal information efficiently and to fully leverage crucial intra-modal details. In this paper, we propose a novel common-Memory Bridged cross-modal Adaptive Graph Embedding (MBAGE) network for image-text retrieval. First, we represent images and texts as graphs whose nodes correspond to salient regions and words, respectively. We then introduce a common-memory bank as an intermediate bridge that mediates interactions between the nodes of the two graphs, enabling efficient use of inter-modal information. In addition, we propose an adaptive graph convolutional network for intra-modal interaction, which adaptively suppresses the learning of unimportant nodes. Finally, adaptive pooling retains the essential information, yielding a stronger holistic embedding. Experimental results on the Flickr30K and MS-COCO datasets demonstrate that MBAGE achieves compelling retrieval precision while maintaining high retrieval efficiency. © 2024 IEEE.
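The record gives only the abstract-level description of the architecture, so the following is a minimal, hypothetical PyTorch sketch of the two core ideas the abstract names: a shared common-memory bank that both modalities read from, and a gated ("adaptive") graph convolution that can suppress the updates of unimportant nodes. All names, shapes, and the specific attention/gating forms (CommonMemoryBridge, AdaptiveGraphConv, num_slots, the residual reads) are assumptions for illustration, not the authors' definitions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonMemoryBridge(nn.Module):
    """Hypothetical shared memory bank: image and text nodes interact
    indirectly by reading from the same learnable memory slots, rather
    than via direct pairwise region-word cross-attention."""
    def __init__(self, dim: int, num_slots: int = 64):
        super().__init__()
        # Learnable memory slots shared across both modalities.
        self.memory = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.scale = dim ** -0.5

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:
        # nodes: (batch, num_nodes, dim)
        attn = torch.softmax(nodes @ self.memory.t() * self.scale, dim=-1)
        read = attn @ self.memory            # (batch, num_nodes, dim)
        return nodes + read                  # residual memory read

class AdaptiveGraphConv(nn.Module):
    """One plausible reading of 'adaptive' graph convolution: a per-node
    sigmoid gate scales the aggregated neighbor message, so unimportant
    nodes receive a suppressed update."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, 1)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj: (batch, num_nodes, num_nodes), row-normalized affinities
        msg = adj @ self.proj(nodes)         # aggregate neighbor features
        g = torch.sigmoid(self.gate(nodes))  # per-node importance gate
        return nodes + g * F.relu(msg)       # gated residual update

# Toy usage: 4 image regions and 6 words, 256-d node features.
bridge, gconv = CommonMemoryBridge(256), AdaptiveGraphConv(256)
regions, words = torch.randn(1, 4, 256), torch.randn(1, 6, 256)
adj = torch.softmax(torch.randn(1, 4, 4), dim=-1)
regions = gconv(bridge(regions), adj)        # inter- then intra-modal step
words = bridge(words)                        # text reads the same memory
```

Under these assumptions, routing both modalities through a fixed-size memory keeps cross-modal interaction linear in the number of nodes rather than quadratic in region-word pairs, which is consistent with the abstract's claim of high retrieval efficiency.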
ISSN: 1945-7871
Year: 2024
Language: English