Abstract:
As an indispensable part of cross-media analysis, comprehending heterogeneous data poses challenges in visual question answering (VQA), visual captioning, and cross-modality retrieval, and bridging the semantic gap between the two modalities remains difficult. In this article, to address this problem in cross-modality retrieval, we propose a cross-modal learning model with joint correlative calculation learning. First, an auto-encoder is used to embed the visual features by minimizing the feature-reconstruction error, and a multi-layer perceptron (MLP) is used to embed the textual features. We then design a joint loss function that optimizes both the intra- and inter-correlations among image-sentence pairs, i.e., the reconstruction loss of the visual features, the relevant-similarity loss of paired samples, and the triplet relation loss between positive and negative examples. In the proposed method, we optimize the joint loss over a batch score matrix and utilize all mutually mismatched pairs to enhance performance. Our experiments on the retrieval tasks demonstrate the effectiveness of the proposed method, which achieves performance comparable to the state of the art on three benchmarks: Flickr8k, Flickr30k, and MS-COCO.
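The joint objective described above (visual reconstruction loss, paired-similarity loss, and a triplet ranking loss computed over a batch score matrix with all mismatched pairs as negatives) could be sketched as below. This is a minimal illustrative sketch, not the authors' code: the cosine-similarity scoring, the margin value, and the weights alpha and beta are assumptions.

```python
import torch
import torch.nn.functional as F

def joint_loss(img_emb, txt_emb, img_recon, img_feat,
               margin=0.2, alpha=1.0, beta=1.0):
    """Hypothetical sketch of the joint loss: reconstruction +
    paired similarity + triplet ranking over a batch score matrix."""
    # Auto-encoder reconstruction loss on the visual features.
    recon_loss = F.mse_loss(img_recon, img_feat)

    # Batch score matrix: similarity between every image and every sentence.
    img_n = F.normalize(img_emb, dim=1)
    txt_n = F.normalize(txt_emb, dim=1)
    scores = img_n @ txt_n.t()                                  # (B, B)

    # Relevant-similarity loss: push matched pairs (the diagonal) toward 1.
    pos = scores.diag()
    sim_loss = (1.0 - pos).mean()

    # Triplet relation loss using all mutually mismatched pairs as negatives.
    cost_img = (margin + scores - pos.view(-1, 1)).clamp(min=0)  # image anchors
    cost_txt = (margin + scores - pos.view(1, -1)).clamp(min=0)  # sentence anchors
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    triplet_loss = cost_img.masked_fill(mask, 0).mean() + \
                   cost_txt.masked_fill(mask, 0).mean()

    return recon_loss + alpha * sim_loss + beta * triplet_loss

# Example usage with random tensors (batch size and feature dims are assumptions):
B, dv, d = 8, 4096, 512
loss = joint_loss(torch.randn(B, d), torch.randn(B, d),
                  torch.randn(B, dv), torch.randn(B, dv))
```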
Source: ACM Transactions on Multimedia Computing, Communications, and Applications
ISSN: 1551-6857
Year: 2019
Issue: 2
Volume: 15
Impact Factor (JCR@2022): 5.100
ESI Discipline: Computer Science
ESI HC Threshold: 147
JCR Journal Grade:1
Cited Count:
WoS CC Cited Count: 16
SCOPUS Cited Count: 16
ESI Highly Cited Papers on the List: 0